Harmonizing AI, Humans, and Software Systems

Dorian Smiley
Jan 8, 2024


Building a Saga with Palantir’s AIP Logic and X-State

An AI-Orchestrated Software System

In my previous article, I argued that agents and tools will create more value than chat. Since then, I’ve been developing numerous use cases that apply this approach to improving operations (intelligent operations). One use case that proved extremely difficult was medical claims processing.

The Problem

Processing a claim is a long-running process that spans multiple systems. Depending on the quality of the claim data, human intervention may be required, and that intervention can take several days. This type of problem is often modeled as a Saga.

The various systems and parties can be modeled as services coordinated through an orchestrator.

Source: https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/saga/saga
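The orchestration variant of the Saga pattern described in the Microsoft reference can be sketched in a few lines. This is a generic, in-memory illustration, not the author’s implementation: `runSaga`, `SagaStep`, and the step names in the test are hypothetical. Each step pairs an action with a compensating action, and when a step fails, the orchestrator undoes the completed steps in reverse order.

```typescript
// Minimal orchestration-based Saga (in-memory, synchronous, hypothetical names).
// Each step pairs an action with a compensating action that undoes it.
interface SagaStep<C> {
  name: string;
  action: (ctx: C) => void;
  compensate: (ctx: C) => void;
}

interface SagaResult {
  ok: boolean;
  completed: string[];   // steps whose action succeeded
  compensated: string[]; // steps rolled back, most recent first
  failedStep?: string;
}

function runSaga<C>(steps: SagaStep<C>[], ctx: C): SagaResult {
  const completed: SagaStep<C>[] = [];
  for (const step of steps) {
    try {
      step.action(ctx);
    } catch {
      // A step failed: compensate every completed step in reverse order.
      // (A failing compensation is not handled here; it simply propagates.)
      const compensated: string[] = [];
      for (const done of completed.slice().reverse()) {
        done.compensate(ctx);
        compensated.push(done.name);
      }
      return {
        ok: false,
        completed: completed.map((s) => s.name),
        compensated,
        failedStep: step.name,
      };
    }
    completed.push(step);
  }
  return { ok: true, completed: completed.map((s) => s.name), compensated: [] };
}
```

What this sketch leaves out is durability: a production orchestrator must persist its progress so a Saga can survive restarts and multi-day pauses, which is the role X-State and the Ontology play in the solution described later.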

As an example, consider some of the steps in claims processing:

  1. Attempt to match the attending physician with NPPES data
  2. If no match is found, contact HR, who use their external systems and data to find the matching physician’s details
  3. Call back into the claims process, passing the physician details, or mark the claim as unprocessable
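The three steps above reduce to one decision: match the claim’s physician name against registry records, and fall back to HR when no match is confident enough. This is an illustrative sketch under stated assumptions: the `Physician` shape is a simplified, hypothetical stand-in for NPPES records, the 0.85 threshold is arbitrary, and a normalized Levenshtein similarity stands in for the LLM-based similarity scoring the author describes later.

```typescript
// Hypothetical, simplified stand-in for an NPPES record (real records have many more fields).
interface Physician { npi: string; name: string; }

type MatchOutcome =
  | { kind: "matched"; physician: Physician; score: number }
  | { kind: "needsHR"; bestScore: number }; // step 2: contact HR

// Lowercase, drop punctuation, collapse whitespace: "Jane A. Doe" -> "jane a doe".
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z ]/g, "").replace(/\s+/g, " ").trim();
}

// Classic edit distance, computed with a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for prefixes a[0..i-1), b[0..j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value from the previous row
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Similarity in [0, 1]; 1 means the normalized names are identical.
function similarity(a: string, b: string): number {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Step 1: pick the best registry match; below the threshold, hand off to HR.
function matchPhysician(claimName: string, registry: Physician[], threshold = 0.85): MatchOutcome {
  let best: Physician | undefined;
  let bestScore = 0;
  for (const candidate of registry) {
    const score = similarity(claimName, candidate.name);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best !== undefined && bestScore >= threshold
    ? { kind: "matched", physician: best, score: bestScore }
    : { kind: "needsHR", bestScore };
}
```

A `needsHR` outcome corresponds to step 2; step 3 happens later, when HR’s answer arrives through a callback.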

Think of the Saga orchestrator as the conductor of an orchestra. In the medical claims example, our AI orchestrator directs each step the way a seasoned conductor directs each section. The physician matching service, the HR team’s systems, and the claims processing module are the instruments, each playing a specific part. The Saga definition is the sheet music: it specifies the sequence of actions. If the physician match fails, the orchestrator adapts by bringing in HR, and that detour becomes part of the performance rather than a breakdown. Just as a conductor draws the best from each musician, the orchestrator lets distributed systems and people work together coherently throughout claims processing.

Why are Sagas important? Because almost all workflow automation today involves coordinating distributed systems. Examples include:

  1. E-commerce and Order Fulfillment
  2. Reservations
  3. Ticketing
  4. Banking and Financial Systems
  5. Healthcare Information Systems
  6. Supply Chain Management

Implementing a Saga as part of an AI agent workflow can be extremely challenging because most agent frameworks assume an execution is short-lived (seconds or minutes, not days), uninterrupted, and fully self-contained. Fortunately, Palantir’s AIP Logic, combined with Ontology Actions and webhooks, provides all the tools we need to support Sagas in our agent workflows. As a result, I was able to implement this solution in a single sprint!

Solution Architecture

  1. Agent Framework: I defined my agent in AIP Logic, using its visual interface for prompt engineering, assembling logic blocks, exposing tools, selecting models, testing, and tracing.
  2. Orchestrator: I used X-State to provide the orchestration layer for our Saga. X-State is a JavaScript/TypeScript library for state machines and statecharts: it can express complex logic, its state can be serialized and persisted, and a persisted state can be restored, which lets execution pause and resume.
  3. Events: Palantir’s AIP Automate provides the event-driven layer, executing our agent in response to data mutations.
  4. Callbacks: External systems include the claims processing software our clients use. Integration with them was handled through webhook callbacks that a human triggers (by clicking a button) after resolving issues in the external system.
  5. Notifications: Humans need to be notified so they can fix issues in the claim data that machines can’t resolve. Notifications were implemented with Foundry’s built-in notification system, which supports email and chat applications such as Slack and Teams.

Screenshot of AIP Logic Blocks

Implementation

Our agent is executed in response to claim creation events. The agent workflow is as follows:

  1. The selected LLM reasons about how to process the claim and selects the supplied process claim tool (shown in the screenshot above).
  2. AIP Logic executes the process claim tool. This includes an additional LLM call that scores similarity to identify the attending physician. If the model can’t confirm that the physician data in the claim is correct, the HR team is contacted.
  3. If the HR team needs to resolve a problem, a notification is sent to the team, execution is paused, and the state machine’s execution state is persisted. Once the issue is resolved (which could take days), the HR team clicks the callback link in the notification to resume the agent’s execution.
  4. The agent finishes executing.

The process claim tool implements our orchestrator as a state machine in X-State. Below is a screenshot of the machine from the X-State Visualizer.

Outline of our state machine in X-State
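As a rough stand-in for the machine in the screenshot, its states and transitions can be written as a pure transition function in plain TypeScript. This does not use X-State’s API, and the state and event names (`matchingPhysician`, `contactingHR`, `HR_RESOLVED`, and so on) are hypothetical; it only shows the shape of the logic: a successful match completes the claim, a failed match pauses in `contactingHR`, and an HR decision moves the claim to `completed` or `unprocessable`.

```typescript
// Hypothetical states and events approximating the claim machine.
type ClaimState = "matchingPhysician" | "contactingHR" | "completed" | "unprocessable";

type ClaimEvent =
  | { type: "MATCH_FOUND"; npi: string }
  | { type: "NO_MATCH" }
  | { type: "HR_RESOLVED"; npi: string }
  | { type: "HR_REJECTED" };

// Plain data, so it can be serialized with JSON.stringify while paused.
interface ClaimSnapshot { state: ClaimState; claimId: string; npi?: string; }

function transition(s: ClaimSnapshot, e: ClaimEvent): ClaimSnapshot {
  switch (s.state) {
    case "matchingPhysician":
      if (e.type === "MATCH_FOUND") return { ...s, state: "completed", npi: e.npi };
      if (e.type === "NO_MATCH") return { ...s, state: "contactingHR" }; // pause here
      break;
    case "contactingHR":
      if (e.type === "HR_RESOLVED") return { ...s, state: "completed", npi: e.npi };
      if (e.type === "HR_REJECTED") return { ...s, state: "unprocessable" };
      break;
  }
  // Final states and unexpected events leave the snapshot unchanged.
  return s;
}
```

Because the snapshot is plain data, it can be stored until a callback arrives; X-State offers its own mechanism for persisting and restoring machine snapshots.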

Note that the contact HR state serializes the state machine’s execution, saves it to Foundry’s Ontology, and then sends the HR team a notification with embedded callback URLs. The notification is sent through Foundry’s built-in notification system, configured on our Notifications Ontology object, which uses a function to assemble the email.
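A minimal sketch of the contact HR step’s side effects, assuming a simple key-value store in place of the Ontology and plain-text links in place of email buttons. `MachineExecution`, `persistExecution`, `buildHrNotification`, the callback query format, and `https://example.com/claims/callback` are all hypothetical; the article’s actual Foundry notification function is not reproduced here.

```typescript
// Hypothetical persisted record; the article stores this in Foundry's Ontology.
interface MachineExecution { id: string; snapshotJson: string; }
interface PausedSnapshot { state: "contactingHR"; claimId: string; physicianName: string; }
interface HrNotification { to: string; subject: string; body: string; }

// Serialize the paused machine and save it under its execution ID.
function persistExecution(
  id: string,
  snapshot: PausedSnapshot,
  store: Record<string, MachineExecution>,
): MachineExecution {
  const record: MachineExecution = { id, snapshotJson: JSON.stringify(snapshot) };
  store[id] = record;
  return record;
}

// Hypothetical callback format: <baseUrl>?executionId=<id>&action=resolve|reject
function callbackUrl(baseUrl: string, executionId: string, action: "resolve" | "reject"): string {
  return `${baseUrl}?executionId=${encodeURIComponent(executionId)}&action=${action}`;
}

// Assemble the message HR receives; each link resumes the same execution.
function buildHrNotification(record: MachineExecution, baseUrl: string, hrEmail: string): HrNotification {
  const snap: PausedSnapshot = JSON.parse(record.snapshotJson);
  return {
    to: hrEmail,
    subject: `Claim ${snap.claimId}: attending physician needs review`,
    body: [
      `No NPPES match was found for "${snap.physicianName}".`,
      `Resolve: ${callbackUrl(baseUrl, record.id, "resolve")}`,
      `Reject: ${callbackUrl(baseUrl, record.id, "reject")}`,
    ].join("\n"),
  };
}
```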

The email includes buttons to resolve or reject the physician’s details. The HR team uses external systems to correct data quality issues and manually verify the provided information. When a callback URL is triggered, a webhook resumes execution of the state machine by rehydrating it from the persisted record.
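On the receiving side, the webhook has to turn a clicked link back into an execution ID and an event for the paused machine. Assuming a hypothetical `?executionId=<id>&action=resolve|reject` query format, that translation might look like this; `parseCallback` and the event names are illustrative and not part of any Foundry API.

```typescript
// Hypothetical events the paused machine accepts from HR.
type HrEvent = { type: "HR_RESOLVED" } | { type: "HR_REJECTED" };
interface ParsedCallback { executionId: string; hrEvent: HrEvent; }

function parseCallback(query: string): ParsedCallback {
  // Split "a=1&b=2" into a map, decoding percent-escapes in keys and values.
  const params: Record<string, string> = {};
  for (const pair of query.replace(/^\?/, "").split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  const executionId = params["executionId"];
  if (!executionId) throw new Error("callback is missing executionId");
  // Map the clicked button to the event the paused machine expects.
  switch (params["action"]) {
    case "resolve":
      return { executionId, hrEvent: { type: "HR_RESOLVED" } };
    case "reject":
      return { executionId, hrEvent: { type: "HR_REJECTED" } };
    default:
      throw new Error(`unknown callback action: ${params["action"]}`);
  }
}
```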

If our function receives an execution ID, it rehydrates the machine from the persisted Machine Execution record retrieved from the Ontology. This ID is only supplied in webhook invocations that resume execution; our agent never provides it. This underscores the versatility of tools: they are just functions that can be reused across traditional applications and AI agents. Your organization’s entire service mesh can be exposed as tools that agents use to solve complex problems.
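The dual entry point described above (start fresh when the agent calls, rehydrate when the webhook calls) can be sketched as one function with an optional `executionId`. Everything here is hypothetical: `processClaim`, its arguments, the in-memory `ExecutionStore` standing in for Machine Execution records, and the `physicianMatches` callback standing in for the matching step.

```typescript
// Hypothetical tool arguments: the agent supplies only claimId; the webhook
// path also supplies executionId and the HR decision from the callback URL.
interface ProcessClaimArgs {
  claimId: string;
  executionId?: string;
  hrDecision?: "resolve" | "reject";
}

type ClaimState = "matchingPhysician" | "contactingHR" | "completed" | "unprocessable";
interface Snapshot { state: ClaimState; claimId: string; }
type ExecutionStore = Record<string, string>; // executionId -> serialized snapshot

function processClaim(
  args: ProcessClaimArgs,
  store: ExecutionStore,
  newId: () => string,
  physicianMatches: (claimId: string) => boolean, // stand-in for the matching step
): { executionId: string; snapshot: Snapshot } {
  let executionId: string;
  let snapshot: Snapshot;
  if (args.executionId !== undefined) {
    // Webhook path: rehydrate the paused machine from its persisted record.
    const json = store[args.executionId];
    if (json === undefined) throw new Error(`unknown execution: ${args.executionId}`);
    executionId = args.executionId;
    snapshot = JSON.parse(json);
    // Only a paused machine reacts; repeated clicks leave the snapshot as-is.
    if (snapshot.state === "contactingHR" && args.hrDecision !== undefined) {
      snapshot = { ...snapshot, state: args.hrDecision === "resolve" ? "completed" : "unprocessable" };
    }
  } else {
    // Agent path: start a fresh execution and run the matching step.
    executionId = newId();
    snapshot = {
      claimId: args.claimId,
      state: physicianMatches(args.claimId) ? "completed" : "contactingHR",
    };
  }
  store[executionId] = JSON.stringify(snapshot); // persist after every invocation
  return { executionId, snapshot };
}
```

The agent calls it with only a `claimId`; the webhook adds `executionId` and `hrDecision`. The same function therefore serves both callers, which is the point the paragraph makes about tools.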

Development

I implemented this solution in a single sprint, including data integration and transformation of the raw claim data. This is a testament to the power of Palantir’s Foundry platform. If I had to build this solution from scratch, I would have needed a team of engineers and months to create the required infrastructure and solution components. I’m now generalizing the solution so it can be reused in all our agents. This generalization will also introduce a mutex to ensure we don’t process multiple callbacks simultaneously.
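The planned mutex could take several forms. One minimal sketch, assuming a single process, is a per-execution try-lock with a lease: a second callback for the same execution is turned away while the first is in flight, and the lease frees locks left behind by a crashed handler. `ExecutionLocks` and `handleCallback` are hypothetical names, and a multi-instance deployment would need the lock in shared storage rather than in memory.

```typescript
// In-memory, single-process try-lock keyed by execution ID. Each lock carries a
// lease so a handler that crashes without releasing cannot block forever. The
// lease must outlast the slowest handler, or a second callback could slip in.
class ExecutionLocks {
  private readonly held: Record<string, number> = {}; // executionId -> lease expiry (ms)
  private readonly leaseMs: number;
  private readonly now: () => number;

  constructor(leaseMs: number, now: () => number = () => Date.now()) {
    this.leaseMs = leaseMs;
    this.now = now;
  }

  tryAcquire(executionId: string): boolean {
    const expiry = this.held[executionId];
    if (expiry !== undefined && expiry > this.now()) return false; // a callback is in flight
    this.held[executionId] = this.now() + this.leaseMs;
    return true;
  }

  release(executionId: string): void {
    delete this.held[executionId];
  }
}

// Process a callback only if no other callback for the same execution is running.
function handleCallback(locks: ExecutionLocks, executionId: string, work: () => void): "processed" | "busy" {
  if (!locks.tryAcquire(executionId)) return "busy";
  try {
    work();
    return "processed";
  } finally {
    locks.release(executionId); // always release, even if work() throws
  }
}
```

With an asynchronous handler, the lock would be held across `await` points in the same way; callers that receive `"busy"` can retry later or simply drop the duplicate click.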

Conclusion

Agents that power intelligent operations represent a step change in technology. However, many workflows require Sagas to support human-in-the-loop steps and systems integration. By combining Foundry’s AIP platform with X-State’s highly expressive state machine library, we can support a broader range of use cases that deliver massive value to our customers.


Dorian Smiley

I’m an early- to mid-stage startup warrior with a passion for scaling great ideas. The great loves of my life are my wife, my daughter, and surfing!