Transforming IT with Palantir Foundry and AIP
From Misconceptions to Rapid Value Delivery
Building a Data Lakehouse has become a major trend in 2024. Organizations are looking to capture and manage large amounts of data and operationalize it in near real-time with their AI/ML workloads. IT leaders considering Data Lakehouse platforms have many questions in common:
- Do I build a platform or go with a commercial offering like Databricks?
- How do I support sub-second latency for near real-time workloads?
- How do I migrate from my existing platform to this new platform?
- Can I save money relative to my current spend?
- How does this create business value?
Commercial platforms like Snowflake and Databricks are staples in architecture discussions, but one platform that is not being talked about enough is Palantir Foundry. With Foundry, businesses can extract value in weeks or months, not years. Foundry accomplishes this by delivering the complete data and application stack in a single integrated platform. This includes solutions for MLOps and commercial and open-source GenAI models. However, several common misconceptions about Palantir and its products must be addressed before an organization can move forward with a pilot and eventual adoption.
It’s a Black Box
Foundry and its Artificial Intelligence Platform (AIP) components are built on open-source technology you already know and love. Below is a breakdown of the technology embedded in Foundry and those it integrates with:
Foundry’s core data processing layer is built on a general-purpose, autoscaling Kubernetes application. It supports Polars, Spark, and Flink by default, but you can also bring your own container for processing. The application layer is built on TypeScript, Python, React, and Lucene/Elasticsearch. It also includes VS Code Workspaces for all your code repositories and third-party applications (in preview). These are among the most widely adopted open-source tools in the market today. Software engineers skilled in these tools can sit down in Foundry on day zero and build.
Foundry also integrates with other parts of your stack. You can bring your own models and databases to Foundry. You can also integrate your enterprise version control systems (mirrored) and your service mesh into data pipelines and applications. In addition, data assets and models produced in external systems can be integrated and used in Foundry as first-class citizens.
In addition to the above, Palantir offers free public developer stacks so you can evaluate and learn the technology with zero commitment to buy. Visit learn.palantir.com and signup.palantirfoundry.com for more information.
The black-box criticism is partially true for internals like deployment, debugging, and log analysis. You cannot control how your code is deployed. For example, the process of submitting Spark jobs or deploying serverless APIs is managed on your behalf. You cannot modify the webpack configuration of your serverless functions, and you cannot control the internals of the Spark job queue. You can, of course, configure options related to these deployments, but you cannot directly control them.
You also do not have direct access to application log files, for obvious security reasons. Foundry stacks are deployed under a pool model, so logs are surfaced on your behalf and can be observed and downloaded through Foundry. This can be problematic, though: there is no way to trace errors through every layer of the stack in a centralized logging solution like an ELK stack, and there is no support for distributed tracing.
Foundry is also a closed-source platform, which is no different from most commercial software offerings.
It’s Expensive
Based purely on compute and storage pricing, Foundry is more expensive than hyperscaler services and some commercial offerings. The picture changes if you apply Activity-Based Costing and other best practices to calculate total cost of ownership (TCO). When you factor in the costs of engineering, support, and additional services that can be retired, Foundry is often much cheaper than the alternatives.
Palantir offers usage-based pricing based on three dimensions:
- Compute — “Compute-seconds represent a unit of computational work in the platform and are used by both batch (long-running) and interactive (ad-hoc) applications.”
- Storage — “Foundry storage measures the general purpose data stored in the non-Ontology transformation layers in Foundry. Disk usage is measured in gigabyte-months.”
- Ontology — “Foundry’s Ontology and indexed data formats provide tools for fast, organization-centric queries and actions. These backing systems store the data in formats that are significantly more flexible for ad-hoc operational and analytical use cases. Ontology volume is measured in gigabyte-months.”
For a detailed explanation of usage types, visit this page from Palantir.
I was able to import, process, clean, and optimize ~20 TB (~5 billion rows) of data for less than $10k, which is very reasonable relative to comparable systems. The incremental monthly cost of maintaining this data set was ~$2k. Many organizations’ entire data corpus used in operational decision-making is much smaller than this. Virtual tables for raw data storage can also offer significant savings for a large corpus.
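To make the gigabyte-month unit concrete, here is a back-of-the-envelope sketch. The rate used is an assumed placeholder for illustration, not Palantir’s published price; consult the usage-pricing page for real figures.

```python
# Back-of-the-envelope storage cost under usage-based pricing.
# RATE_PER_GB_MONTH is an assumed illustrative rate, NOT a published price.
RATE_PER_GB_MONTH = 0.05  # hypothetical $/gigabyte-month

def monthly_storage_cost(terabytes: float, rate: float = RATE_PER_GB_MONTH) -> float:
    """Storage billed in gigabyte-months: GB stored x 1 month x rate."""
    gigabytes = terabytes * 1000  # decimal TB -> GB
    return gigabytes * rate

# ~20 TB held for one month at the assumed rate
print(f"${monthly_storage_cost(20):,.2f}")  # $1,000.00
```

Swapping in the real per-dimension rates turns this into a quick sanity check before committing to an enterprise plan.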
AI costs are also very reasonable, and you can choose from the most comprehensive range of open-source and commercial models for your workloads. I spend about $500 per month on a semantic search application processing SEC filings.
Palantir also offers enterprise pricing plans with significant savings over pure usage-based pricing.
Enterprises should not be penny-wise and pound-foolish. It’s important to understand the TCO when comparing Foundry against other commercial platforms or a platform constructed on hyperscalers.
It’s Vendor Locked
Vendor lock is a generally misunderstood topic. Before engaging in these arguments, I recommend reading “Don’t get locked up into avoiding lock-in” by Gregor Hohpe to better understand the problem. Here are some key takeaways from the article:
- Lock-in is not a binary issue; avoiding one type of lock-in can lead to another.
- Architects often see lock-in as their enemy, but experienced ones recognize that it has many facets and sometimes can be the favored solution.
- Open-source solutions do not automatically eliminate lock-in. They can reduce vendor lock-in but often introduce product lock-in.
- Managed open-source services may still tie users to specific versions or proprietary extensions.
- Avoiding lock-in often involves additional effort, expenses, underutilization of vendor-specific features, and increased system complexity.
- Architects should balance the upfront investment in reducing lock-in with the potential liability of being locked in.
- Accept some degree of lock-in when it provides significant utility and the likelihood of needing to switch is low.
- Use low-effort mechanisms to reduce lock-in, but avoid over-investing to minimize switching costs.
- Be cautious with multi-cloud strategies that aim to eliminate cloud provider lock-in, as they can negate cloud benefits.
With that in mind, let’s dispel this myth about Palantir Foundry.
- Foundry is cloud agnostic. You can choose which cloud(s) to deploy to.
- Foundry interoperates with existing systems.
- There is a certification program, solution partners, and a talent pool. You do not need Palantir to build or consult with you on solutions.
- All data transformations written in Foundry can be migrated out by checking out the repository and removing the transform API. For well-architected transforms, this is a simple decorator that can be swapped out in a few lines of code.
- Visual transforms built in Pipeline Builder can be promoted to a code repository and migrated out of Foundry, similar to the above.
- Code workbooks can be promoted to a code repository and migrated out of Foundry like the above.
- Jupyter notebooks can be checked out and repurposed in any stack that supports them.
- TypeScript functions can be checked out and ported over to another stack. Well-architected functions will abstract the usage of Ontology APIs in a DAO or similar pattern, making migration relatively painless.
- Foundry-hosted web applications can be deployed to similar platforms such as AWS or Vercel after removing the OSDK.
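The decorator-swap point above is easier to see in code. The sketch below is illustrative: the dataset paths and function names are hypothetical, and the Foundry-specific wrapper is shown commented out because it requires the Foundry runtime. The key idea is that well-architected business logic lives in a plain function with no platform imports, so migration means deleting a thin wrapper.

```python
# Portable business logic: a plain function with no Foundry imports.
# Migrating out of Foundry means deleting the thin wrapper below and
# invoking clean_orders() from whatever scheduler replaces it.

def clean_orders(rows: list[dict]) -> list[dict]:
    """Drop cancelled orders and normalize customer IDs."""
    return [
        {**row, "customer_id": str(row["customer_id"]).strip().upper()}
        for row in rows
        if row.get("status") != "cancelled"
    ]

# --- Foundry-specific wrapper (hypothetical paths; needs the Foundry runtime) ---
# from transforms.api import transform_df, Input, Output
#
# @transform_df(
#     Output("/Company/clean/orders"),
#     raw=Input("/Company/raw/orders"),
# )
# def compute(raw):
#     ...  # adapt clean_orders() to the Spark DataFrame API
# -------------------------------------------------------------------------------

# Outside Foundry, the same logic runs anywhere:
if __name__ == "__main__":
    raw = [
        {"customer_id": " abc ", "status": "shipped"},
        {"customer_id": "xyz", "status": "cancelled"},
    ]
    print(clean_orders(raw))
```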
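The DAO pattern mentioned for TypeScript functions applies equally in any language; here is a minimal sketch of the idea in Python, with hypothetical names throughout. Application code depends only on an abstract repository, so the Ontology-backed implementation can be swapped for any other backend without touching business logic.

```python
from abc import ABC, abstractmethod

# DAO interface: application code depends only on this abstraction,
# never on a platform SDK directly.
class OrderRepository(ABC):
    @abstractmethod
    def open_orders(self) -> list[dict]: ...

# Hypothetical Foundry-backed implementation (would call the Ontology API).
class OntologyOrderRepository(OrderRepository):
    def open_orders(self) -> list[dict]:
        raise NotImplementedError("calls the Ontology API inside Foundry")

# Portable implementation used after (or instead of) migration.
class InMemoryOrderRepository(OrderRepository):
    def __init__(self, orders: list[dict]):
        self._orders = orders

    def open_orders(self) -> list[dict]:
        return [o for o in self._orders if o["status"] == "open"]

def count_open_orders(repo: OrderRepository) -> int:
    """Business logic sees only the interface; swapping backends is one line."""
    return len(repo.open_orders())
```

Because the Ontology calls are confined to one class, porting to another stack means rewriting that class alone.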
There is an accessible migration path out of Foundry for customers who no longer want to use the platform. Further, Foundry requires substantially less upfront cost and time to extract business value. Lastly, Foundry avoids the least-discussed problem facing those trying to prevent vendor lock-in: being locked into a poor solution of their own creation. Poorly maintained, poorly documented systems that require tribal knowledge to sustain are far riskier than paying a commercial software company to provide a product or service.
Conclusion
Palantir Foundry is a compelling commercial platform for rebuilding your data infrastructure and deploying AI-enabled applications that deliver business value. However, many misconceptions about the platform prevent companies from considering it for their data infrastructure. Take the time to learn the platform by test-driving it in a developer stack and discover Palantir partners using Foundry to deliver rapid value creation for their customers.
Thanks for reading. Be sure to follow me on Medium for more articles on Foundry, AI, and software engineering.