Matching the Pace of AI Innovation
How to Prepare for Uncertainty and Radical Change
Change can be difficult. Whether adapting to a new software solution or a company reorganization, uprooting well-understood systems is always challenging. Recent advances in AI have introduced a rate of change that virtually every organization needs to prepare for. New technology comes and goes in months, not years. Every week is an opportunity to spin the flywheel of innovation and "take the whole market." This means an AI-forward organization is prepared to re-platform applications with new models in hours, take advantage of new infrastructure in days, and reskill/upskill in weeks. This is a high bar and one that will get higher.
Compounding this rapid rate of change is an ongoing debate over whether today's leading GenAI models will continue to improve. The answer to that question will affect hundreds of billions of dollars in capital investment, the trajectory of AI adoption, and countless founders' and investors' hopes and dreams. Learning to cope with this level of uncertainty and change is the challenge of our time.
Market Outlook
All indicators point to an ever-accelerating pace of innovation. Here are just a few of the key findings from Stanford's 2024 AI Index Report:
- In 2023, the release of foundation models saw a remarkable surge, with 149 new models introduced, more than double the number released in 2022. Of these newly launched models, 65.7% were open-source, marking a significant increase from 44.4% in 2022 and 33.3% in 2021.
- AI Index estimates highlight the escalating training costs for state-of-the-art AI models. For instance, training OpenAI’s GPT-4 incurred an estimated cost of $78 million in compute, while Google’s Gemini Ultra required an even more staggering $191 million.
- The period from 2021 to 2022 witnessed a sharp 62.7% increase in global AI patent grants. Since 2010, the number of granted AI patents has soared more than 31 times, reflecting the rapid pace of innovation in the field.
- Since 2011, AI-related projects on GitHub have consistently grown, skyrocketing from 845 in 2011 to approximately 1.8 million in 2023. Notably, there was a dramatic 59.3% increase in the number of AI projects on GitHub in 2023 alone. The total number of stars for AI-related projects also saw a significant jump, tripling from 4.0 million in 2022 to 12.2 million in 2023.
- Today's leading models continue to improve, surpassing human performance on several benchmarks.
- AI agents and frameworks continue to advance to the point where they can perform many enterprise tasks.
Costs also continue to decline, while the performance of leading LLMs is expected to increase dramatically over the next year. New chipset architectures, like those from Groq and Etched, could deliver a 10–12x performance increase while decreasing costs.
Sources:
- https://www.datamonsters.com/news/groqs-ai-breakthrough-unrivaled-performance
- https://www.wsj.com/articles/startup-etched-closes-seed-round-promises-more-cost-effective-ai-chip-f5fd79aa
- https://medium.aifastcash.com/unveiling-etched-the-transformational-supercomputer-redefining-performance-f1ba94cb0369
- https://artificialanalysis.ai/models/
Statista projects that the AI market will expand from $241.8 billion in 2023 to nearly $740 billion by 2030, a compound annual growth rate (CAGR) of 17.3%, and one that could accelerate if model performance improves while costs trend down.
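That projection is easy to sanity-check with the standard CAGR formula. A minimal calculation, using the Statista figures above:

```python
# Sanity-check the implied CAGR from the Statista projection above.
# CAGR = (end_value / start_value) ** (1 / years) - 1

start_value = 241.8  # market size in $B, 2023
end_value = 740.0    # projected market size in $B, 2030
years = 2030 - 2023

cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # -> Implied CAGR: 17.3%
```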
Healthy Skepticism
In a recent interview with Computerphile, Dr. Mike Pound examines whether generative AI has already peaked. Dr. Pound questions whether current model architectures can continue to scale at the pace some are predicting, and he illustrates three possible scaling curves: exciting (exponential), balanced (linear), and evidence-based (logarithmic).
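To make the three hypotheses concrete, here is a minimal sketch of how the curves diverge. The functional forms are illustrative, not fitted to any benchmark:

```python
import math

# Three illustrative scaling hypotheses: capability as a function of
# scale x. The exact forms are toy examples, not fitted to real data.
def exciting(x):        # exponential: gains compound as scale grows
    return math.exp(0.5 * x)

def balanced(x):        # linear: each unit of scale buys the same gain
    return 1.0 + x

def evidence_based(x):  # logarithmic: each unit of scale buys less
    return 1.0 + math.log(1.0 + x)

for x in [1, 2, 4, 8, 16]:
    print(f"scale={x:>2}  exp={exciting(x):8.1f}  "
          f"lin={balanced(x):5.1f}  log={evidence_based(x):4.1f}")
```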
Today, it is still uncertain which curve we are on. This raises the question of whether we have already reached the peak performance of the transformer architecture, and what might come next. The release of GPT-5 may answer those questions and either accelerate current trends or send massive shockwaves through the industry.
Beyond questions of scalability, the transformer architecture has well-observed strengths and weaknesses. Its strengths include synthesizing plausible responses to input text and generating novel imagery and video. This should not come as a shock; these are, after all, generative models. Researchers also claim that emergent behaviors, such as the ability to solve math problems, appear at scale.
But these claims are buckling under scrutiny. Researchers at Stanford argue that emergent behaviors are a mirage:
“With bigger models, you get better performance, but we don’t have evidence to suggest that the whole is greater than the sum of its parts.” — Rylan Schaeffer
Instead, the researchers observe mostly linear scaling across 29 metrics. Only four metrics show any sign of emergent behavior, and those turn out to be artifacts of how they are measured:
“They’re all sharp, deforming, non-continuous metrics,” explains Schaeffer.
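Schaeffer's point can be reproduced with toy arithmetic: a per-token accuracy that improves smoothly looks like a sudden "emergent" jump when scored with an all-or-nothing metric such as exact match. A minimal illustration:

```python
# A smoothly improving per-token accuracy p, scored two ways.
# Exact match on an n-token answer requires every token to be right,
# so its score is p**n -- a curve that looks like a sharp capability
# jump even though the underlying skill improves smoothly.

n = 10  # tokens in the target answer (illustrative)
for p in [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]:
    print(f"per-token acc={p:.2f}  exact-match acc={p**n:.4f}")
```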
In addition, the transformer architecture includes the following features:
- Hallucinations: Generative models can hallucinate responses. In other words, they can "make stuff up." This is a feature of the architecture, not a bug.
- Timeliness: Generative Pretrained Transformers (GPT) are pretrained. This means their ability to synthesize an accurate response depends on the timeliness of their training data. Software engineers attempt to solve this issue by grounding the models using information retrieval systems that can provide timely and accurate in-context data. They may also fine-tune models on additional datasets.
- Blandness: In many cases, GenAI delivers a fortune cookie response, i.e., one that is generic where it should be varied and precise. These responses are not useful when you need an exact answer, which limits the models' usefulness outside a very narrow band of use cases.
Engineering Challenges
Engineers have relied primarily on Retrieval-Augmented Generation (RAG) to cope with today's limitations. Rather than retraining the model, RAG uses the context window to ground it at inference time with results from various information retrieval systems. This helps drive down the rate of hallucinations and improve the model's responses.
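The pattern itself is simple. Here is a minimal sketch, where `embed` and `generate` are stand-ins for whatever embedding model and LLM you use:

```python
import numpy as np

# Minimal RAG sketch: embed the query, retrieve the top-k most similar
# documents, and place them in the prompt as grounding context.
# `embed` and `generate` are stubs for your embedding model and LLM.

def embed(text: str) -> np.ndarray:
    ...  # call your embedding model here

def generate(prompt: str) -> str:
    ...  # call your LLM here

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(query: str, docs: list[str], k: int = 3) -> str:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    context = "\n\n".join(ranked[:k])
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return generate(prompt)
```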
However, RAG systems often rely on embedding databases, which don't provide fine-grained resolution for information retrieval. Engineers therefore often add knowledge graphs (KGs), knowledge graph embeddings, and traditional search indexes to improve retrieval resolution. Knowledge graphs can also improve a model's reasoning by enabling standard graph search algorithms to find related, contextually relevant content.
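Here is a minimal sketch of graph-based retrieval using networkx, with a toy graph standing in for a production knowledge graph:

```python
import networkx as nx

# Toy knowledge graph: nodes are entities, edges are typed relations.
# Graph traversal lets a retriever pull in related entities that a
# pure embedding lookup would miss.
kg = nx.Graph()
kg.add_edge("GPT-4", "OpenAI", relation="developed_by")
kg.add_edge("OpenAI", "Microsoft", relation="partnered_with")
kg.add_edge("GPT-4", "transformer", relation="architecture")

def related_context(entity: str, hops: int = 2) -> list[str]:
    """Collect facts within `hops` of the entity for use as LLM context."""
    nearby = nx.single_source_shortest_path_length(kg, entity, cutoff=hops)
    facts = []
    for u, v, data in kg.edges(data=True):
        if u in nearby and v in nearby:
            facts.append(f"{u} --{data['relation']}--> {v}")
    return facts

print(related_context("GPT-4"))
```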
In addition, engineers use model ensembles with blended architectures to cope with cost and performance limitations. For example, they may include traditional NLP models to perform named entity extraction and propensity models that provide better predictive signals for agentic workflows.
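A minimal sketch of that routing idea, with stubs standing in for the NER model, the propensity model, and the LLM:

```python
# Blended-architecture sketch: a cheap NER model extracts entities, a
# classical propensity model scores them, and the expensive LLM is only
# invoked when the signal warrants it. All three models are stubs.

def extract_entities(text: str) -> list[str]:
    ...  # e.g., a traditional NLP/NER model

def propensity(entities: list[str]) -> float:
    ...  # e.g., a classical classifier returning a probability

def llm_draft_response(text: str, entities: list[str]) -> str:
    ...  # expensive generative model, used only when worth the cost

def handle(text: str) -> str | None:
    entities = extract_entities(text)
    if propensity(entities) < 0.7:  # threshold is illustrative
        return None                 # skip the costly LLM call
    return llm_draft_response(text, entities)
```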
Many of these engineering projects are incredibly advanced, requiring specialized talent that is in short supply. In addition, managing complex data engineering projects, such as knowledge graphs and semantic and traditional search across structured, semi-structured, and unstructured data, requires advanced platforms to manage the complexity. In fact, there are two primary reasons large-scale data engineering projects fail:
- Change management: Predicting the impact of changes made to data integration and transformations is a crucial aspect of data platforms. Without the ability to approve, track, and test the effects of changes, you are almost certain to fail as the number of contributors grows. Managing data and code changes under a single change management system is required.
- Data quality: Understanding the level of data quality flowing through your pipelines is critical for preventing model degradation. As the old saying goes, "garbage in, garbage out." (A minimal quality gate is sketched after this list.)
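Even a lightweight quality gate at pipeline boundaries catches many regressions before they reach a model. A minimal sketch, with illustrative checks and thresholds:

```python
# Minimal data-quality gate for a pipeline stage. Fail fast and loudly
# before bad records reach training or retrieval. The required fields
# and the 5% threshold are illustrative.

def quality_gate(records: list[dict]) -> list[dict]:
    required = {"id", "text", "updated_at"}
    clean = [r for r in records
             if required <= r.keys() and str(r["text"]).strip()]
    bad_rate = 1 - len(clean) / max(len(records), 1)
    if bad_rate > 0.05:
        raise ValueError(f"Quality gate failed: {bad_rate:.1%} bad records")
    return clean
```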
It's shockingly rare for teams to prioritize these aspects of large-scale data engineering projects, which partially explains why so many fail.
Cost
Despite ramped-up production of advanced chipsets like NVIDIA's H100, the cost of training and running today's most advanced models remains high, and both training and inference workloads lean heavily on these GPUs. Training large language models like GPT-4 requires substantial GPU resources: training GPT-3 reportedly used around 10,000 NVIDIA GPUs, and GPT-4's demands are even higher.
Cloud providers and AI companies implement various rationing strategies to manage the limited supply, including token and rate limits that ensure fair access to computational resources. For example, OpenAI's published rate limits restrict model consumption through product tiers (zero to five). Until performance and cost reach the levels forecast for chips from Groq or Etched, software engineers will have to design creative solutions to cope with limited supply, as sketched below. This will significantly restrict the number of AI applications, especially in the consumer context, where ad-supported free usage tiers dominate.
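The standard client-side coping pattern is retry with exponential backoff and jitter. A minimal sketch, where `RateLimitError` and `call_model` stand in for your provider's client library:

```python
import random
import time

# Exponential backoff with jitter: a standard client-side pattern for
# coping with provider rate limits. `RateLimitError` and `call_model`
# are stand-ins for your provider's client library.

class RateLimitError(Exception):
    pass

def call_model(prompt: str) -> str:
    ...  # provider API call goes here

def with_backoff(prompt: str, max_retries: int = 5) -> str:
    for attempt in range(max_retries):
        try:
            return call_model(prompt)
        except RateLimitError:
            # Sleep 2^attempt seconds plus jitter to avoid retry stampedes.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("Rate limited after retries")
```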
Organizational Readiness
In uncertain times, when the market could turn on a dime, agility is a muscle you must develop. Despite AI's numerous challenges, many organizations are already extracting value. This is mainly due to partnerships with companies like Microsoft, Palantir, and OpenAI. In addition to strong technology partners, organizations should prioritize exceptional leadership, agile technologies, and a fail-fast mentality.
Exceptional Leadership
The modern leader of an AI-forward organization must have solid technical and business acumen. They must understand the company at a fundamental level while carrying the scar tissue that comes from leading technology initiatives on the front lines. These leaders intuitively understand that the metrics that matter tie technology initiatives to measurable business outcomes within a predefined timeframe.
Agile Technology
Adopting agile technology platforms will enable companies to extract the most value possible from current technology and pivot to what comes next. This will help the organization build a moat of innovation, one incremental technological improvement at a time. Agile platforms have the following characteristics:
- Zero switching cost — An agile platform automates switching between models using evals (evaluations) and automates the process of fine-tuning. A non-agile platform maximizes the cost of switching between models and turns undifferentiated heavy lifting, like model fine-tuning, into a bespoke engineering exercise. (A minimal eval harness is sketched after this list.)
- Portable — An agile platform makes re-platforming the infrastructure layer easy (moving from hyperscaler A to hyperscaler B, moving on-prem, etc.). The biggest value lever you can pull is taking advantage of the latest infrastructure to drive cost down and performance up, fast.
- Accessible — You need all your people to meet this moment, not just the ones wearing propeller hats. An agile technology platform empowers everyone in your organization to participate in the AI revolution. Builders include your SMEs, your front-line operators, and your senior leaders.
- Adaptive — By definition, an agile platform adapts to the needs of the business and broader market. It does not force the business to adapt to the technology. The ability to empower citizen engineers who can mold the software to their workflows is a critical test for adaptive systems.
- Scalable — An agile platform manages complexity at scale on your behalf. It comes with strong opinions about managing change and observability.
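To make "zero switching cost" concrete: keep a fixed eval set and score every candidate model against it automatically, so a model switch becomes a measurement rather than a leap of faith. A minimal sketch, with exact match as a purely illustrative grader:

```python
# Minimal eval harness: score candidate models on a fixed eval set and
# pick the winner. `models` maps a name to any callable prompt -> answer;
# exact-match grading is illustrative (real evals use richer graders).

def run_evals(models: dict, eval_set: list[tuple[str, str]]) -> dict:
    scores = {}
    for name, model in models.items():
        correct = sum(model(q).strip() == a for q, a in eval_set)
        scores[name] = correct / len(eval_set)
    return scores

eval_set = [("2+2?", "4"), ("Capital of France?", "Paris")]  # illustrative
# scores = run_evals({"model_a": model_a, "model_b": model_b}, eval_set)
# best = max(scores, key=scores.get)  # switch only when the evals say so
```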
Fail Fast
How do you know which technologies to invest in? Which areas do you invest in upskilling? Which investments do you kill? The answer to these questions is simple:
“We invest in technologies that work in production!”
Production is where value is extracted, not in meeting rooms, conference calls, or slide decks. A strategy deck won't prove a technology works for your business, and relying on one will ensure you fall behind the pace of innovation. The window of opportunity to take advantage of today's leading technology is shorter than the decision-making process of most planning committees. You must learn to iterate in production to extract value. This requires you to work backward from problems and adopt a fail-fast mentality using a platform designed for evolution under production stress.
Conclusions
Fear, uncertainty, and doubt will reign in the coming months. Questions about AI scaling may be answered. The rate of technological change will increase. Organizations will continue to feel pressure to adapt, especially as their peers successfully extract business value from technology investments. The window of opportunity to build a lead will start to close. Stagnation is not sustainable in such an environment, and those who fail to innovate risk falling irreversibly behind their more agile competitors.