Will CoPilots Be the Killer AI App?

Dorian Smiley
9 min read · Nov 21, 2023


Unpacking CoPilots and Agents

Understanding Microsoft's CoPilot Approach

Microsoft has been focusing on enabling AI through its CoPilot technology, which is designed to enhance productivity within its existing products. This allows Microsoft to leverage its current user base and distribution channels to capitalize on enterprise adoption of AI. However, questions remain about whether this is the best use of AI and whether it will provide any differentiated capabilities for Microsoft's customers.

CoPilot Integration and Capabilities

CoPilot is integrated into Microsoft 365, working alongside traditional applications like Word, Excel, PowerPoint, Outlook, and Teams. This integration provides a seamless user experience for those looking to use generative AI for increased productivity when working with these tools.

  1. Word: CoPilot in Word assists in writing, editing, summarizing, and creating content. It can generate drafts, add content, and even suggest tones for different contexts.
  2. Excel: In Excel, CoPilot aids in data analysis and exploration, responding to natural language queries to reveal trends and correlations.
  3. PowerPoint: CoPilot transforms ideas into presentations, generating slides and styling or reformatting layouts.
  4. Outlook: It streamlines email management by summarizing conversations and assisting in drafting responses.
  5. Teams: CoPilot enhances meeting effectiveness, organizes discussion points, and summarizes key actions.
  6. Business Chat: This new feature utilizes Microsoft Graph to integrate data from sources like emails, calendars, and documents to streamline workflow and decision-making.
  7. Viva Engage Integration: In Viva Engage, CoPilot offers conversation starters and suggests responses to keep workplace conversations engaging and productive.

Enterprise Application and Development

CoPilot has been extended for use in various organizations, including prominent enterprises like KPMG. These customizations allow organizations to derive insights specific to their operational needs.

  1. Plugin and Connector Development: Microsoft has enabled plugin development for third-party apps, enhancing capabilities like advanced search and intelligent recommendations. Notable examples include Jira Cloud and Mural.
  2. Impact on Productivity: Data from GitHub on the use of CoPilot among developers shows significant productivity increases, with 88% of users reporting enhanced productivity.

Pricing

The pricing and rate limiting of AI models, particularly in the context of Microsoft CoPilot and similar technologies, involve several factors. Here's an analysis based on different sources:

  1. CoPilot: Microsoft 365 CoPilot is priced at $30 per user per month for Microsoft 365 E3, E5, Business Standard, and Business Premium customers.
  2. GPT-4: The pricing for GPT-4 ranges from $0.03 to $0.12 per 1,000 tokens, depending on the model variant (8K or 32K context window) and whether the tokens are input or output.
  3. Azure OpenAI Quotas: There are specific quotas for OpenAI resources per Azure subscription, including limits on concurrent requests for DALL-E models and maximum prompt tokens per request.
  4. Regional Quota Limits: These quotas vary by model and region. For example, the token limit for GPT-3.5 Turbo ranges from 120K to 300K tokens per minute, depending on the region. GPT-4 ranges from 20K to 80K tokens per minute.
  5. Rate Limit Management: Azure recommends implementing retry logic, gradual workload increases, and testing different load patterns to manage rate limits effectively (a minimal sketch of the retry pattern follows this list).
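Azure's guidance translates to a small amount of client-side code. Below is a minimal sketch of retry logic with exponential backoff, assuming the pre-1.0 `openai` Python package and placeholder Azure resource and deployment names:

```python
import time
import openai

# Placeholder Azure OpenAI configuration -- substitute your own resource values.
openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = "YOUR-KEY"

def chat_with_backoff(messages, max_retries=5):
    """Retry rate-limited calls with exponential backoff, per Azure's guidance."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(
                engine="YOUR-DEPLOYMENT",  # Azure deployment name, not a model name
                messages=messages,
            )
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)  # back off before retrying
            delay *= 2         # double the wait after each 429 response
```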

Analysis and Implications

At $30 per user per month on top of existing Microsoft 365 subscriptions, CoPilot represents a significant investment, especially for large enterprises, and the add-on cost could affect budgeting decisions.

The rate limits, especially for sophisticated models like GPT-4, could pose challenges in scenarios requiring high-volume or rapid AI interactions. Enterprises, particularly large ones during peak usage times, might need to be strategic about where they deploy AI, focusing on the critical areas where it brings the most value. This might necessitate planning and additional investment in higher quotas or more efficient usage strategies.

Future Outlook

Costs are starting to come down, and fast. At OpenAI's Dev Day in November 2023, significant cost reductions were announced. Key highlights include:

  1. Cheaper Tokens: GPT-4 Turbo, the enhanced version of GPT-4 announced at Dev Day, is both more capable and more cost-efficient. It boasts a 128K context window, enough to process over 300 pages of text in a single prompt. Notably, GPT-4 Turbo input tokens are 3x cheaper than the previous GPT-4 model at $0.01 per 1K tokens, and output tokens are 2x cheaper at $0.03 per 1K. GPT-3.5 Turbo input tokens are 3x cheaper than the previous 16K model at $0.001 per 1K, and output tokens are 2x cheaper at $0.002 per 1K (a back-of-the-envelope cost comparison follows this list).
  2. Higher Rate Limits: OpenAI has doubled the tokens-per-minute limit for paying GPT-4 customers. This change, along with the ability to request usage-limit increases, makes it easier for developers to scale their applications.
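To make these numbers concrete, here is a back-of-the-envelope comparison for a hypothetical workload of 10 million input and 3 million output tokens per month:

```python
# Hypothetical monthly workload: 10M input tokens, 3M output tokens.
input_tokens, output_tokens = 10_000_000, 3_000_000

# Prices in USD per 1K tokens: GPT-4 (8K) vs. GPT-4 Turbo as announced at Dev Day.
gpt4_cost = (0.03 * input_tokens + 0.06 * output_tokens) / 1000
turbo_cost = (0.01 * input_tokens + 0.03 * output_tokens) / 1000

print(f"GPT-4:       ${gpt4_cost:,.0f}/month")   # $480/month
print(f"GPT-4 Turbo: ${turbo_cost:,.0f}/month")  # $190/month
```

The same workload costs roughly 60% less on GPT-4 Turbo, which is why these announcements matter for enterprise budgeting.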

Companies can also consider developing AI models and plugins tailored to specific needs and constraints. However, this can have significant upfront costs, and companies will need access to the appropriate talent capable of building such models. A notable example is BloombergGPT:

This 50 billion parameter language model is specifically designed for the financial industry. BloombergGPT's development utilized a mixed approach, combining general-purpose capabilities with domain-specific proficiency. By leveraging Bloomberg's extensive archives of financial data and public datasets, the model was trained on a corpus of over 700 billion tokens. This approach resulted in a model that excels in financial tasks while maintaining competitive performance in general NLP benchmarks.

Companies can also take advantage of open-source models via Hugging Face, which offers a variety of inference solutions for serving predictions from over 500,000 models hosted on its platform: a free, rate-limited Inference API, dedicated infrastructure deployments with Inference Endpoints, and even in-browser edge inference with Transformers.js. Its partnerships with AWS and Cloudflare allow companies to achieve performance and scalability while optimizing costs.
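As a minimal sketch, the free, rate-limited Inference API is a single HTTP request; the model ID below is just one example of the hosted models, and the token is a placeholder:

```python
import requests

# Any hosted model ID works here; flan-t5-large is just an example.
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-large"
headers = {"Authorization": "Bearer YOUR-HF-TOKEN"}

# The free tier is rate limited; production workloads would typically move
# to dedicated Inference Endpoints on AWS or Cloudflare.
response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Summarize: Hugging Face hosts over 500,000 models."},
)
print(response.json())
```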

Commercial models can also be combined with open-source and private foundation models. One common approach is to use GPT-4 as a controller that can orchestrate the use of additional models to achieve a set of tasks. Companies can effectively implement this strategy by leveraging the available CoPilot plugin architecture.
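One way to sketch the controller pattern is with GPT-4's function calling, letting it decide when to delegate to a cheaper or specialized model. Here, `run_local_summarizer` is a hypothetical stand-in for an open-source or private model, and the snippet assumes the pre-1.0 `openai` package:

```python
import json
import openai

def run_local_summarizer(text: str) -> str:
    """Hypothetical call into an open-source or private specialist model."""
    raise NotImplementedError

# Advertise the specialist model to GPT-4 as a callable function.
functions = [{
    "name": "summarize_document",
    "description": "Summarize a long document with a specialized local model.",
    "parameters": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}]

# GPT-4 acts as the controller: it either answers directly or routes
# the task to the specialist by emitting a function call.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this report: ..."}],
    functions=functions,
)

message = response.choices[0].message
if message.get("function_call"):
    args = json.loads(message.function_call.arguments)
    print(run_local_summarizer(args["text"]))
```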

One last emerging trend will push LLMs to the edge by running them on the user's device. In collaboration with Meta, Qualcomm Technologies has integrated Llama 2 directly onto devices, aiming to reduce dependence on cloud services. This enables LLM-powered AI applications on a range of edge devices, such as virtual assistants, productivity tools, and entertainment apps. Qualcomm's Snapdragon platform supports this with high-performance on-device processing, enabling efficient AI operations even in areas without internet connectivity.

Intel has focused on optimizing LLMs to run efficiently on its CPUs. LLMs typically require significant computing power, usually found in high-end GPUs, making their use costly for many organizations. Intel has employed a technique known as quantization, a model-compression method that reduces the range of unique values model parameters can take, shrinking the model and speeding up inference. This approach aims to maintain model accuracy while reducing computational demands.
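Quantization is easy to demonstrate in a few lines. The sketch below uses PyTorch's built-in dynamic quantization rather than Intel's specific toolchain, but the idea is the same: shrink weights from 32-bit floats to 8-bit integers for faster, cheaper CPU inference.

```python
import torch
import torch.nn as nn

# A stand-in network; in a real LLM, the Linear layers dominate model size.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# Convert Linear weights from fp32 to int8; activations are quantized on the
# fly at inference time, trading a little accuracy for speed and memory.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4096)
print(quantized(x).shape)  # same interface, smaller and faster on CPU
```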

Shifting Paradigms: From Human-Centric to AI-Centric Software

A significant shift in AI integration in software is occurring from human-centric to AI-centric applications. This section explores how frameworks and platforms like Palantir's AIP, AutoGen, LangChain, Semantic Kernel, and Prompt Flow facilitate this transition.

AutoGen

AutoGen is a framework for orchestrating, optimizing, and automating workflows using large language models (LLMs). It significantly simplifies the development process of complex LLM-based applications.

AutoGen enables creating systems with multiple agents, each with specialized roles and capabilities. This design reduces manual interactions and coding efforts, enhancing efficiency in tasks such as supply-chain optimization. These agents, leveraging LLMs, human inputs, and tools, can handle various tasks, from automated task-solving to code execution. This is particularly useful in use cases involving adaptive problem-solving.
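A minimal two-agent sketch, assuming the pyautogen API as of late 2023 and a placeholder model configuration:

```python
import autogen

# Placeholder model configuration.
config_list = [{"model": "gpt-4", "api_key": "YOUR-KEY"}]

# The assistant plans and writes code; the user proxy executes that code
# locally and feeds the results back, without a human in the loop.
assistant = autogen.AssistantAgent(
    "assistant", llm_config={"config_list": config_list}
)
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",  # set to "ALWAYS" to gate each step on a human
    code_execution_config={"work_dir": "tasks"},
)

# The two agents converse back and forth until the task is complete.
user_proxy.initiate_chat(
    assistant,
    message="Plot NVDA and TSLA year-to-date stock price change.",
)
```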

LangChain

LangChain is a comprehensive framework for developing applications powered by language models. It focuses on creating context-aware and reasoning applications, providing a suite of tools for interface and integration. LangChain simplifies the entire application lifecycle, from development using templates and libraries to production and deployment.

Its main value propositions include composable components for language model integration and built-in chains for high-level tasks, offering ease of use and customization. LangChain offers interfaces for model I/O, data retrieval, and agent directives, streamlining the integration of language models into applications.
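A minimal sketch of a composable chain using the LangChain Expression Language, assuming a late-2023 release of the `langchain` package:

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    "Summarize the following support ticket in one sentence:\n\n{ticket}"
)
model = ChatOpenAI(model="gpt-3.5-turbo")

# Components compose with the | operator into a single runnable chain.
chain = prompt | model | StrOutputParser()

print(chain.invoke({"ticket": "My CoPilot license was charged twice this month."}))
```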

Semantic Kernel SDK

The Semantic Kernel SDK, available in C#, Python, and Java, is a versatile tool for integrating LLMs from providers such as OpenAI, Azure OpenAI, and Hugging Face with conventional programming languages. It stands out for its ability to orchestrate AI-driven plugins automatically: Semantic Kernel planners enable LLMs to generate plans tailored to specific user goals and then execute them. A key feature of Semantic Kernel is its treatment of AI service calls as "semantic functions," which are on par with native code functions. It also makes it easy to add and swap AI services, allowing users to choose the most suitable model for their needs. Semantic Kernel is at the heart of Microsoft's plugin architecture and is well suited for building the low-level building blocks of AI-centric software.
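A minimal Python sketch of a semantic function, assuming the pre-1.0 `semantic-kernel` package available in late 2023 (the API has since evolved):

```python
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

kernel = sk.Kernel()
kernel.add_chat_service("chat", OpenAIChatCompletion("gpt-3.5-turbo", "YOUR-KEY"))

# A prompt template registered as a "semantic function," invocable
# just like a native code function.
summarize = kernel.create_semantic_function(
    "Summarize the following text in one sentence:\n\n{{$input}}"
)

print(summarize("Semantic Kernel treats AI calls and native functions uniformly."))
```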

Prompt Flow

Prompt Flow is a suite of tools designed to streamline the development cycle of LLM-based AI applications, from ideation to deployment. It facilitates prompt engineering and ensures production-quality LLM applications. The framework enables the creation of executable workflows linking LLMs, prompts, and code. It also provides tools for debugging, evaluating, and deploying these workflows, ensuring their integration into CI/CD systems.
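Flows are defined as a DAG of LLM, prompt, and Python nodes. As a minimal sketch, a Python node is just a function marked with the `promptflow` package's `@tool` decorator; `clean_ticket` is a hypothetical example:

```python
from promptflow import tool

@tool
def clean_ticket(ticket: str) -> str:
    """A flow node that normalizes raw input before an LLM prompt node sees it."""
    return " ".join(ticket.split())
```

Nodes like this are wired together with prompts and models in the flow's YAML definition, and the resulting flow can then be tested, evaluated, and deployed through the Prompt Flow tooling.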

AIP Logic/Automate

Palantir's Artificial Intelligence Platform (AIP) includes two products that help organizations build and deploy automated AI agents: AIP Logic and AIP Automate. Logic lets you assemble prompts, tools (including custom tools), and workflows into a custom agent that can then be published and orchestrated through Automate; it also includes tools to test and deploy your agents. Automate can orchestrate agent executions in response to events such as data mutations or notifications. AIP is differentiated by its ability to let nontechnical users build and deploy AI agents using Logic and Automate. Palantir's AIP Bootcamps demonstrate how easy it is for business users to take advantage of these tools, famously going from zero to production in just a few days.

AI-Centric Advantages

By reducing reliance on humans and manual processes, AI can take a more central role in analysis and task execution. Shifting towards AI-centric software with frameworks like AutoGen, LangChain, Semantic Kernel, and Prompt Flow, or platforms like AIP, can help minimize costs and effectively manage issues like rate limits in several ways:

  1. Efficiency in Resource Utilization: These frameworks reduce the need for extensive human intervention by automating and streamlining workflows. The rate of API calls can be reduced by minimizing unnecessary interactions and focusing on essential tasks. This is particularly important for services with rate limits, as it ensures that available resources are used more judiciously.
  2. Compounding Returns: These frameworks' modular and reusable components accelerate development. This reduces the time and effort required for developing and maintaining AI applications, translating into cost savings. Returns will compound over time as more and more workflows can be assembled from existing resources.
  3. Augmenting Human Intelligence: By using humans as a gating function, we can amplify the value of human actors, enhancing their ability to make faster decisions with less cognitive load. This effectively allows AI to augment human intelligence while maximizing the value of human and AI actors. This human gating function may be removed at some point in the near future, greatly accelerating the pace of AI decision-making.

Conclusions

Companies that put AI-centric approaches at the heart of their strategy may have an opportunity to leapfrog competitors who largely look to leverage CoPilots. CoPilots, while big productivity boosters, are not a step change in operations: the same software and workflows will remain at the center of the enterprise. This might disproportionately benefit the software companies that own those pixels, not the enterprises that lease them.

Companies that use today's software and underlying service mesh as tools that AI can effectively leverage will be able to reinvent their operations and drive costs down substantially. If economic growth remains slow or starts to contract, the productivity gains from CoPilots may even be detrimental to the bottom line, as the added subscription costs combine with weak macroeconomic conditions.

AI-centric approaches will need technology partners and platforms to enable this transition. To effectively leverage AI, companies must unify their data, infuse it with the required semantics, model their business as code, and enable technical and nontechnical users in a unified ecosystem. They will also need to address challenges related to audit trails in AI decision-making, ensuring data and code are correct and safe to execute, and complying with newly enacted regulations. Palantir remains at the top of this very short list, driven primarily by its CTO, Shyam Sankar, who has a compelling vision of how AI and humans will work together. Palantir also benefits from its work in defense and intelligence, where AI has been used for some time. Microsoft is also highly compelling, with services like Azure AI Studio, Semantic Kernel, and Fabric (which recently became generally available). One thing is clear: companies better keep their eye on the ball. This rapidly changing landscape requires a lot of diligence to navigate.

