The AI cost conversation just changed
In April, Uber CTO Praveen Neppalli Naga revealed that the ridesharing giant had exhausted its full-year AI budget within the first few months of 2026. On Uber’s subsequent earnings call, COO Andrew Macdonald called the disclosure a “head-exploding moment” and said the trade-off between token spend and shipped features had become “very hard to justify”.
Uber is not alone. Microsoft is cancelling internal Claude Code licences across its Experiences & Devices group and redirecting engineers to GitHub Copilot CLI, to contain bills to between $500 and $2,000 per engineer per month. At Google I/O 2026, Sundar Pichai pitched Gemini 3.5 Flash with a sharp claim: top enterprises could save over $1 billion a year by shifting 80% of workloads away from closed frontier models from rivals such as Anthropic and OpenAI.
This is not just happening at big companies. In one recent board meeting, a portfolio founder told us they were switching off a frontier coding assistant because the bill had started to outrun the proof of value.
The message is clear. The CFO has entered the AI conversation, and they’ve brought a calculator. This means tokenmaxxing, the idea of spending an unlimited number of tokens to maximise AI output, is meeting its first real economic test.
Jevons Paradox of the prompt: Why bills are rising even as per-token prices fall
Fig. 01
The underlying issue is that the task shape has fundamentally changed. Long-context agents carry more state, while tool use adds schemas and results to the token stream. OpenAI separately prices web search calls and container sessions. The latest reasoning models consume up to 100x more tokens per request than those released just months ago. Agentic coders fan out across the same codebase in parallel. Long context windows multiply prompt size on every turn.
As a result, corporate bills no longer scale with the number of single prompts a team sends. They scale with the number of complex tasks completed, and the per-task token count is rising far faster than the per-token price is falling.
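The arithmetic behind that claim can be sketched in a few lines. This is an illustration only: the token counts and prices below are hypothetical inputs chosen for the example, apart from the 100x token multiple, which is the upper bound cited above. The function name `cost_per_task` is likewise ours.

```python
def cost_per_task(tokens_per_task: int, price_per_million_tokens: float) -> float:
    """Dollar cost of one task: tokens consumed times the per-token price."""
    return tokens_per_task * price_per_million_tokens / 1_000_000


# Hypothetical "before": a single prompt of 5,000 tokens at $10 per million tokens.
before = cost_per_task(tokens_per_task=5_000, price_per_million_tokens=10.0)

# Hypothetical "after": an agentic task using 100x the tokens (the upper bound
# cited above) while the per-token price has fallen by 75%.
after = cost_per_task(tokens_per_task=500_000, price_per_million_tokens=2.5)

bill_multiple = after / before
print(f"before: ${before:.2f} per task")
print(f"after:  ${after:.2f} per task")
print(f"per-task bill multiple: {bill_multiple:.0f}x")
```

Under these assumed inputs, a 75% cut in the per-token price is swamped by the 100x rise in tokens per task, and the per-task bill rises roughly 25x.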
In addition, the physical supply side of intelligence is tightening. According to SemiAnalysis, H100 one-year rentals jumped roughly 40% to $2.35 per GPU-hour in March 2026, up from $1.70 in October 2025. Wholesale capacity prices in the grid serving Northern Virginia, the world's largest datacenter hub, cleared 9.3x higher year-on-year, adding roughly $16 billion to US consumer power bills. The 2026 Strait of Hormuz crisis briefly pushed Brent crude up before ceasefire talks reversed most of the move. From raw energy costs through to GPUs, tokens, and agents, every layer of the stack is repricing upward.
A fork in the AI road
Enterprise software has always survived on a simple metric: buyers pay for measurable value created. The historic SaaS heuristic holds that sales friction disappears when buyers perceive value of ~10x the contract's cost. One of the cleanest public market examples of this value-led pricing is Palantir. The company trades at a premium 70x EV/revenue today because it can convincingly promise buyers ~100x ROI.
AI now faces the same ROI test, and the early evidence is that many enterprises cannot yet draw the line from token spend to shipped value.
We still believe that GenAI will generate at least $1 trillion of enterprise revenue. The question now is how this will be achieved. We see two distinct paths to closing the cost-value gap. The paths each imply very different margin structures and very different winners. But, ultimately, we believe the companies that own the customer (the workflow, the data, the trust, the integration depth) will keep the gross margin.
In a world where models matter less than distribution, we believe that application companies that own the customer will migrate to open weights to defend gross margin. The gap between open and closed models has collapsed. Per Stanford’s AI Index Report, the lead of the top closed model over the top open model narrowed from 174 Arena points (15.2%) to just 7 points (0.5%) by August 2024. The lead has since widened to 49 points (3.5%) with recent closed frontier model releases, but open-weight models are nonetheless far more competitive today than they were a few years ago. In this period, five open model families (DeepSeek, Qwen, Kimi, GLM, Mistral) reached frontier quality essentially simultaneously.
Fig. 02
This is no longer theoretical. Last month, Cursor released Composer 2.5, its coding model for agentic software engineering. Cursor’s co-founder, Aman Sanger, confirmed on X that the company used an open-weights model, Moonshot’s Kimi K2.5, as the foundation. It ranks third on Artificial Analysis's Coding Agent Index at 62, four points behind Claude Opus 4.7 (66) and three behind GPT-5.5 (65), at roughly one-sixtieth the cost per task: $0.07 versus $4.10 and $4.82 respectively. Harvey recently announced a Legal Agent Benchmark, with a landmark study showing that its fine-tuned Kimi K2.6 performed 7% better than Claude Opus 4.7 at one-tenth the cost.
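As a quick check on the "one-sixtieth" figure, the sketch below recomputes the cost multiples and score gaps from the Coding Agent Index numbers quoted above. The dictionary layout and the helper name `compare_to` are ours, introduced only for illustration.

```python
# Figures as quoted above from Artificial Analysis's Coding Agent Index.
index_score = {"Composer 2.5": 62, "Claude Opus 4.7": 66, "GPT-5.5": 65}
cost_per_task_usd = {"Composer 2.5": 0.07, "Claude Opus 4.7": 4.10, "GPT-5.5": 4.82}


def compare_to(baseline: str) -> dict[str, tuple[int, float]]:
    """For each other model, return (score lead over baseline, cost multiple of baseline)."""
    result = {}
    for name in index_score:
        if name == baseline:
            continue
        score_lead = index_score[name] - index_score[baseline]
        cost_multiple = cost_per_task_usd[name] / cost_per_task_usd[baseline]
        result[name] = (score_lead, cost_multiple)
    return result


comparison = compare_to("Composer 2.5")
for name, (score_lead, cost_multiple) in comparison.items():
    print(f"{name}: +{score_lead} index points at {cost_multiple:.1f}x the cost per task")
```

On these figures the two closed models cost roughly 59x and 69x as much per task, which brackets the "one-sixtieth" shorthand.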
A new pattern is clearly emerging across categories: ship on frontier closed models to reach product-market fit and generate evals, then move the workload to open weights served by neoclouds and inference specialists to protect margin.
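One way to picture the second half of that pattern is as a simple selection rule: once a company has evals for a workload, it picks the cheapest model that clears its quality bar. The sketch below is a hypothetical illustration, not a description of any company's routing logic. The `Candidate` type, the `pick_model` function, and the thresholds are our assumptions, and the sample scores and costs reuse the Coding Agent Index figures quoted earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    eval_score: float      # score on the company's own eval suite (assumed available)
    cost_per_task: float   # dollars per completed task


def pick_model(candidates: list[Candidate], min_score: float) -> Candidate:
    """Return the cheapest candidate whose eval score clears the quality bar."""
    eligible = [c for c in candidates if c.eval_score >= min_score]
    if not eligible:
        raise ValueError("no candidate clears the quality bar")
    return min(eligible, key=lambda c: c.cost_per_task)


# Sample data: index scores and per-task costs quoted earlier in this piece.
candidates = [
    Candidate("Claude Opus 4.7", eval_score=66, cost_per_task=4.10),
    Candidate("GPT-5.5", eval_score=65, cost_per_task=4.82),
    Candidate("Composer 2.5 (open-weights base)", eval_score=62, cost_per_task=0.07),
]

# A workload where "good enough" is 60 routes to the open-weights option...
routine = pick_model(candidates, min_score=60)
# ...while a workload that needs the top score stays on the closed frontier.
demanding = pick_model(candidates, min_score=66)
print(routine.name, "|", demanding.name)
```

The design choice worth noting is that the quality bar is set per workload, so the same rule can send routine work to open weights and keep the most demanding work on a closed model.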
If this pattern holds at scale, the gross margin pool of GenAI shifts. Today, much of an application company’s revenue flows to the model provider. In the open path, that economic surface thins for closed labs and thickens for inference providers, neoclouds, fine-tuning teams, evaluation infrastructure, and the application companies themselves. We expect the next two years to look a lot like cloud’s ‘S3 moment’, where usage explodes, unit prices fall, and the surface area of opportunity expands rather than contracts.
The counter-thesis is that frontier labs widen the value gap fast enough to keep enterprises locked in. This is not impossible. For example, Claude Opus 4.5 and GPT-5.5 currently dominate leaderboards for coding agents. Where marginal capability improvement matters more than marginal cost saving, closed models still win. We expect this to remain true in high-stakes, high-value, complex domains such as deep code generation, frontier science, and the longest-running autonomous agents.
For closed models to dominate beyond those domains, frontier labs will need value to keep accelerating exponentially while costs grow only linearly. So far, each new generation of frontier models reasons for longer and runs more agentic loops, both of which scale compute much faster than linearly. Meanwhile, the underlying GPU and power costs are rising rather than falling. For this closed path to succeed in generating long-term winners, the frontier labs have to outrun all three cost pressures at once: longer reasoning, more agentic loops, and rising GPU and power prices.
As such, we think that closed-path winners will be those who can prove unique, defensible unit economics on the highest-value workloads, where being best matters more than being cheap.
Fig. 03
The reality
The likely outcome is heterogeneous. There will be a broad open middle (most coding, most knowledge-worker copilots, most internal tooling), where price compression pushes application companies to migrate to open models. There will be a closed frontier (the most demanding autonomous workloads, frontier science, regulated decisioning) where customers pay frontier margins because they have to. The line between the two will probably move every quarter as the model market and inference market keep shipping.
What will not change is who wins. In an era when models are increasingly substitutable and unit costs are increasingly contested, the application companies that own the customer keep the gross margin. Distribution beat product in SaaS, and we believe it will beat raw model performance in AI. Jevons was right about coal: as its use became more efficient, total consumption rose. We believe the same is happening with tokens, and that the application companies that own the customer will be the ones who capture the surplus.
This evolving market is already creating a new map of potential winners. Inference providers and neoclouds like Runware*, Fireworks, Together.ai, and Doubleword* will win when companies want to run open-weight models cheaply and reliably. Evaluation, routing and observability companies like OpenRouter, Gimlet Labs and Callosum will win when enterprises need to decide which model (and which GPU) should handle which task, at what price, and with what confidence. Platforms like Dataiku*, Collibra*, and Quantexa* already give larger enterprises a unified way to build and operationalise AI across all of their operations. And, at the application layer, companies like Cursor, Sierra, Deepdots* and Orbio, which are turning that infrastructure into customer value at the edge of the workflow, will succeed. The model will always matter. It’s just that the control plane around the model now matters enormously, too.
We are excited to back the companies that pick the right architecture for their customers and their P&L, and earn the right to those customers going forward. If you are building anywhere along the AI inference stack, we’d love to hear from you. Reach out to shamillah@dawncapital.com.
*Dawn Portfolio Companies