Your AI Costs Blew Up Because You Deployed an Agent. A Prompt Would Have Done It.
Uber exhausted its 2026 AI budget in four months. Sam Altman says enterprise cost complaints are now 'a meme.' The root cause isn't reckless spend — it's deploying always-on agents for tasks a single prompt could handle. Here's the three-tier framework that fixes it.
By Forge Team
The question your finance team is asking — "why did we blow through our AI budget?" — is the wrong question. The right question is one that should have been asked before any of those tools were deployed: does this task actually need an agent, or would a single prompt do it?
Uber had to ask it after the fact. Most teams will too, unless they build the habit of asking first.
Three signals, one week
Three things converged in the second week of June. On June 9, Simon Willison, who documents his AI tool use in detail and publishes the numbers, spent $110 in a single day testing Claude Fable 5. Earlier that month, his analysis had surfaced that Uber capped AI spending at $1,500 per employee per tool after exhausting its entire 2026 budget in roughly four months. And on June 10, Sam Altman told an enterprise audience that AI cost complaints had become "a meme": companies were contacting OpenAI to report that they had spent their full 2026 AI budget by Q1.
None of this is a story about companies spending carelessly. Willison was doing real work. Uber was running real workflows. The problem, as Willison's analysis identified, is a deployment decision: always-on AI agents scaled across teams before anyone modelled what continuous compute at volume actually costs.
The three-tier framework you need before you deploy
AI capability comes in three tiers, and they cost very different amounts:
Single prompt. One interaction, one output. No memory between sessions, no independent action. You send a task, you get a result. Cost: cents per task, fully predictable.
Bounded workflow. Multi-step, but constrained. The inputs are defined upfront, the steps are predetermined, and there are human checkpoints. Cost: scales with volume but stays predictable because the scope is fixed.
Always-on agent. Continuous compute, independent action, ongoing monitoring or execution without you triggering each step. Cost: the highest tier, and it scales in ways that catch teams off guard when volume multiplies.
The deployment mistake that blew Uber's budget is consistent with what Willison's cost data shows: teams reach for agents because agents are capable, without asking whether the task actually requires what agents provide. Agents earn their cost when the work is genuinely continuous, genuinely requires independent action, or genuinely can't be batched. Everything else should be a workflow or a prompt.
The question to ask before any deployment: what's the simplest tier that would still do this task well?
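One way to make that question concrete is a small decision function. The sketch below is illustrative only: the `Task` fields and the `simplest_tier` name are assumptions invented for this example, and the rules simply encode the criteria above (agents earn their cost when work is continuous, needs independent action, or can't be batched).

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SINGLE_PROMPT = 1
    BOUNDED_WORKFLOW = 2
    ALWAYS_ON_AGENT = 3


@dataclass
class Task:
    # All field names are hypothetical, chosen for this sketch.
    name: str
    needs_continuous_operation: bool  # the work is genuinely continuous
    needs_independent_action: bool    # must act without a human triggering it
    can_be_batched: bool              # can run on a schedule or in batches
    multi_step: bool                  # several predetermined steps


def simplest_tier(task: Task) -> Tier:
    """Return the cheapest tier that still satisfies the task's stated needs."""
    if (task.needs_continuous_operation
            or task.needs_independent_action
            or not task.can_be_batched):
        return Tier.ALWAYS_ON_AGENT
    if task.multi_step:
        return Tier.BOUNDED_WORKFLOW
    return Tier.SINGLE_PROMPT


daily_summary = Task(
    name="daily competitor summary",
    needs_continuous_operation=False,
    needs_independent_action=False,
    can_be_batched=True,
    multi_step=False,
)
print(simplest_tier(daily_summary).name)
```

The function checks for agent requirements first and falls through to the cheaper tiers, so a task only lands on the most expensive tier when one of the three conditions is explicitly true.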
Kate: when "we deployed an agent" was the mistake
Kate runs marketing at a 60-person B2B software company. Her team wanted competitive intelligence: tracking competitor blog posts, product updates, and social announcements. An always-on monitoring agent looked like the obvious solution. It checked sources continuously and surfaced daily summaries automatically.
Twelve weeks in, her AI tools bill was triple what she'd budgeted. The agent was doing exactly what she'd asked.
But competitor blog posts don't require immediate response. The team was reading the summaries once in the morning, then making decisions during their weekly content meeting. The work didn't need continuous monitoring. A single daily prompt — same sources, same summary format, triggered each morning on a schedule — would have produced equivalent value at roughly 1/40th the cost.
The question she should have asked before deploying the agent: does this task require the AI to act in real time, or does it just need to surface information once a day? "Once a day" is a scheduled prompt, not an agent.
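The arithmetic behind a gap like that is easy to sketch. Every number below is hypothetical: the 36-minute polling interval and the 5-cent cost per run are picked only so the ratio lands on the 40x figure, and the sketch assumes, as a simplification, that each agent check costs the same as one scheduled prompt run. It is not Kate's actual configuration.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> int:
    """How many times a job fires per day at a fixed polling interval."""
    return MINUTES_PER_DAY // interval_minutes


def monthly_cost_cents(runs: int, cost_per_run_cents: int, days: int = 30) -> int:
    """Monthly cost in cents for a job that fires `runs` times per day."""
    return runs * cost_per_run_cents * days


COST_PER_RUN_CENTS = 5   # hypothetical price of one summary run
AGENT_INTERVAL_MIN = 36  # hypothetical polling interval for the agent

agent_monthly = monthly_cost_cents(runs_per_day(AGENT_INTERVAL_MIN), COST_PER_RUN_CENTS)
prompt_monthly = monthly_cost_cents(1, COST_PER_RUN_CENTS)  # once each morning

print(f"agent:  ${agent_monthly / 100:.2f}/month")
print(f"prompt: ${prompt_monthly / 100:.2f}/month")
print(f"ratio:  {agent_monthly / prompt_monthly:.0f}x")
```

Under these assumptions the gap comes entirely from how often the job runs, which is why asking how often the output is actually read is the useful first question.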
For three tasks in your current AI stack, decide whether each one needs a single prompt, a bounded workflow, or an always-on agent — and what would actually break if you chose one tier simpler.
Marcus: when the right tier was a cheaper model
Marcus is an operations lead at a 200-person professional services firm. His team built an AI tool to answer internal HR questions — employee queries about policy, benefits, and process that used to go to the HR team directly. He deployed it on their most capable, most expensive model, reasoning that accuracy mattered.
It does. But internal FAQ lookup doesn't require deep reasoning. It requires accurate retrieval from a documented source. The questions — "what's the holiday carryover policy?" "when does the benefits window open?" — had unambiguous answers in a short set of HR documents. A smaller, faster model with access to those documents would produce the same accuracy at a fraction of the cost.
He switched models. Output quality: unchanged. Monthly compute cost: dropped by about 65%.
The strengths of the most capable model (handling genuine ambiguity, synthesizing competing considerations, working through novel problems) weren't what this task needed. Matching the model's capability to what the task actually requires is a separate judgment from matching the deployment tier.
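A minimal sketch of that matching step, with loud caveats: the model names `small-fast` and `large-reasoning` and the per-query prices are hypothetical, chosen only so the saving mirrors the roughly 65% figure above. They do not describe any real provider's models or pricing.

```python
from enum import Enum


class Need(Enum):
    RETRIEVAL = "retrieval"  # fast, accurate lookup from documented sources
    REASONING = "reasoning"  # ambiguity, trade-offs, novel problems


# Hypothetical catalogue: (model name, cost per query in cents).
MODELS = {
    Need.RETRIEVAL: ("small-fast", 7),
    Need.REASONING: ("large-reasoning", 20),
}


def pick_model(need: Need) -> str:
    """Return the name of the model matched to what the task requires."""
    return MODELS[need][0]


def savings_percent(old_cents: int, new_cents: int) -> float:
    """Percentage saved per query when moving from old to new pricing."""
    return 100 * (old_cents - new_cents) / old_cents


hr_faq_model = pick_model(Need.RETRIEVAL)
saving = savings_percent(MODELS[Need.REASONING][1], MODELS[Need.RETRIEVAL][1])
print(f"HR FAQ -> {hr_faq_model}, saving {saving:.0f}% per query")
```

Classifying the need first and looking up the model second keeps the two judgments separate: deployment tier is one decision, model capability is another.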
For five tasks you currently send to AI, decide whether each one needs fast-and-accurate retrieval or slow-and-nuanced reasoning — and whether the model you're using actually matches what the task requires.
The budget problem that's solvable
Altman calling enterprise AI cost complaints "a meme" is a useful signal: this is widespread, not exceptional. But it's solvable, and the solution isn't to spend less on AI — it's to spend at the right tier for each task.
Prompt-level work costs prompt-level money. Agent-level work costs agent-level money. The gap between them is large enough that deploying the wrong tier, at scale, across a team, for a year, produces exactly the budget surprise Uber ran into.
Before your next AI deployment, ask one question: what's the simplest approach that would still do this task well? If the answer is a prompt, deploy a prompt.
Audit three AI deployments in your current stack — what tier is each one, what does it cost, and what's the simplest alternative that would produce equivalent output?