The Hidden Cost of AI Model Upgrades
When Anthropic updated Claude to Opus 4.7, real-world token usage jumped 35-40% with no pricing announcement. Here's what that means for any team running AI workflows, at any scale.
By Forge Team
When your AI tools upgrade, your costs can go up, even if the pricing page says nothing changed. On April 17, the AI newsletters TLDR AI and The Neuron flagged that Claude Opus 4.7 was using roughly 35% more tokens than the previous version for identical inputs. On April 20, developer Simon Willison tested this directly and put the figure at approximately 40%. No pricing announcement accompanied either finding: just more tokens consumed per task than before, with nothing on the plans page to indicate a change.
If you use AI through anything connected to an API — a no-code integration in Zapier or Make.com, a workflow a colleague built, or a SaaS product that passes usage costs through to you — that 40% lands directly in your cost column.
What changed and why it matters
Token count is what AI providers actually charge for, whether that pricing is visible to you or not. Tokens are the units models use to read and write text — roughly three to four characters of English per token, though the exact mapping depends on the model version. When that mapping changes — when the underlying tokeniser is updated alongside a new model release — the same sentence can require meaningfully more tokens to process. The model does not necessarily generate longer outputs. The inputs and outputs simply cost more to handle.
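You can see this effect directly, because most providers expose a token-counting endpoint that lets you measure the same text under two model versions. A minimal sketch using Anthropic's Python SDK; the model names below are placeholders, not real identifiers, so substitute the versions your workflows actually call:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

text = (
    "Customer interview summary: churn among mid-market accounts "
    "rose after the March pricing change, driven by seat minimums."
)

# Count what the same text costs in input tokens under two model versions.
# Placeholder model names; substitute the versions you actually run.
for model in ["claude-previous-version", "claude-upgraded-version"]:
    count = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": text}],
    )
    print(f"{model}: {count.input_tokens} input tokens")
```

If the upgraded version reports meaningfully more input tokens for identical text, every request you send just got more expensive at unchanged per-token rates.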
Neither finding came with a corresponding product announcement. Anthropic released Opus 4.7 as a capability upgrade — 3x coding performance, higher image resolution, better instruction following. The tokeniser change was not highlighted. That is not unusual — tokeniser changes rarely are — but it creates a real information gap for anyone managing AI usage at a team level. You can read every product announcement and still miss a cost increase that appeared quietly in how the model counts words.
What to do differently from Monday
Treat a model upgrade like a price hike until you check otherwise. The practical check takes fifteen minutes:
- Find your two or three highest-volume AI workflows
- Note roughly how many tasks they run per week and how long the outputs tend to be
- Pull usage data from the tool or API dashboard — most show token consumption per request
- If a model was recently upgraded, compare the week before and the week after
You are not looking for a precise figure. You are looking for whether the number moved. If it did, you have a decision to make about whether the tasks still justify the cost at the new rate.
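If your dashboard lets you export usage as a CSV, the week-before/week-after comparison is a few lines of scripting. A sketch assuming a hypothetical export with `date`, `workflow`, and `total_tokens` columns covering both weeks; adjust the names to whatever your tool actually produces:

```python
import csv
from collections import defaultdict
from datetime import date

UPGRADE_DAY = date(2025, 4, 17)  # example date; use the day your tool switched models

# totals[workflow] = [tokens in the week before, tokens in the week after]
totals = defaultdict(lambda: [0, 0])

with open("usage_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        day = date.fromisoformat(row["date"])
        period = 0 if day < UPGRADE_DAY else 1
        totals[row["workflow"]][period] += int(row["total_tokens"])

for workflow, (before, after) in sorted(totals.items()):
    if before:  # skip workflows with no pre-upgrade baseline
        change = (after - before) / before * 100
        print(f"{workflow}: {before:,} -> {after:,} tokens ({change:+.0f}%)")
```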
A real version of this problem
A head of content at a 90-person B2B SaaS uses a Make.com automation that takes new customer interview transcripts, extracts three recurring themes, and writes a short context note for each to a Notion database. It runs about 50 times a month. She built it when Claude Sonnet 4.5 was the default model her Make.com integration used, and approved a rough monthly budget based on what she saw in the first few weeks.
When Make.com updated the underlying model to Opus 4.7, the output quality improved noticeably: the theme extractions got more precise. What she did not notice immediately was that each automation run now consumed roughly 40% more tokens, pushing her monthly API costs past the approved threshold without a new conversation with finance ever taking place.
The amount was small in absolute terms; the pattern was the problem. AI costs that drift upward without a conscious decision compound across every workflow a team runs.
When AI costs shift, check which workflows still make sense at the new rate.
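To see how a 40% token increase lands in a budget, run the arithmetic with assumed numbers. The per-run token counts and per-token rates below are illustrative placeholders, not her actual figures or any provider's published pricing:

```python
# Illustrative figures only: 50 runs a month (from the scenario above),
# assumed per-run token counts, and placeholder per-token rates.
RUNS_PER_MONTH = 50
INPUT_TOKENS, OUTPUT_TOKENS = 12_000, 800     # assumed per-run usage, pre-upgrade
INPUT_RATE, OUTPUT_RATE = 15 / 1e6, 75 / 1e6  # assumed dollars per token
TOKEN_INCREASE = 1.40                         # the observed ~40% jump

def monthly_cost(input_tokens: float, output_tokens: float) -> float:
    per_run = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    return per_run * RUNS_PER_MONTH

before = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS)
after = monthly_cost(INPUT_TOKENS * TOKEN_INCREASE, OUTPUT_TOKENS * TOKEN_INCREASE)
print(f"before: ${before:.2f}/month  after: ${after:.2f}/month  (+{after / before - 1:.0%})")
```

Under these assumptions the jump is from $12.00 to $16.80 a month: small enough to miss, which is exactly the point. The same multiplier applied across a dozen workflows is a budget line.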
When you're on a flat monthly subscription
If you use Claude.ai, ChatGPT, or a similar flat-fee subscription, a tokeniser change does not appear on your invoice this month. The vendor absorbs it — until the economics stop working and they raise prices or tighten usage limits. That is how SaaS repricing has always worked, and AI products are no different.
The protection in this case is not tracking your token count. It is keeping your workflows specific enough that if limits tighten, you know exactly what you are relying on. An operations lead at a 25-person digital agency who has built 12 loosely defined ChatGPT workflows for client reporting is more exposed to a usage cap change than one who has 3 tightly scoped automations with explicit output constraints. The tighter workflows are also, usually, better — they generate less noise and cost the provider less to run.
Adding length constraints to your prompts — "respond in three bullet points maximum" or "keep the summary under 80 words" — is one of the simplest ways to keep token usage predictable. On API billing, it directly reduces cost. On flat subscriptions, it makes your usage easier to carry across model and pricing changes.
Set explicit output constraints — better outputs and more predictable token usage.
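In practice that means two things: a length instruction in the prompt, and a hard cap on the response. A sketch with Anthropic's Python SDK; the model name is a placeholder, and the same pattern works with any chat-style API:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-model-you-deploy",  # placeholder; use your actual model version
    max_tokens=200,                   # hard ceiling on output tokens, whatever the prompt says
    messages=[{
        "role": "user",
        "content": (
            "Summarise the customer interview below.\n"
            "Respond in three bullet points maximum, under 80 words total.\n\n"
            "<transcript text here>"
        ),
    }],
)
print(response.content[0].text)
```

The prompt line shapes the output; `max_tokens` guarantees the ceiling, so one misbehaving run cannot quietly multiply your token count.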
The line that does not appear on the pricing page
Model upgrades get announced as capability improvements. The tokenisation changes that sometimes accompany them do not. What The Neuron, TLDR AI, and Willison all noted in April is not a one-off: tokeniser changes have occurred across previous model generations and will occur again. The pricing page is not the full picture of what AI costs you. The token counter is. The teams who check when an upgrade lands have time to respond. The ones who find it on the invoice do not.
Like this post?
Get the next one in your inbox. Practical AI skills, no filler.