# AI API prices are rising

Through 2026, each new frontier model from the three major US labs ([[OpenAI]], [[Anthropic]], [[Google]]) has shipped at a higher API price than the model it replaces. The 2023–2024 expectation that inference gets cheaper with every release has quietly reversed at the frontier.

## The evidence

- **[[Gemini 3.5 Flash]]** is priced 5× (input) and ~3.6× (output) above Gemini 2.5 Flash. A Flash-tier model now costs more to run on a benchmark suite than the previous Pro tier did.
- **[[GPT-5.5]]** roughly doubled standard pricing versus [[GPT-5.4]], which it followed by six weeks.
- [[Simon Willison]] reads it plainly: all three labs are "probing the price tolerance of their API customers."

## The counter-current

This is a bifurcation, not a uniform rise. The open-weight frontier ([[DeepSeek v4]], [[Kimi K2.6]]) is pushing prices the other way, hard. Closed-weight frontier prices climb; open-weight releases reset the floor. The gap between the two is now the dominant variable in model choice for any workload that is not the absolute hardest reasoning step.
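
How far the closed-weight side of that gap moves for a given workload depends on its token mix. As a rough illustration using only the Gemini multipliers quoted in the evidence list (5× input, ~3.6× output), the sketch below estimates how much a bill grows if the token mix stays the same. The function name and the `input_spend_share` parameter are illustrative, and the calculation ignores caching, batch discounts, and any change in how many tokens the newer model emits.

```python
# Multipliers quoted above for Gemini 3.5 Flash vs Gemini 2.5 Flash.
INPUT_MULTIPLIER = 5.0
OUTPUT_MULTIPLIER = 3.6  # the note gives this as approximate (~3.6x)


def blended_multiplier(input_spend_share: float) -> float:
    """Return new_bill / old_bill for an unchanged token mix.

    input_spend_share is the fraction of the OLD bill that went to input
    tokens (0.0 to 1.0); the remainder is assumed to be output tokens.
    """
    if not 0.0 <= input_spend_share <= 1.0:
        raise ValueError("input_spend_share must be between 0 and 1")
    output_spend_share = 1.0 - input_spend_share
    return (INPUT_MULTIPLIER * input_spend_share
            + OUTPUT_MULTIPLIER * output_spend_share)


if __name__ == "__main__":
    # Hypothetical mixes: output-heavy, balanced, input-heavy.
    for share in (0.2, 0.5, 0.8):
        print(f"input share {share:.0%}: bill x{blended_multiplier(share):.2f}")
```

Whatever the mix, the result stays between 3.6 and 5, so under these assumptions no workload escapes the increase by shifting between input and output.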

## Why it's happening

- Capability gains cost real compute, and the labs have stopped subsidising it to buy market share
- Once a model is load-bearing in production, the buyer is locked in by switching costs rather than price-sensitive, so pricing power has moved to the seller
- The early "race to the bottom" narrative assumed commoditisation; instead the frontier kept differentiating, so the top tier now behaves like a premium good

## What follows from it

- Per-token price is a moving target you cannot architect around long-term; assume the next version costs more
- The verbosity dynamic compounds it: a model that emits more tokens per task is hit by the price rise twice (see the token-economy caveat in [[DeepSeek v4]] and [[Gemini 3.5 Flash]])
- [[Bring Your Own Key (BYOK)|BYOK]] and open-weight models, whether self-hosted or reached through [[OpenRouter]], become genuine hedges, not just preferences
- Tier intuition breaks: "Flash" and "mini" no longer reliably mean "cheap"
- Benchmark **cost per completed task**, not per token, and treat model selection as a recurring decision (see [[AI Cost Management]])

## Related

- [[AI Cost Management]]
- [[AI Inference]]
- [[Gemini 3.5 Flash]]
- [[GPT-5.5]]
- [[DeepSeek v4]]
- [[Kimi K2.6]]
- [[AI Open Weight Models]]
- [[Bring Your Own Key (BYOK)]]
- [[OpenRouter]]
- [[AI Frontier Model]]
- [[Large Language Models (LLMs)]]
- [[Simon Willison]]
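
## Sketch: cost per completed task

The advice to benchmark cost per completed task rather than per token can be sketched as follows. This is a minimal illustration, not a measurement: the `ModelRun` type, the model names, and every price and token count are placeholders invented for the example, and "completed" means whatever pass/fail check you apply to your own task suite.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """Aggregate results from running one model over a task suite."""
    name: str
    input_price_per_mtok: float   # USD per million input tokens
    output_price_per_mtok: float  # USD per million output tokens
    input_tokens: int             # total input tokens across all attempts
    output_tokens: int            # total output tokens across all attempts
    tasks_completed: int          # attempts that passed your own check


def total_cost(run: ModelRun) -> float:
    """Total spend in USD for the whole suite."""
    return (run.input_tokens * run.input_price_per_mtok
            + run.output_tokens * run.output_price_per_mtok) / 1_000_000


def cost_per_completed_task(run: ModelRun) -> float:
    """Spend divided by successes; infinite if nothing was completed."""
    if run.tasks_completed == 0:
        return float("inf")
    return total_cost(run) / run.tasks_completed


# Placeholder figures, NOT real prices: the second model is half the
# per-token price but emits six times as many output tokens.
terse = ModelRun("terse-model", 1.0, 4.0, 2_000_000, 500_000, 80)
verbose = ModelRun("verbose-model", 0.5, 2.0, 2_000_000, 3_000_000, 80)

if __name__ == "__main__":
    for run in (terse, verbose):
        print(f"{run.name}: ${cost_per_completed_task(run):.4f} per completed task")
```

With these made-up numbers the model that is cheaper per token ends up more expensive per completed task, which is the verbosity effect described above. Re-running the comparison on each new release is one way to treat model selection as a recurring decision.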