# AI Cost Management
Managing what AI systems cost to run is an engineering concern in its own right, and the stakes grow with adoption.
## Pricing models
API providers typically charge per token (with input and output priced separately), per request, or through subscription tiers. The cost difference between models can be 100x or more for the same task, which makes model choice a financial decision as much as a technical one. That decision is not stable either: [[AI API prices are rising]] across the closed-weight frontier, so it has to be revisited with each release rather than set once.
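To make the per-token arithmetic concrete, here is a minimal sketch of a request cost estimate. The `TokenPricing` type and both price points are hypothetical placeholders chosen only to illustrate a 100x gap; they are not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, pricing input and output separately."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Placeholder prices, assumed for illustration only.
small_model = TokenPricing(input_per_million=0.10, output_per_million=0.40)
large_model = TokenPricing(input_per_million=10.00, output_per_million=40.00)

# Same task, same token counts, two different models.
small_cost = request_cost(small_model, input_tokens=2_000, output_tokens=500)
large_cost = request_cost(large_model, input_tokens=2_000, output_tokens=500)

print(f"small: ${small_cost:.4f}  large: ${large_cost:.4f}")
```

Re-running an estimate like this against current price sheets is one way to revisit the model choice at each release.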
## Cost optimization strategies
- **[[Model routing]]**: route simple tasks to cheap models, complex tasks to expensive ones
- **Caching**: store and reuse responses for repeated or similar queries
- **Batching**: group requests to reduce overhead and take advantage of bulk pricing
- **Local fallback**: run [[Running AI Models Locally|local models]] via [[AI Open Weight Models]] or [[Small Language Models (SLMs)]] for tasks that don't need frontier capabilities
- **[[AI Quantization]]**: reduce model size and inference cost at acceptable quality tradeoffs
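The first two strategies can be combined in a few lines. The following is a rough sketch under stated assumptions: the model names, the word-count routing rule, and the `fake_call` stand-in are all hypothetical, and the cache matches exact prompts only (reusing answers for merely similar queries would need something like embedding-based lookup, which is not shown).

```python
import hashlib
from typing import Callable

CHEAP_MODEL = "cheap-model"        # hypothetical model name
EXPENSIVE_MODEL = "frontier-model"  # hypothetical model name


def choose_model(prompt: str, max_cheap_words: int = 200) -> str:
    """Toy routing rule (an assumption): short prompts go to the cheap model."""
    return CHEAP_MODEL if len(prompt.split()) <= max_cheap_words else EXPENSIVE_MODEL


class CachedClient:
    """Wraps a model-calling function with routing and an exact-match cache."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._cache: dict[str, str] = {}
        self.paid_calls = 0  # number of calls that actually reached a model

    def complete(self, prompt: str) -> str:
        model = choose_model(prompt)
        # Key on model and prompt so a routing change never serves a stale answer.
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._call_model(model, prompt)
            self.paid_calls += 1
        return self._cache[key]


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real API call; returns a deterministic string."""
    return f"[{model}] {len(prompt)} chars"


client = CachedClient(fake_call)
first = client.complete("Classify this ticket: printer offline")
second = client.complete("Classify this ticket: printer offline")  # served from cache
print(client.paid_calls)
```

In practice the routing rule would be driven by task type or a measured quality threshold rather than prompt length; the length check here only keeps the example short.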
## Hidden costs
The obvious per-token price is only part of the picture. Context window waste (sending irrelevant tokens), retry loops caused by poor prompts, and over-provisioning (using GPT-4-class models for classification tasks) silently inflate costs. [[Context Budget]] and [[Token Budget]] discipline matters. [[Context Compression]] can reduce input costs significantly.
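Two of these hidden costs are easy to put numbers on. The sketch below assumes that failed attempts are retried independently until one succeeds (so the expected number of attempts is `1 / success_rate`) and that the share of relevant input tokens is known or estimated; the function names and all figures are illustrative, not measurements.

```python
def effective_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per usable result when failures are retried until success.

    Assumes independent attempts, so expected attempts = 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def wasted_input_spend(
    input_tokens: int,
    relevant_fraction: float,
    input_price_per_million: float,
) -> float:
    """USD spent on input tokens that were not relevant to the task."""
    if not 0 <= relevant_fraction <= 1:
        raise ValueError("relevant_fraction must be in [0, 1]")
    irrelevant_tokens = input_tokens * (1 - relevant_fraction)
    return irrelevant_tokens * input_price_per_million / 1_000_000


# Illustrative figures: a $0.02 attempt that succeeds half the time
# really costs $0.04 per usable result.
retry_adjusted = effective_cost_per_success(cost_per_attempt=0.02, success_rate=0.5)

# Illustrative figures: 8,000 input tokens, only a quarter relevant,
# at a hypothetical $10 per million input tokens.
waste = wasted_input_spend(
    input_tokens=8_000, relevant_fraction=0.25, input_price_per_million=10.0
)

print(f"per success: ${retry_adjusted:.2f}  wasted input per request: ${waste:.2f}")
```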
## ROI calculation
Measure AI spend against the value produced: time saved, quality improvement, tasks that were previously impossible. A $500/month API bill that replaces 20 hours of manual work each month is cheap whenever that labor costs more than $25 an hour. A $50/month bill generating garbage is expensive.
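The comparison can be written down as a small calculation. This is a sketch that counts time saved only; the hourly rate is an assumed input, and quality gains or newly possible tasks would have to be valued separately.

```python
def break_even_hourly_rate(monthly_spend: float, hours_saved: float) -> float:
    """Hourly labor cost at which the monthly bill exactly pays for itself."""
    if hours_saved <= 0:
        raise ValueError("hours_saved must be positive")
    return monthly_spend / hours_saved


def monthly_roi(monthly_spend: float, hours_saved: float, hourly_rate: float) -> float:
    """Return (value - spend) / spend, counting only time saved as value."""
    if monthly_spend <= 0:
        raise ValueError("monthly_spend must be positive")
    value = hours_saved * hourly_rate
    return (value - monthly_spend) / monthly_spend


# $500/month replacing 20 hours of work, at an assumed $60/hour labor cost.
rate_needed = break_even_hourly_rate(monthly_spend=500, hours_saved=20)
roi_useful = monthly_roi(monthly_spend=500, hours_saved=20, hourly_rate=60)

# $50/month producing nothing usable: zero hours saved.
roi_garbage = monthly_roi(monthly_spend=50, hours_saved=0, hourly_rate=60)

print(f"break-even rate: ${rate_needed:.0f}/h")
print(f"ROI (useful): {roi_useful:.0%}  ROI (garbage): {roi_garbage:.0%}")
```

A negative ROI on a small bill is still a loss, which is the point of the second case above.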
## Related
- [[AI API prices are rising]]
- [[Model routing]]
- [[Running AI Models Locally]]
- [[AI Open Weight Models]]
- [[Small Language Models (SLMs)]]
- [[AI Quantization]]
- [[Context Budget]]
- [[Token Budget]]
- [[Context Compression]]