# Pre-warm the Prompt Cache
Useful pattern to cut **time-to-first-token** on longer prompts when calling the [[Anthropic]] API: send the system prompt before the real user request lands, so Claude writes it to the cache without generating any output. When the actual user request arrives, it hits a warm cache and the first token comes back faster.
The mechanic is simple. A pre-warm call sends just the cacheable prefix (system prompt, tool definitions, long context). The model has nothing to generate, so cost is the cache-write portion. The next request that reuses the same prefix pays the cheap cache-read price *and* skips the latency of building the cache on the critical path.
## When it pays off
- The user-facing request has tight latency requirements (chat UI, voice, low-latency agent loops)
- The cacheable prefix is large enough that the cache write itself adds noticeable latency on the critical path
- You know the user is about to send a request (e.g. they opened the app, started typing, or completed a previous turn)
## When it does *not* pay off
- The cacheable prefix is small — the saved cache-build time is dwarfed by network round-trip
- You're not sure a real request will follow within the cache TTL (5 min or 1 h) — you pay the write cost for nothing
- Single-shot scripts where there is no second request to benefit from the warm cache
This is the same trade-off as [[Claude Code Prompt Caching|Claude Code's 5m vs 1h cache decision]]: a longer-TTL warm-up only pays off if the cache is actually hit within the window.
## Related primitives
- The model does no generation when you prime the cache; you're paying only for the cache-write portion of the prefix
- Works with both 5-minute and 1-hour caches; pick the TTL based on how long the user might wait before sending the real request
- Stackable with other latency tricks: streaming, prompt compression, structured output prefilling
## References
- ClaudeDevs tip (2026-05-15): https://x.com/ClaudeDevs/status/2055069548672631218
- Anthropic prompt caching docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
## Related
- [[Claude Code Prompt Caching]]
- [[Anthropic]]
- [[Claude]]
- [[AI KV Cache]]