Pre-warm the Prompt Cache

# Pre-warm the Prompt Cache Useful pattern to cut **time-to-first-token** on longer prompts when calling the [[Anthropic]] API: send the system prompt before the real user request lands, so Claude writes it to the cache without generating any output. When the actual user request arrives, it hits a warm cache and the first token comes back faster. The mechanic is simple. A pre-warm call sends just the cacheable prefix (system prompt, tool definitions, long context). The model has nothing to generate, so cost is the cache-write portion. The next request that reuses the same prefix pays the cheap cache-read price *and* skips the latency of building the cache on the critical path. ## When it pays off - The user-facing request has tight latency requirements (chat UI, voice, low-latency agent loops) - The cacheable prefix is large enough that the cache write itself adds noticeable latency on the critical path - You know the user is about to send a request (e.g. they opened the app, started typing, or completed a previous turn) ## When it does *not* pay off - The cacheable prefix is small — the saved cache-build time is dwarfed by network round-trip - You're not sure a real request will follow within the cache TTL (5 min or 1 h) — you pay the write cost for nothing - Single-shot scripts where there is no second request to benefit from the warm cache This is the same trade-off as [[Claude Code Prompt Caching|Claude Code's 5m vs 1h cache decision]]: a longer-TTL warm-up only pays off if the cache is actually hit within the window. ## Related primitives - The model does no generation when you prime the cache; you're paying only for the cache-write portion of the prefix - Works with both 5-minute and 1-hour caches; pick the TTL based on how long the user might wait before sending the real request - Stackable with other latency tricks: streaming, prompt compression, structured output prefilling ## References - ClaudeDevs tip (2026-05-15): https://x.com/ClaudeDevs/status/2055069548672631218 - Anthropic prompt caching docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching ## Related - [[Claude Code Prompt Caching]] - [[Anthropic]] - [[Claude]] - [[AI KV Cache]]