Definition
Prompt caching stores the computed key-value representations of prompt prefixes to avoid reprocessing identical content.
How It Works:
1. First request: the full prompt is processed and the computed prefix state is stored in the cache.
2. Subsequent requests: the cached computation is reused for any matching prefix.
3. Only new or changed content after the cached prefix is processed.
Benefits:
- 50-90% cost reduction on cached input tokens
- Significantly lower latency, since the cached prefix skips recomputation
- Especially valuable for long system prompts
- Efficient multi-turn conversations, where the growing history is reused
Provider Support:
- Anthropic: explicit prompt caching (opt-in cache breakpoints)
- OpenAI: automatic caching of repeated prompt prefixes
- Google: context caching
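As a sketch of the explicit style, Anthropic's Messages API lets you mark a block with `cache_control` to set a cache breakpoint. The snippet below only constructs the request payload (no network call); field names follow Anthropic's documented API at the time of writing, and the model id is a placeholder, so verify both against current docs.

```python
def build_request(system_text, question):
    """Build an Anthropic-style Messages API payload with a cache breakpoint."""
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Marks everything up to this block as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }

req = build_request("Long static instructions go here.", "What does clause 4 mean?")
```

By contrast, OpenAI's caching needs no request markup (it applies automatically to sufficiently long repeated prefixes), and Google's context caching manages the cached content as a separate resource referenced by later requests.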
Best Use Cases:
- Long, static system prompts
- Document Q&A (same document, many questions)
- Few-shot prompting with static examples
- RAG with stable context
Examples
Caching a 50-page legal document once, then answering unlimited questions about it at a fraction of the uncached cost.