
Prompt Caching

Storing processed prompt prefixes to reduce cost and latency for repeated requests.


Definition

Prompt caching stores the computed key-value (attention) representations of a prompt prefix so that later requests sharing an identical prefix can reuse them instead of reprocessing the same content.

How It Works:

1. First request: the full prompt is processed and the computation for the prefix is stored in the cache.
2. Subsequent requests: the cached prefix computation is reused.
3. Only new or changed content after the cached prefix is processed.

Because matching is prefix-based, static content (system prompt, documents, examples) should come first and variable content (the user's question) last.
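As a concrete illustration, here is a minimal sketch of the explicit flow using the Anthropic Python SDK, where a cache_control marker tells the API to cache everything up to that point. The model name and system prompt are placeholders, and exact field names and cache lifetimes may vary by SDK version.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "..."  # placeholder: a long, static instruction block
# Note: providers require a minimum prefix length (on the order of
# 1,024 tokens) before caching activates.

def ask(question: str):
    return client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model name
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Everything up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )

first = ask("What is our refund policy?")  # full processing; cache is written
second = ask("And for digital goods?")     # cached prefix is reused

# The usage block distinguishes cache writes from cache reads:
print(first.usage.cache_creation_input_tokens,
      second.usage.cache_read_input_tokens)
```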

Benefits:

- 50-90% cost reduction on cached tokens
- Significantly lower latency, especially time to first token
- Largest gains on long system prompts
- More efficient multi-turn conversations

Provider Support:

- Anthropic: prompt caching (explicit)
- OpenAI: automatic prompt caching
- Google: context caching
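With OpenAI's automatic caching there is nothing to configure: keeping the prompt prefix byte-identical across requests is enough, and the response usage reports how much of the prompt was served from cache. A minimal sketch follows; the model name and prefix are placeholders, and the usage field names reflect recent openai SDK versions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STATIC_PREFIX = "..."  # placeholder: long, unchanging instructions, placed first

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": STATIC_PREFIX},
        {"role": "user", "content": "Summarize the attached report."},
    ],
)

# On a cache hit, cached_tokens reports the reused portion of the prompt.
# (prompt_tokens_details may be absent on older models/SDK versions.)
print(resp.usage.prompt_tokens_details.cached_tokens)
```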

Best Use Cases:

- Long, static system prompts
- Document Q&A (same document, many questions)
- Few-shot prompting with static examples
- RAG with stable context

Examples

Caching a 50-page legal document once, then answering many questions about it at a fraction of the per-question cost.
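In code, that scenario looks like the sketch below, again using Anthropic's explicit cache_control marker; the document variable, model name, and questions are placeholders. The first call pays full price to process the document and write the cache; each later call within the cache's lifetime reuses it and only processes the new question.

```python
import anthropic

client = anthropic.Anthropic()

legal_document = "..."  # placeholder: the full 50-page document text

questions = [
    "What is the termination clause?",
    "Who bears liability for data breaches?",
    "What jurisdiction governs disputes?",
]

for question in questions:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model name
        max_tokens=512,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": legal_document,
                        # Cache the document; only the question below is
                        # reprocessed on subsequent requests.
                        "cache_control": {"type": "ephemeral"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
    )
    print(response.content[0].text)
```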
