
Prompt Caching

Storing processed prompt prefixes to reduce cost and latency for repeated requests.


Definition

Prompt caching stores the computed key-value (attention) representations of a prompt prefix so that later requests sharing an identical prefix can reuse them instead of reprocessing the same content.

How It Works:

1. First request: the full prompt is processed and the computation for the prefix is stored in the cache.
2. Subsequent requests: the cached prefix computation is reused.
3. Only new or changed content after the cached prefix is processed.

Because matching is prefix-based, static content (system prompt, documents, examples) should come first and variable content (the user's question) last.
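As a concrete illustration, here is a minimal sketch of the explicit flow using the Anthropic Python SDK, where a cache_control marker tells the API to cache everything up to that point. The model name and system prompt are placeholders, and exact field names and cache lifetimes may vary by SDK version.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "..."  # placeholder: a long, static instruction block
# Note: providers require a minimum prefix length (on the order of
# 1,024 tokens) before caching activates.

def ask(question: str):
    return client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model name
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Everything up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],
    )

first = ask("What is our refund policy?")  # full processing; cache is written
second = ask("And for digital goods?")     # cached prefix is reused

# The usage block distinguishes cache writes from cache reads:
print(first.usage.cache_creation_input_tokens,
      second.usage.cache_read_input_tokens)
```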

Benefits:

- 50-90% cost reduction on cached tokens
- Significantly lower latency, especially time to first token
- Largest gains on long system prompts
- More efficient multi-turn conversations

Provider Support:

- Anthropic: prompt caching (explicit)
- OpenAI: automatic prompt caching
- Google: context caching
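With OpenAI's automatic caching there is nothing to configure: keeping the prompt prefix byte-identical across requests is enough, and the response usage reports how much of the prompt was served from cache. A minimal sketch follows; the model name and prefix are placeholders, and the usage field names reflect recent openai SDK versions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STATIC_PREFIX = "..."  # placeholder: long, unchanging instructions, placed first

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": STATIC_PREFIX},
        {"role": "user", "content": "Summarize the attached report."},
    ],
)

# On a cache hit, cached_tokens reports the reused portion of the prompt.
# (prompt_tokens_details may be absent on older models/SDK versions.)
print(resp.usage.prompt_tokens_details.cached_tokens)
```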

Best Use Cases:

- Long, static system prompts
- Document Q&A (same document, many questions)
- Few-shot prompting with static examples
- RAG with stable context

Examples

Caching a 50-page legal document once, then answering many questions about it at a fraction of the per-question cost.
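In code, that scenario looks like the sketch below, again using Anthropic's explicit cache_control marker; the document variable, model name, and questions are placeholders. The first call pays full price to process the document and write the cache; each later call within the cache's lifetime reuses it and only processes the new question.

```python
import anthropic

client = anthropic.Anthropic()

legal_document = "..."  # placeholder: the full 50-page document text

questions = [
    "What is the termination clause?",
    "Who bears liability for data breaches?",
    "What jurisdiction governs disputes?",
]

for question in questions:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model name
        max_tokens=512,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": legal_document,
                        # Cache the document; only the question below is
                        # reprocessed on subsequent requests.
                        "cache_control": {"type": "ephemeral"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
    )
    print(response.content[0].text)
```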
