Prompt caching | Respan Docs

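Prompt caching stores the model’s intermediate computation state for a prompt prefix. Later requests that begin with the same prefix reuse that state instead of reprocessing the entire prompt from scratch, which saves computational cost. Only the processing state is cached, not the output, so the model still generates a fresh (and possibly different) response for each request.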
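As a purely conceptual illustration, the toy class below mimics the idea: the "expensive" work for a prefix runs once, later requests with an identical prefix reuse the stored state, and only the new suffix is processed each time. `ToyPrefixCache` and its word-counting "encoder" are hypothetical stand-ins, not how Anthropic or the gateway implements caching.

```python
import hashlib


class ToyPrefixCache:
    """Hypothetical model of prompt caching: computed state for an exact
    prompt prefix is stored once and reused by later requests."""

    def __init__(self):
        self._states = {}               # prefix hash -> precomputed "state"
        self.prefix_computations = 0    # how often the expensive path ran

    def _expensive_encode(self, text):
        # Stand-in for the model processing prompt tokens (here: word count).
        self.prefix_computations += 1
        return len(text.split())

    def process(self, prefix, suffix):
        # Only an exact match on the prefix counts as a hit.
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        hit = key in self._states
        if not hit:
            self._states[key] = self._expensive_encode(prefix)
        # The uncached suffix is processed on every request; no output is stored,
        # so each request can still produce a different response.
        suffix_state = len(suffix.split())
        return self._states[key] + suffix_state, hit


cache = ToyPrefixCache()
book = "It is a truth universally acknowledged"
print(cache.process(book, "Analyze the major themes."))
print(cache.process(book, "Summarize chapter one."))
print(cache.prefix_computations)
```

The second call reports a hit and the prefix is encoded only once, which is the cost saving prompt caching targets.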

Prompt caching is only available for Anthropic models called through the gateway.

```python
import anthropic

# Point the Anthropic SDK at the Respan gateway and authenticate with your Respan key.
client = anthropic.Anthropic(
    base_url="https://api.respan.ai/api/anthropic/",
    api_key="YOUR_RESPAN_API_KEY",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing literary works.",
        },
        {
            "type": "text",
            "text": "<the entire contents of 'Pride and Prejudice'>",
            # Cache the prompt prefix up to and including this block.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Analyze the major themes."}],
)

print(message.content[0].text)
```
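To check whether caching actually took effect, you can inspect the token counts on the response. The sketch below assumes the gateway passes through the `usage` fields of the Anthropic SDK (`cache_creation_input_tokens` and `cache_read_input_tokens`) unchanged; `describe_cache_usage` is a hypothetical helper, not part of the SDK or the Respan API.

```python
def describe_cache_usage(usage) -> str:
    """Hypothetical helper: summarize prompt-caching activity from `message.usage`.

    Assumes the Anthropic SDK field names; missing or None values count as zero.
    """
    written = getattr(usage, "cache_creation_input_tokens", None) or 0
    read = getattr(usage, "cache_read_input_tokens", None) or 0
    if read and written:
        # e.g. several cache breakpoints: an earlier prefix hit, a later one was written
        return f"partial hit: read {read}, wrote {written} tokens"
    if read:
        return f"cache hit: read {read} tokens from cache"
    if written:
        return f"cache write: wrote {written} tokens to cache"
    return "no cache activity"
```

Call it as `print(describe_cache_usage(message.usage))` after the request above. Running the same request twice in quick succession should typically show a cache write first and a cache hit second. If neither appears, the prefix may be shorter than the model-specific minimum cacheable length in Anthropic's documentation, or the cache entry may have expired.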

For Respan-managed response caching (storing and reusing exact request/response pairs), see Respan caching.