Prompt caching
Use Anthropic prompt caching through the Respan gateway.
Prompt caching stores the model’s intermediate computation state. The model generates diverse responses while saving computational costs, as it doesn’t need to reprocess the entire prompt from scratch.
For Respan-managed response caching (storing and reusing exact request/response pairs), see Respan caching.