
Prompt caching

Use Anthropic prompt caching through the Respan gateway.

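Prompt caching stores the model’s intermediate computation state for a marked portion of the prompt. Later requests that begin with the same content reuse that state instead of reprocessing the entire prompt from scratch, which saves computational cost. Only the prompt processing is cached, so the model can still generate diverse responses.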

Only available for Anthropic models through the gateway.
import anthropic

# Point the Anthropic SDK at the Respan gateway instead of api.anthropic.com.
client = anthropic.Anthropic(
    base_url="https://api.respan.ai/api/anthropic/",
    api_key="YOUR_RESPAN_API_KEY",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API; the value here is illustrative
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing literary works.",
        },
        {
            "type": "text",
            "text": "<the entire contents of 'Pride and Prejudice'>",
            # Marks the end of the prompt prefix that should be cached.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Analyze the major themes."}],
)
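
To check whether a request wrote to or read from the cache, one option is to inspect the `usage` block of the response. The sketch below assumes the gateway passes Anthropic's `usage` fields (`cache_creation_input_tokens` and `cache_read_input_tokens`) through unchanged, which this page does not confirm. `summarize_cache_usage` is a hypothetical helper written for illustration, not part of the Anthropic SDK or Respan.

```python
from types import SimpleNamespace


def summarize_cache_usage(usage):
    """Classify a Messages API usage object as a cache hit, write, or neither.

    Hypothetical helper. Assumes `usage` exposes Anthropic's
    cache_creation_input_tokens and cache_read_input_tokens attributes;
    missing or None values are treated as 0.
    """
    written = getattr(usage, "cache_creation_input_tokens", None) or 0
    read = getattr(usage, "cache_read_input_tokens", None) or 0
    uncached = getattr(usage, "input_tokens", None) or 0

    if read > 0:
        status = "hit"      # at least part of the prompt came from the cache
    elif written > 0:
        status = "write"    # the prefix was cached by this request
    else:
        status = "none"     # caching was not used

    return {
        "status": status,
        "cache_write_tokens": written,
        "cache_read_tokens": read,
        "uncached_input_tokens": uncached,
    }


# Stand-in usage object so the sketch runs without a network call.
# The token counts are made up for illustration.
example_usage = SimpleNamespace(
    input_tokens=12,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=5000,
)
summary = summarize_cache_usage(example_usage)
```

With a real response, the call would be `summarize_cache_usage(message.usage)`. A first request with a new prefix would typically report `"write"`, and a repeat of the same prefix shortly afterwards would report `"hit"`.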

For Respan-managed response caching (storing and reusing exact request/response pairs), see Respan caching.