Migrate from OpenAI to multi-model

Switch between LLM providers through the Respan gateway with fallbacks and cost comparison.
  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
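Once you have the key from step 2, reading it from an environment variable keeps it out of source control. A minimal sketch, assuming the variable name `RESPAN_API_KEY` (a convention for this example, not a requirement):

```python
import os

# Read the gateway key from the environment instead of hard-coding it.
# RESPAN_API_KEY is just this example's convention; the placeholder
# fallback keeps the sketch runnable when the variable is unset.
api_key = os.environ.get("RESPAN_API_KEY", "YOUR_RESPAN_API_KEY")
```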

Overview

Most teams start with a single LLM provider. The Respan gateway lets you switch between 250+ models by changing one string — no code rewrites, no new SDKs. This cookbook shows how to migrate from direct OpenAI calls to a multi-model setup with automatic fallbacks.

Before: Direct OpenAI calls

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)
```

After: Respan gateway (2-line change)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",  # Change 1
    api_key="YOUR_RESPAN_API_KEY",          # Change 2
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)
```

Everything else stays the same — same SDK, same parameters, same response format.
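Response handling is unchanged as well. The stand-in object below mimics the OpenAI response shape so the accessors can be shown without a live call; in real code, `response` comes from `client.chat.completions.create(...)`:

```python
from types import SimpleNamespace

# Stand-in for an OpenAI-style response; a real one comes from
# client.chat.completions.create(...) and has the same shape.
response = SimpleNamespace(
    model="gpt-4o",
    choices=[SimpleNamespace(message=SimpleNamespace(content="A short summary."))],
)

# The same accessors work whether the request went direct or through the gateway.
text = response.choices[0].message.content
```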

Switch models

Now you can swap models by changing the model string:

```python
# OpenAI
response = client.chat.completions.create(model="gpt-4o", messages=messages)

# Anthropic
response = client.chat.completions.create(model="claude-sonnet-4-20250514", messages=messages)

# Google
response = client.chat.completions.create(model="gemini-2.0-flash", messages=messages)

# DeepSeek
response = client.chat.completions.create(model="deepseek-chat", messages=messages)
```

All models use the same OpenAI-compatible format. See the full model list.
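Because the request shape is identical everywhere, trying one prompt against several models is just a loop. A sketch, assuming the `client` configured above and that your account has access to each listed model:

```python
def compare_models(client, prompt, models):
    """Send the same prompt to each model and collect the replies by model name."""
    messages = [{"role": "user", "content": prompt}]
    results = {}
    for model in models:
        response = client.chat.completions.create(model=model, messages=messages)
        results[model] = response.choices[0].message.content
    return results

# e.g. compare_models(client, "Summarize this article...",
#                     ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"])
```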

Add fallback models

If your primary model goes down, Respan automatically retries with fallback models:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
    extra_body={
        "fallback_models": ["claude-sonnet-4-20250514", "gemini-2.0-flash"],
    },
)
```

If gpt-4o fails, Respan retries the request with claude-sonnet-4-20250514, then gemini-2.0-flash, so your users never see an error.
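To confirm in your logs which model actually answered, one option is to compare the response's `model` field against the model you requested. This sketch assumes the gateway echoes the serving model in that field, which is common for OpenAI-compatible gateways but worth verifying against Respan's docs; the helper name is hypothetical:

```python
def fallback_used(response, requested_model):
    """Return the serving model if it differs from the requested one, else None."""
    served = getattr(response, "model", requested_model)
    return served if served != requested_model else None
```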

Compare cost and quality

After running traffic through multiple models, use the Respan dashboard to compare:

  1. Go to Dashboard
  2. Use the model breakdown to compare cost, latency, and token usage per model
  3. Filter logs by model to review output quality side-by-side

Add metadata to tag requests by use case, so you can compare model performance per feature — not just globally.

```python
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=messages,
    extra_body={
        "metadata": {"feature": "summarization", "version": "v2"},
    },
)
```
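To keep the tags consistent across call sites, a small wrapper can attach them automatically. A hypothetical helper (not part of any SDK), assuming the `client` configured above:

```python
def tagged_create(client, feature, version, **kwargs):
    """Call chat.completions.create with per-feature metadata merged into extra_body."""
    extra_body = kwargs.pop("extra_body", {})
    extra_body.setdefault("metadata", {}).update({"feature": feature, "version": version})
    return client.chat.completions.create(extra_body=extra_body, **kwargs)

# e.g. tagged_create(client, "summarization", "v2", model="gpt-4o", messages=messages)
```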

Next steps