Migrate from OpenAI to multi-model

Switch between LLM providers through the Respan gateway with fallbacks and cost comparison.
  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page
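Once you have the key from step 2, reading it from an environment variable keeps it out of source control. A minimal sketch, assuming the variable name `RESPAN_API_KEY` (a convention for this example, not a requirement):

```python
import os

# Read the gateway key from the environment instead of hard-coding it.
# RESPAN_API_KEY is just this example's convention; the placeholder
# fallback keeps the sketch runnable when the variable is unset.
api_key = os.environ.get("RESPAN_API_KEY", "YOUR_RESPAN_API_KEY")
```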

Overview

Most teams start with a single LLM provider. The Respan gateway lets you switch between 250+ models by changing one string — no code rewrites, no new SDKs. This cookbook shows how to migrate from direct OpenAI calls to a multi-model setup with automatic fallbacks.

Before: Direct OpenAI calls

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)
```

After: Respan gateway (2-line change)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",  # Change 1
    api_key="YOUR_RESPAN_API_KEY",          # Change 2
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)
```

Everything else stays the same — same SDK, same parameters, same response format.
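Response handling is unchanged as well. The stand-in object below mimics the OpenAI response shape so the accessors can be shown without a live call; in real code, `response` comes from `client.chat.completions.create(...)`:

```python
from types import SimpleNamespace

# Stand-in for an OpenAI-style response; a real one comes from
# client.chat.completions.create(...) and has the same shape.
response = SimpleNamespace(
    model="gpt-4o",
    choices=[SimpleNamespace(message=SimpleNamespace(content="A short summary."))],
)

# The same accessors work whether the request went direct or through the gateway.
text = response.choices[0].message.content
```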

Switch models

Now you can swap models by changing the model string:

```python
# OpenAI
response = client.chat.completions.create(model="gpt-4o", messages=messages)

# Anthropic
response = client.chat.completions.create(model="claude-sonnet-4-20250514", messages=messages)

# Google
response = client.chat.completions.create(model="gemini-2.0-flash", messages=messages)

# DeepSeek
response = client.chat.completions.create(model="deepseek-chat", messages=messages)
```

All models use the same OpenAI-compatible format. See the full model list.
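Because the request shape is identical everywhere, trying one prompt against several models is just a loop. A sketch, assuming the `client` configured above and that your account has access to each listed model:

```python
def compare_models(client, prompt, models):
    """Send the same prompt to each model and collect the replies by model name."""
    messages = [{"role": "user", "content": prompt}]
    results = {}
    for model in models:
        response = client.chat.completions.create(model=model, messages=messages)
        results[model] = response.choices[0].message.content
    return results

# e.g. compare_models(client, "Summarize this article...",
#                     ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"])
```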

Add fallback models

If your primary model goes down, Respan automatically retries with fallback models:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
    extra_body={
        "fallback_models": ["claude-sonnet-4-20250514", "gemini-2.0-flash"],
    },
)
```

If gpt-4o fails, Respan retries the request with claude-sonnet-4-20250514, then gemini-2.0-flash, so your users never see an error.
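To confirm in your logs which model actually answered, one option is to compare the response's `model` field against the model you requested. This sketch assumes the gateway echoes the serving model in that field, which is common for OpenAI-compatible gateways but worth verifying against Respan's docs; the helper name is hypothetical:

```python
def fallback_used(response, requested_model):
    """Return the serving model if it differs from the requested one, else None."""
    served = getattr(response, "model", requested_model)
    return served if served != requested_model else None
```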

Compare cost and quality

After running traffic through multiple models, use the Respan dashboard to compare:

  1. Go to Dashboard
  2. Use the model breakdown to compare cost, latency, and token usage per model
  3. Filter logs by model to review output quality side-by-side

Add metadata to tag requests by use case, so you can compare model performance per feature — not just globally.

```python
response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=messages,
    extra_body={
        "metadata": {"feature": "summarization", "version": "v2"},
    },
)
```
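To keep the tags consistent across call sites, a small wrapper can attach them automatically. A hypothetical helper (not part of any SDK), assuming the `client` configured above:

```python
def tagged_create(client, feature, version, **kwargs):
    """Call chat.completions.create with per-feature metadata merged into extra_body."""
    extra_body = kwargs.pop("extra_body", {})
    extra_body.setdefault("metadata", {}).update({"feature": feature, "version": version})
    return client.chat.completions.create(extra_body=extra_body, **kwargs)

# e.g. tagged_create(client, "summarization", "v2", model="gpt-4o", messages=messages)
```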

Next steps