  1. Sign up — Create an account at platform.respan.ai
  2. Create an API key — Generate one on the API keys page
  3. Add credits or a provider key — Add credits on the Credits page or connect your own provider key on the Integrations page

Overview

Most teams start with a single LLM provider. The Respan gateway lets you switch between 250+ models by changing one string — no code rewrites, no new SDKs. This cookbook shows how to migrate from direct OpenAI calls to a multi-model setup with automatic fallbacks.

Before: Direct OpenAI calls

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)

After: Respan gateway (2-line change)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/api/",  # Change 1
    api_key="YOUR_RESPAN_API_KEY",           # Change 2
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
)

Everything else stays the same — same SDK, same parameters, same response format.
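If you want the gateway switch to be a pure configuration change, you can resolve the base URL and key from the environment. A minimal sketch — the `LLM_BASE_URL` and `LLM_API_KEY` variable names are illustrative, not required by Respan:

```python
import os

# Hypothetical helper: read gateway settings from the environment so that
# pointing the app at Respan (or back at a provider directly) is a
# deployment change, not a code change.
def gateway_config(default_base_url="https://api.respan.ai/api/"):
    return {
        "base_url": os.environ.get("LLM_BASE_URL", default_base_url),
        "api_key": os.environ.get("LLM_API_KEY", "YOUR_RESPAN_API_KEY"),
    }

# Usage: client = OpenAI(**gateway_config())
```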

Switch models

Now you can swap models by changing the model string:

# OpenAI
response = client.chat.completions.create(model="gpt-4o", messages=messages)

# Anthropic
response = client.chat.completions.create(model="claude-sonnet-4-20250514", messages=messages)

# Google
response = client.chat.completions.create(model="gemini-2.0-flash", messages=messages)

# DeepSeek
response = client.chat.completions.create(model="deepseek-chat", messages=messages)

All models use the same OpenAI-compatible format. See the full model list.
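Because only the string changes, a common pattern is to route each use case to the model that fits it best. A sketch with an illustrative routing table — the task names are made up, the model strings are the ones above:

```python
# Hypothetical routing table: pick a model per task. All calls go through
# the same OpenAI-compatible client, so only the model string varies.
MODEL_FOR_TASK = {
    "summarize": "gpt-4o",
    "draft": "claude-sonnet-4-20250514",
    "classify": "gemini-2.0-flash",
    "extract": "deepseek-chat",
}

def pick_model(task, default="gpt-4o"):
    """Return the model for a task, falling back to a default."""
    return MODEL_FOR_TASK.get(task, default)

# Usage sketch:
# response = client.chat.completions.create(
#     model=pick_model("summarize"), messages=messages
# )
```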

Add fallback models

If your primary model goes down, Respan automatically retries with fallback models:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this article..."}],
    extra_body={
        "fallback_models": ["claude-sonnet-4-20250514", "gemini-2.0-flash"],
    }
)

If gpt-4o fails, Respan tries claude-sonnet-4-20250514, then gemini-2.0-flash. Your users never see an error.
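Conceptually, the fallback chain behaves like the client-side loop below. This is a sketch of the idea, not Respan's implementation — `call` stands in for any function that takes a model name and raises on failure:

```python
# Sketch: try each model in order and return the first success.
def with_fallbacks(call, models):
    last_err = None
    for model in models:
        try:
            return call(model)
        except Exception as err:
            last_err = err  # remember the failure and try the next model
    raise last_err

# Usage sketch with the chain from above:
# result = with_fallbacks(
#     lambda m: client.chat.completions.create(model=m, messages=messages),
#     ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"],
# )
```

The gateway does this server-side, which is why the `fallback_models` list above needs no extra client code.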

Compare cost and quality

After running traffic through multiple models, use the Respan dashboard to compare:
  1. Go to Dashboard
  2. Use the model breakdown to compare cost, latency, and token usage per model
  3. Filter logs by model to review output quality side-by-side
Add metadata to tag requests by use case, so you can compare model performance per feature — not just globally.

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=messages,
    extra_body={
        "metadata": {"feature": "summarization", "version": "v2"},
    }
)
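
With tags in place, you can also slice exported logs yourself. A sketch that averages cost per (feature, model) pair — the row fields `model`, `cost`, and `metadata` are illustrative, not Respan's actual export schema:

```python
from collections import defaultdict

# Average cost per (feature, model) pair over a list of log rows.
def avg_cost_by_feature(rows):
    totals = defaultdict(lambda: [0.0, 0])  # (feature, model) -> [sum, count]
    for row in rows:
        key = (row["metadata"]["feature"], row["model"])
        totals[key][0] += row["cost"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```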

Next steps