Ollama (gateway)

Route Ollama model calls through the Respan gateway after the Ollama model is available in your Respan model list. For direct Ollama SDK tracing, see Ollama tracing setup.

Setup

In live testing with a gateway key, the previous ollama/llama3.1 example returned a "model unavailable" error. Use a model ID that is available in your Respan model list.

Install packages

$ pip install openai python-dotenv

Set environment variables

$ export RESPAN_API_KEY="YOUR_RESPAN_API_KEY"
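
The client code on this page reads the key with os.environ["RESPAN_API_KEY"], so a missing variable surfaces as a bare KeyError. A minimal sketch of a fail-fast check is shown here. The function name read_respan_settings is hypothetical (it is not part of any Respan SDK), and the default base URL is simply the one used in the client example on this page.

```python
import os


def read_respan_settings(env=os.environ):
    """Return (api_key, base_url), raising a clear error if the key is missing.

    Hypothetical helper for illustration; not part of any Respan SDK.
    """
    api_key = env.get("RESPAN_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "RESPAN_API_KEY is not set; export it or add it to a .env file."
        )
    # RESPAN_BASE_URL is optional; fall back to the gateway URL used on this page.
    base_url = env.get("RESPAN_BASE_URL", "https://api.respan.ai/api")
    return api_key, base_url
```

If you keep the key in a .env file, call load_dotenv() before this function so the values are present in os.environ.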

Point an OpenAI-compatible client to the Respan gateway

import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.environ["RESPAN_API_KEY"],
    base_url=os.getenv("RESPAN_BASE_URL", "https://api.respan.ai/api"),
)

response = client.chat.completions.create(
    model="YOUR_OLLAMA_MODEL_ID",
    messages=[{"role": "user", "content": "Say hello in three languages."}],
)
print(response.choices[0].message.content)

Switch models

Change the model parameter to use 250+ models from different providers through the same gateway.

messages = [{"role": "user", "content": "Hello"}]

response = client.chat.completions.create(model="YOUR_OLLAMA_MODEL_ID", messages=messages)
response = client.chat.completions.create(model="gpt-5.5", messages=messages)
response = client.chat.completions.create(model="claude-sonnet-4-5-20250929", messages=messages)

See the full model list.
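
Because only the model parameter changes between calls, the same prompt can be sent to several models in a loop. The following is a minimal sketch under that assumption. The helper name ask_each_model is hypothetical, and client is expected to be an OpenAI-compatible client already pointed at the gateway.

```python
def ask_each_model(client, model_ids, prompt):
    """Send the same prompt to each model ID and collect the replies.

    `client` is an OpenAI-compatible client pointed at the gateway.
    Hypothetical helper for illustration.
    """
    messages = [{"role": "user", "content": prompt}]
    replies = {}
    for model_id in model_ids:
        # Only the model argument differs from one request to the next.
        response = client.chat.completions.create(model=model_id, messages=messages)
        replies[model_id] = response.choices[0].message.content
    return replies
```

For example, ask_each_model(client, ["YOUR_OLLAMA_MODEL_ID", "gpt-5.5"], "Say hello.") returns a dict keyed by model ID. A model ID that is not available in your Respan model list is likely to raise an error from the client, so wrap the call in a try/except block if the loop should continue past failures.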

Respan parameters

Pass additional Respan parameters via extra_body for gateway features.

response = client.chat.completions.create(
    model="YOUR_OLLAMA_MODEL_ID",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "customer_identifier": "user_123",
        "fallback_models": ["gpt-5.5"],
        "metadata": {"session_id": "abc123"},
        "thread_identifier": "conversation_456",
    },
)

See Respan params & metadata for the full list.
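
When only some of these parameters apply to a given request, it can help to build extra_body in one place. The following is a minimal sketch that covers only the four keys shown in the example above. The helper name build_respan_params is hypothetical and is not part of the Respan or OpenAI SDKs.

```python
def build_respan_params(customer_identifier=None, thread_identifier=None,
                        fallback_models=None, metadata=None):
    """Build an extra_body dict, leaving out any parameter that was not given.

    Hypothetical helper for illustration; the keys mirror the example above.
    """
    candidates = {
        "customer_identifier": customer_identifier,
        "thread_identifier": thread_identifier,
        "fallback_models": fallback_models,
        "metadata": metadata,
    }
    # Drop unset values so the request body carries only what the caller chose.
    return {key: value for key, value in candidates.items() if value is not None}
```

The result can be passed directly as extra_body=build_respan_params(customer_identifier="user_123") in a client.chat.completions.create call.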