Provider: Replicate

Call Replicate models through Respan Gateway with unified logs, cost, and latency.
This page is for Respan LLM Gateway users.

Use Respan Gateway to call Replicate-hosted models (meta/llama-3-70b-instruct, meta/llama-3-8b-instruct, mistralai/mixtral-8x7b-instruct-v0.1, and others) while keeping unified observability (logs, cost, latency, reliability) in Respan.

Quick setup

1

Get a Respan API key

Sign up and create a key on the API keys page.

2

Send your first request

Pick the integration that matches your stack. The base URL is https://api.respan.ai/api, and the only key you need is your RESPAN_API_KEY.

The Replicate SDK uses its own protocol rather than the OpenAI one, so the cleanest way to route Replicate calls through Respan is the OpenAI-compatible gateway. If you want to keep using the native Replicate client, call Replicate directly as below and forward usage to Respan asynchronously, as described in Log without proxying.

# Native Replicate (call directly, then log to Respan; see Log without proxying at the bottom of this page)
import replicate

client = replicate.Client(api_token="YOUR_REPLICATE_API_TOKEN")

output = client.run(
    "meta/meta-llama-3-70b-instruct",
    input={"prompt": "Hello, Replicate!"},
)
# Language models on Replicate return an iterator of text chunks; join them into one string.
print("".join(output))
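For the OpenAI-compatible gateway route mentioned above, a minimal sketch with the official `openai` Python SDK might look like the following. It assumes the SDK's `base_url` can point at `https://api.respan.ai/api` and that your Respan key is stored in a `RESPAN_API_KEY` environment variable; the helper names `make_respan_client` and `ask` are hypothetical, not part of any Respan SDK.

```python
import os
from typing import Any

# Gateway base URL from the setup step; assumed to accept OpenAI-style chat completion requests.
RESPAN_BASE_URL = "https://api.respan.ai/api"
DEFAULT_MODEL = "replicate/meta/llama-3-70b-instruct"


def make_respan_client() -> Any:
    """Build an OpenAI SDK client that sends requests to the Respan gateway (hypothetical helper)."""
    # Imported lazily so this module still loads where `openai` is not installed (pip install openai).
    from openai import OpenAI

    return OpenAI(base_url=RESPAN_BASE_URL, api_key=os.environ["RESPAN_API_KEY"])


def ask(client: Any, prompt: str, model: str = DEFAULT_MODEL) -> str:
    """Send a single user message through `client` and return the assistant's reply text."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```

With `RESPAN_API_KEY` set, `print(ask(make_respan_client(), "Hello, Replicate!"))` sends the same prompt as the native example, but through the gateway, so the call shows up in Respan without a separate logging step.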

More integrations

Replicate-hosted chat models work with every Respan gateway integration.

Switch models

Change the model parameter to call any supported model through the same client. Use the replicate/ prefix to disambiguate when routing across providers. Browse the full list on the Models page.

# `client` is an OpenAI-compatible client pointed at https://api.respan.ai/api with your RESPAN_API_KEY.
messages = [{"role": "user", "content": "Hello!"}]

client.chat.completions.create(model="replicate/meta/llama-3-70b-instruct", messages=messages)
client.chat.completions.create(model="replicate/meta/llama-3-8b-instruct", messages=messages)
client.chat.completions.create(model="replicate/mistralai/mixtral-8x7b-instruct-v0.1", messages=messages)
client.chat.completions.create(model="openai/gpt-5.5", messages=messages)
client.chat.completions.create(model="anthropic/claude-sonnet-4-5", messages=messages)

Use your own Replicate key (BYOK)

Respan credits are the default billing path. If you'd rather be billed by Replicate directly, attach your own provider key.

1

Open Providers

Go to the Providers page.

2

Add Replicate

Select Replicate and paste your replicate.api_key.

3

Load balancing (Optional)

Add multiple credential sets and set a Load balancing weight on each to distribute traffic across them.
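If the Load balancing weights are treated as relative shares (an assumption; check the Load balancing documentation for the exact semantics), the expected fraction of traffic each credential set receives is its weight divided by the sum of all weights. A small worked sketch, with hypothetical credential-set names:

```python
def traffic_shares(weights: dict[str, float]) -> dict[str, float]:
    """Expected fraction of requests per credential set, ASSUMING proportional weighting."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


# Hypothetical credential sets: "primary" has weight 3, "secondary" has weight 1.
print(traffic_shares({"primary": 3, "secondary": 1}))
# → {'primary': 0.75, 'secondary': 0.25}
```

Under that assumption, a 3:1 weighting sends roughly three of every four requests to the first key.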

Override credentials per model (Optional)

Use credential_override when a specific model should use a different Replicate key than the default set in customer_credentials.

{
  "customer_credentials": {
    "replicate": { "api_key": "YOUR_REPLICATE_API_KEY" }
  },
  "credential_override": {
    "replicate/meta/llama-3-70b-instruct": { "api_key": "ANOTHER_REPLICATE_API_KEY" }
  }
}
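When calling through the OpenAI Python SDK, one way to attach these fields is the SDK's `extra_body` argument, which merges extra keys into the JSON request body. This sketch assumes the gateway reads `customer_credentials` and `credential_override` from the request body, as in the JSON above; `build_byok_body` is a hypothetical helper.

```python
from typing import Optional


def build_byok_body(default_key: str, overrides: Optional[dict[str, str]] = None) -> dict:
    """Build the customer_credentials / credential_override fields shown above (hypothetical helper).

    `overrides` maps a gateway model ID to the Replicate key that model should use instead.
    """
    body: dict = {"customer_credentials": {"replicate": {"api_key": default_key}}}
    if overrides:
        body["credential_override"] = {
            model: {"api_key": key} for model, key in overrides.items()
        }
    return body


extra_body = build_byok_body(
    "YOUR_REPLICATE_API_KEY",
    {"replicate/meta/llama-3-70b-instruct": "ANOTHER_REPLICATE_API_KEY"},
)
# Pass it with a normal gateway call, for example with an OpenAI client pointed at Respan:
# client.chat.completions.create(model=..., messages=..., extra_body=extra_body)
```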

Log without proxying (Optional)

Already calling Replicate directly? Send logs to Respan asynchronously to track cost, latency, and performance for those external calls.

import requests

# Send one completed Replicate call to Respan's request-log endpoint.
response = requests.post(
    "https://api.respan.ai/api/request-logs/create/",
    headers={
        "Authorization": "Bearer YOUR_RESPAN_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "replicate/meta/llama-3-70b-instruct",
        "prompt_messages": [{"role": "user", "content": "Hello, how are you?"}],
        "completion_message": {"role": "assistant", "content": "Hello from Replicate through Respan."},
        "cost": 0.001,  # cost you computed for this call
        "generation_time": 1.2,  # generation latency of the call
        "customer_params": {"customer_identifier": "user_123"},
    },
    timeout=10,
)
response.raise_for_status()  # surface HTTP errors instead of failing silently
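To log a native Replicate call like the one in the first example, you can time the call yourself and reuse the payload fields above. A minimal sketch, assuming the field names from the `request-logs/create` example and that `generation_time` is in seconds (matching the `1.2` above); `build_log_payload` and `run_and_log` are hypothetical helpers. `replicate.run()` returns only the model output, so `cost` is sent only when you supply it.

```python
import time
from typing import Any, Optional

RESPAN_LOG_URL = "https://api.respan.ai/api/request-logs/create/"


def build_log_payload(
    model: str,
    prompt: str,
    completion: str,
    generation_time: float,
    customer_identifier: str,
    cost: Optional[float] = None,
) -> dict:
    """Assemble a request-log body using the fields from the example above."""
    payload = {
        "model": model,
        "prompt_messages": [{"role": "user", "content": prompt}],
        "completion_message": {"role": "assistant", "content": completion},
        "generation_time": generation_time,  # ASSUMED to be seconds
        "customer_params": {"customer_identifier": customer_identifier},
    }
    if cost is not None:
        payload["cost"] = cost
    return payload


def run_and_log(replicate_client: Any, respan_api_key: str, prompt: str, customer_identifier: str) -> str:
    """Call Replicate directly, then send the timed result to Respan (hypothetical helper)."""
    import requests  # third-party: pip install requests

    start = time.perf_counter()
    output = replicate_client.run("meta/meta-llama-3-70b-instruct", input={"prompt": prompt})
    text = "".join(output)  # consume the streamed text chunks before stopping the timer
    elapsed = time.perf_counter() - start

    response = requests.post(
        RESPAN_LOG_URL,
        headers={"Authorization": f"Bearer {respan_api_key}"},  # json= sets Content-Type
        json=build_log_payload(
            "replicate/meta/llama-3-70b-instruct", prompt, text, elapsed, customer_identifier
        ),
        timeout=10,
    )
    response.raise_for_status()
    return text
```

For example, `run_and_log(replicate.Client(api_token="YOUR_REPLICATE_API_TOKEN"), "YOUR_RESPAN_API_KEY", "Hello, Replicate!", "user_123")` returns the generated text after the log has been accepted.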

See the logging guide for the full setup.