Advanced configurations

Prompt schema, deployment and versioning, fallback models, streaming, and logging for prompt templates.

We recommend installing the Respan MCP so your AI coding tool can work with your prompts, logs, and traces directly. Authenticate with your Respan API key from the API keys page:

claude mcp add \
  --transport http \
  --header "Authorization: Bearer YOUR_RESPAN_API_KEY" \
  respan \
  https://mcp.respan.ai/mcp

See the MCP docs for OAuth setup and other clients.

New to prompts? Start with the Quickstart. For more, see Prompt composition, Structured output, and Tool calling.

Prompt schema

The prompt object supports a schema_version field that controls how prompt configuration and request-body parameters are merged.

Set schema_version=2 for the recommended merge behavior:

  • Prompt configuration always wins for conflicting fields (no override flag needed).
  • Uses prepend/instructions-style merging depending on the endpoint mode.
  • Supports a patch field for applying additional parameter overrides. The patch object must not contain messages or input.

Important: OpenAI SDKs strip fields like schema_version, patch, and prompt_slug during validation. Prompt schema v2 requires raw HTTP requests (e.g., requests in Python or fetch in TypeScript).

import requests

headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_RESPAN_API_KEY',
}

data = {
    'prompt': {
        'prompt_id': 'YOUR_PROMPT_ID',
        'schema_version': 2,
        'variables': {
            'animal': 'octopus',
        },
        # patch must not contain messages or input
        'patch': {
            'temperature': 0.9,
            'max_tokens': 500
        }
    }
}

response = requests.post(
    'https://api.respan.ai/api/chat/completions',
    headers=headers,
    json=data
)
print(response.json())
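
Because the patch object must not contain messages or input, it can be convenient to check this on the client before sending the request. The following is a minimal sketch, assuming a hypothetical helper named build_v2_body (it is not part of any Respan SDK); how the server responds to an invalid patch is not covered here.

```python
# Hypothetical helper (not part of any Respan SDK): builds a schema v2
# request body and rejects patch keys that v2 does not allow.
FORBIDDEN_PATCH_KEYS = {"messages", "input"}


def build_v2_body(prompt_id, variables=None, patch=None):
    patch = dict(patch or {})
    bad = FORBIDDEN_PATCH_KEYS.intersection(patch)
    if bad:
        raise ValueError(f"patch must not contain: {sorted(bad)}")
    prompt = {"prompt_id": prompt_id, "schema_version": 2}
    if variables:
        prompt["variables"] = dict(variables)
    if patch:
        prompt["patch"] = patch
    return {"prompt": prompt}


body = build_v2_body(
    "YOUR_PROMPT_ID",
    variables={"animal": "octopus"},
    patch={"temperature": 0.9, "max_tokens": 500},
)
```

The resulting body can be passed as the json argument of requests.post, as in the example above.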

Prompt schema v1 (default, legacy)

When schema_version is absent or 1, merging is controlled by the override flag:

  • override=true: prompt configuration wins for all conflicting fields.
  • override=false (default): request body wins for conflicting fields.

request_body = {
    "prompt": {
        "prompt_id": "YOUR_PROMPT_ID",
        "override": True,
        "override_params": {
            "temperature": 0.8,
            "max_tokens": 150,
            "model": "gpt-5.5"
        }
    }
}
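
The precedence rule for the override flag can be pictured with a small local model. This is only an illustrative sketch: merge_v1 is a hypothetical function, the actual merge happens on the gateway, and the sketch ignores messages (handled by messages_override_mode) and override_params.

```python
def merge_v1(prompt_config, request_params, override=False):
    """Illustrates v1 precedence only; not the gateway's implementation."""
    if override:
        # override=true: prompt configuration wins on conflicting fields.
        return {**request_params, **prompt_config}
    # override=false (default): request body wins on conflicting fields.
    return {**prompt_config, **request_params}


saved = {"model": "saved-model", "temperature": 0.2}  # hypothetical saved config
sent = {"temperature": 0.8, "max_tokens": 150}        # hypothetical request params

request_wins = merge_v1(saved, sent)
prompt_wins = merge_v1(saved, sent, override=True)
```

Fields that appear on only one side (here model and max_tokens) are kept in both cases; only the conflicting temperature value changes.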

Append new messages to the end of existing prompt messages:

request_body = {
    "prompt": {
        "prompt_id": "YOUR_PROMPT_ID",
        "override_config": {"messages_override_mode": "append"},
        "override_params": {"messages": [{"role": "user", "content": "Additional context"}]},
    }
}

Replace all existing prompt messages:

request_body = {
    "prompt": {
        "prompt_id": "YOUR_PROMPT_ID",
        "override_config": {"messages_override_mode": "override"},
        "override_params": {"messages": [{"role": "user", "content": "Completely new conversation"}]},
    }
}
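
To make the difference between the two modes concrete, here is a local sketch of the resulting message lists. The function apply_messages_override and the sample saved messages are hypothetical; the real merge is performed by the gateway.

```python
def apply_messages_override(prompt_messages, new_messages, mode):
    """Local illustration of messages_override_mode; not the gateway's code."""
    if mode == "append":
        # New messages are added after the saved prompt messages.
        return list(prompt_messages) + list(new_messages)
    if mode == "override":
        # Saved prompt messages are discarded entirely.
        return list(new_messages)
    raise ValueError(f"unknown messages_override_mode: {mode!r}")


saved_messages = [{"role": "system", "content": "You are a marine biologist."}]
extra = [{"role": "user", "content": "Additional context"}]

appended = apply_messages_override(saved_messages, extra, "append")
replaced = apply_messages_override(saved_messages, extra, "override")
```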

Deployment & versioning

Commit saves a new version of your prompt. Deploy makes a version live for production traffic.

Manage versions from the panel on the right side of the Editor. It has two tabs:

  • Versions lists every committed version, newest first. Each entry shows its commit message, version number (v1, v2, …), author, and timestamp. The live version is tagged Deployed, and your current uncommitted edits show as a Draft.
  • Activity shows a chronological log of changes made to the prompt.

Figure: The Editor's right-side panel showing the Versions tab, with one version tagged Deployed and the latest shown as a Draft.

Compare versions: use the Playground to run different versions side by side.

Deploy a version: select the version you want, then click Deploy to make it live. To roll back, deploy an earlier version the same way.


Fallback models

Add fallback models to retry on a different model if the primary one fails. The gateway tries each model in order until one succeeds.

In the prompt editor’s Settings panel, click Add fallback model and select a model. Add as many as you need, then commit and deploy.

Figure: The prompt editor's Settings panel, with fallback models listed below the primary model and the Add fallback model button.

For fallbacks, load balancing, and retries applied at the gateway level, see the Gateway reliability docs.
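
The "try each model in order until one succeeds" behavior can be sketched locally as a simple loop. This is an illustration of the idea only: call_with_fallbacks and flaky_call are hypothetical, and which errors actually trigger a fallback on the gateway is not specified on this page.

```python
def call_with_fallbacks(models, call_model):
    """Tries each model in order and returns the first successful result."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # sketch: treat any error as a model failure
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def flaky_call(model):
    # Hypothetical stand-in for a model call: the primary model always fails.
    if model == "primary-model":
        raise TimeoutError("primary model unavailable")
    return f"response from {model}"


used_model, result = call_with_fallbacks(
    ["primary-model", "fallback-a", "fallback-b"], flaky_call
)
```

With the gateway handling this for you, the equivalent client code is a single request; the loop above only shows the ordering.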


Streaming

Enable streaming with the Stream toggle at the bottom of the model settings. After enabling, commit and deploy the prompt.

If you use a prompt with streaming enabled, you must also set stream=True in your SDK call:

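from openai import OpenAI

# Assumption: the OpenAI SDK is pointed at the Respan gateway; the base URL
# is inferred from the chat completions endpoint used earlier on this page.
client = OpenAI(
    base_url="https://api.respan.ai/api/",
    api_key="YOUR_RESPAN_API_KEY",
)

response = client.chat.completions.create(
    model="placeholder",
    messages=[{"role": "user", "content": "placeholder"}],
    stream=True,
    extra_body={
        "prompt": {
            "prompt_id": "YOUR_PROMPT_ID",
            "override": True,
            "variables": {"animal": "octopus"},
        }
    },
)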

For how streaming works at the gateway level, see the Chat Completions API reference.
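
Once stream=True is set, the SDK call returns an iterator of chunks rather than a single response. The sketch below shows one way to assemble the streamed text, assuming the gateway returns OpenAI-style chunks (choices[0].delta.content); collect_stream is a hypothetical helper, and the fake chunks stand in for a real response so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Joins the text deltas from OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def _fake_chunk(text):
    # Stand-in for a streamed chunk, used only for this offline demo.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk("Octopuses "),
    _fake_chunk(None),
    _fake_chunk("have three hearts."),
]
text = collect_stream(fake_stream)
```

In real use, pass the object returned by client.chat.completions.create(..., stream=True) in place of fake_stream.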


Prompt logging

Log prompt usage to track performance metrics, compare versions, and analyze request distribution.

import requests

url = "https://api.respan.ai/api/request-logs/create/"
payload = {
    "model": "claude-sonnet-4-6",
    "completion_message": {
        "role": "assistant",
        "content": "Octopuses have three hearts and blue blood."
    },
    "prompt": {
        "prompt_id": "xxxxxx",
        "variables": {
            "animal": "octopus"
        },
    },
    "generation_time": 5.7,
    "ttft": 3.1,
}
headers = {
    "Authorization": "Bearer YOUR_RESPAN_API_KEY",
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, json=payload)
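
If you log requests yourself, the generation_time and ttft values have to be measured on the client. The sketch below shows one way to do that. It assumes both fields are in seconds (the units are not stated here), and timed_collect and build_log_payload are hypothetical helpers, not part of any Respan SDK.

```python
import time


def timed_collect(start_stream):
    """Runs start_stream() and times the first piece and the full output.

    start_stream is a zero-argument callable that sends the request and
    returns an iterable of text pieces, so the timer covers the request too.
    """
    start = time.perf_counter()
    ttft = None
    parts = []
    for piece in start_stream():
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        parts.append(piece)
    generation_time = time.perf_counter() - start
    return "".join(parts), generation_time, ttft


def build_log_payload(model, content, prompt_id, variables, generation_time, ttft):
    # Mirrors the field names used in the logging example above.
    return {
        "model": model,
        "completion_message": {"role": "assistant", "content": content},
        "prompt": {"prompt_id": prompt_id, "variables": variables},
        "generation_time": generation_time,
        "ttft": ttft,
    }


# Offline demo: a plain iterator stands in for a real streamed response.
content, generation_time, ttft = timed_collect(
    lambda: iter(["Octopuses have three hearts ", "and blue blood."])
)
log_payload = build_log_payload(
    "claude-sonnet-4-6", content, "YOUR_PROMPT_ID",
    {"animal": "octopus"}, generation_time, ttft,
)
```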

To see the logs for a specific prompt, click Logs on the Prompt page. This opens the Spans page with the Prompt filter already set to that prompt.

Figure: The Spans page filtered by Prompt, showing only the logs for a specific prompt.

You can filter by any prompt yourself by selecting it under Prompt in the Spans filters.


Parameters reference

prompt_id
string (required)

The unique identifier of your saved prompt template.

variables
object

Variables to inject into your prompt template. Values can be strings or typed prompt objects for composition.

{
  "variables": {
    "user_name": "John",
    "task": "summarize"
  }
}

override
boolean

When true, the saved prompt configuration overrides SDK parameters like model and messages.

override_params
object

Parameters that override your saved prompt configuration (temperature, max_tokens, messages, model, etc.).

override_config
object

Controls how override parameters are applied.

  • messages_override_mode: "append" (add to existing) or "override" (replace all)

schema_version
integer (defaults to 1)

Controls the prompt merge strategy. 1 (default, legacy) uses override flag logic. 2 (recommended) uses prepend/instructions-style merging where the prompt config always wins. See Prompt schema.

patch
object

Additional parameter overrides applied in v2 mode (schema_version=2). Must not contain messages or input. Useful for overriding fields like temperature or max_tokens while letting the prompt config control messages and model.

fallback_models
string[]

List of fallback models to try if the primary model fails. The gateway automatically retries on the next model in the list, so your users do not see the error as long as one model succeeds.

{
  "prompt": {
    "prompt_id": "my-prompt",
    "schema_version": 2
  },
  "fallback_models": ["gpt-5.5", "claude-sonnet-4-6", "gemini-2.5-flash"]
}

echo
boolean

When enabled, the response includes the final prompt messages used.

version
integer | string

Pins a specific prompt version. Omit it to use the deployed version, or pass "latest" to use the newest draft.
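
As a rough sketch of the three cases, the helper below builds the prompt reference for each. It assumes version is placed inside the prompt object next to prompt_id, which this page does not confirm; prompt_ref is a hypothetical helper and the version number 3 is only an example.

```python
def prompt_ref(prompt_id, version=None):
    """Builds a prompt reference; omits version to use the deployed version.

    Assumption: version sits inside the prompt object alongside prompt_id.
    """
    ref = {"prompt_id": prompt_id}
    if version is not None:
        ref["version"] = version
    return ref


deployed = prompt_ref("YOUR_PROMPT_ID")                 # deployed version
pinned = prompt_ref("YOUR_PROMPT_ID", version=3)        # example pinned version
newest = prompt_ref("YOUR_PROMPT_ID", version="latest")  # newest draft
```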

See all Respan supported params.