Create chat completion

Send a chat completion request through the Respan gateway. Supports 250+ models across OpenAI, Anthropic, Google, Azure, and more with automatic logging, fallbacks, caching, and prompt management.

Accepts all OpenAI chat completion parameters. Respan-specific parameters can be passed three ways:

Top-level body fields - add directly to the request body
Nested under respan_params - explicit namespacing to avoid conflicts
Header X-Data-Respan-Params - base64-encoded JSON header

Merge order: top-level body fields > respan_params > header.

Legacy compatibility:

keywordsai_params is still accepted and merged into respan_params
X-Data-Keywordsai-Params is still accepted and auto-renamed internally

When using the OpenAI SDK, pass Respan parameters via extra_body.

Send a chat completion request through the Respan gateway. Supports 250+ models across OpenAI, Anthropic, Google, Azure, and more with automatic logging, fallbacks, caching, and prompt management. Accepts all [OpenAI chat completion parameters](https://platform.openai.com/docs/apis/chat). Respan-specific parameters can be passed three ways: 1. **Top-level body fields** - add directly to the request body 2. **Nested under `respan_params`** - explicit namespacing to avoid conflicts 3. **Header `X-Data-Respan-Params`** - base64-encoded JSON header Merge order: top-level body fields > `respan_params` > header. Legacy compatibility: - `keywordsai_params` is still accepted and merged into `respan_params` - `X-Data-Keywordsai-Params` is still accepted and auto-renamed internally When using the OpenAI SDK, pass Respan parameters via `extra_body`.

Authentication

AuthorizationBearer

Use your Respan API key for Respan API authentication. Enter only the Respan API key value; clients send Authorization: Bearer <RESPAN_API_KEY>. For /api/responses, OpenAI or Azure OpenAI provider credentials go in Settings -> Providers or the request body credential_override field, not in this auth field.

Request

This endpoint expects an object.

messageslist of objectsRequired

Array of messages in the conversation. Each message has role (system, user, assistant, tool) and content.

modelstringRequired

Model to use. See Models for available options.

Model to use. See [Models](https://platform.respan.ai/platform/models) for available options.

streambooleanOptional

Stream back partial progress token by token as server-sent events.

toolslist of objectsOptional

Tools the model may call. Currently only functions are supported.

tool_choiceobjectOptional

Controls tool selection. "none" = no tools, "auto" = model decides, or specify a tool object.

frequency_penaltydoubleOptional

Penalizes tokens based on frequency in text so far (-2 to 2).

max_tokensdoubleOptional

Maximum tokens to generate.

temperaturedoubleOptionalDefaults to 1

Sampling temperature (0-2). Higher = more random.

ndoubleOptionalDefaults to 1

Number of completions to generate. Note: costs multiply with n.

logprobsbooleanOptional

Return log probabilities of output tokens.

echobooleanOptional

Echo back the prompt in addition to the completion

stoplist of stringsOptional

Stop sequences where generation halts.

presence_penaltydoubleOptional

Penalizes tokens already present in text (-2 to 2).

logit_biasobjectOptional

Used to modify the probability of tokens appearing in the response

response_formatobjectOptional

Output format. Set {"type": "json_schema", "json_schema": {...}} for structured output, or {"type": "json_object"} for JSON mode.

parallel_tool_callsbooleanOptional

Enable parallel function calling during tool use.

load_balance_groupobjectOptional

Load balance group selection. Use {"group_id": "..."} to route through a configured group.

fallback_modelslist of stringsOptional

Backup models (ranked by priority) if the primary model fails.

customer_credentialsobjectOptional

Per-customer LLM provider credentials. Keys are provider names, values are API keys.

credential_overrideobjectOptional

One-off credential overrides per provider. Overrides uploaded provider keys for this request only.

cache_enabledbooleanOptional

Enable response caching. See Caching.

cache_ttldoubleOptional

Cache time-to-live in seconds. Default: 30 days.

cache_optionsobjectOptional

Cache behavior options. Properties: cache_by_customer, is_cached_by_model, omit_log.

promptobjectOptional

Prompt template config. Properties: prompt_id (required), variables (template variables), version (number, or "latest" for draft), echo (return rendered prompt), override (use override_params), override_params (OpenAI params to override), schema_version (1 = legacy, 2 = prompt config wins). See Prompt management.

Prompt template config. Properties: `prompt_id` (required), `variables` (template variables), `version` (number, or `"latest"` for draft), `echo` (return rendered prompt), `override` (use override_params), `override_params` (OpenAI params to override), `schema_version` (`1` = legacy, `2` = prompt config wins). See [Prompt management](/docs/documentation/features/prompt-management/advanced).

retry_paramsobjectOptional

Retry config. Properties: retry_enabled (boolean, required), num_retries (number), retry_after (seconds to wait).

disable_logbooleanOptional

When true, omits input/output from the log. Metrics (tokens, cost, latency) are still recorded.

model_name_mapobjectOptional

Azure deployment name mapping. Maps your custom Azure deployment names to standard model names.

modelslist of stringsOptional

Model list for LLM router selection.

exclude_providerslist of stringsOptional

Providers to exclude from routing. All models under excluded providers are skipped.

exclude_modelslist of stringsOptional

Specific models to exclude from routing.

metadataobjectOptional

Custom key-value metadata attached to the span.

custom_identifierstringOptional

Indexed custom tag for fast querying.

customer_identifierstringOptional<=254 characters

End user identifier for analytics and budgets.

customer_paramsobjectOptional

Extended customer info. Properties: customer_identifier (required), group_identifier, name, email, period_budget, budget_duration (daily/weekly/monthly), total_budget, markup_percentage.

request_breakdownbooleanOptional

Return response metrics summary in the response body. For streaming, metrics appear in the final chunk.

positive_feedbackbooleanOptional

User feedback. true = liked, false = disliked.

load_balance_modelslist of objectsOptional

Inline load balancing options. Each item can include model, weight, and optional credentials.

thread_identifierstringOptional

Conversation thread ID. Spans with the same thread_identifier are grouped together.

propertiesobjectOptional

Typed metadata preserving native types (numbers, booleans, nested objects). Unlike metadata which coerces to strings.

retriesintegerOptionalDefaults to 0

Number of retries on failure.

weightdoubleOptional

Load balancing weight.

span_namestringOptional

Custom span name for tracing.

respan_paramsobjectOptional

Namespaced container for all Respan parameters. Alternative to passing them at top level.

Response

Successful response for Create chat completion

idstring

Chat completion ID.

objectstring

createdinteger

Unix timestamp for when the completion was created.

modelstring

Model used for the completion.

choiceslist of objects

usageobject

Errors

401

Unauthorized Error