Send a chat completion request through the Respan gateway. Supports 250+ models across OpenAI, Anthropic, Google, Azure, and more with automatic logging, fallbacks, caching, and prompt management.
Accepts all [OpenAI chat completion parameters](https://platform.openai.com/docs/api-reference/chat). Respan-specific parameters can be passed in three ways:
1. **Top-level body fields** - add directly to the request body
2. **Nested under `respan_params`** - explicit namespacing to avoid conflicts
3. **Header `X-Data-Respan-Params`** - base64-encoded JSON header
Merge order: top-level body fields > `respan_params` > header.
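The merge order above can be sketched client-side. This is an illustrative model of the documented precedence, not the gateway's actual implementation; the set of recognized top-level keys shown here is a small assumed subset.

```python
import base64
import json

def merge_respan_params(body, header_b64=None):
    """Merge Respan params: top-level body fields > respan_params > header."""
    merged = {}
    # 3. Lowest precedence: base64-encoded JSON from X-Data-Respan-Params.
    if header_b64:
        merged.update(json.loads(base64.b64decode(header_b64)))
    # 2. Middle precedence: the nested respan_params object.
    merged.update(body.get("respan_params", {}))
    # 1. Highest precedence: recognized top-level body fields
    #    (subset assumed for illustration).
    respan_keys = {"cache_enabled", "cache_ttl", "customer_identifier", "metadata"}
    merged.update({k: v for k, v in body.items() if k in respan_keys})
    return merged

# Header carries cache_ttl=60, but respan_params overrides it with 300.
header = base64.b64encode(
    json.dumps({"cache_ttl": 60, "metadata": {"env": "dev"}}).encode()
).decode()
body = {"model": "gpt-4o", "respan_params": {"cache_ttl": 300}, "cache_enabled": True}
print(merge_respan_params(body, header))
```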
Legacy compatibility:
- `keywordsai_params` is still accepted and merged into `respan_params`
- `X-Data-Keywordsai-Params` is still accepted and auto-renamed internally
When using the OpenAI SDK, pass Respan parameters via `extra_body`.
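With the OpenAI SDK you would pass `extra_body={"respan_params": {...}}` to `client.chat.completions.create(...)`; the SDK merges those keys into the JSON payload it sends. This sketch builds the equivalent wire body by hand so the shape is visible (the values are illustrative):

```python
import json

# Standard OpenAI arguments, as you would pass them to the SDK.
standard_args = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}
# What you would put in extra_body when using the OpenAI SDK.
extra_body = {
    "respan_params": {"customer_identifier": "user_123", "cache_enabled": True}
}
# The SDK merges extra_body keys into the request payload.
wire_body = {**standard_args, **extra_body}
print(json.dumps(wire_body, indent=2))
```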
Request
This endpoint expects an object.
`messages` · list of objects · Required
Array of messages in the conversation. Each message has a role (system, user, assistant, tool) and content.
`model` · string · Required
Model to use. See [Models](https://platform.respan.ai/platform/models) for available options.
`stream` · boolean · Optional
Stream back partial progress token by token as server-sent events.
`tools` · list of objects · Optional
Tools the model may call. Currently only functions are supported.
`tool_choice` · object · Optional
Controls tool selection. "none" = no tools, "auto" = model decides, or specify a tool object.
`frequency_penalty` · double · Optional
Penalizes tokens based on their frequency in the text so far (-2 to 2).
`max_tokens` · double · Optional
Maximum number of tokens to generate.
`temperature` · double · Optional · Defaults to 1
Sampling temperature (0-2). Higher = more random.
`n` · double · Optional · Defaults to 1
Number of completions to generate. Note: costs multiply with n.
`logprobs` · boolean · Optional
Return log probabilities of the output tokens.
`echo` · boolean · Optional
Echo back the prompt in addition to the completion.
`stop` · list of strings · Optional
Stop sequences where generation halts.
`presence_penalty` · double · Optional
Penalizes tokens already present in the text (-2 to 2).
`logit_bias` · object · Optional
Modifies the probability of specified tokens appearing in the response.
`response_format` · object · Optional
Output format. Set {"type": "json_schema", "json_schema": {...}} for structured output, or {"type": "json_object"} for JSON mode.
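A minimal structured-output request using `response_format` might look like the following (the schema contents and the sample model reply are illustrative):

```python
import json

# Request body asking for structured output that matches a JSON Schema.
request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Extract the city and country from: 'I live in Lyon, France.'"}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
}
# With this format the assistant message content is a JSON string, e.g.:
content = '{"city": "Lyon", "country": "France"}'
parsed = json.loads(content)
print(parsed["city"])
```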
`parallel_tool_calls` · boolean · Optional
Enable parallel function calling during tool use.
`load_balance_group` · object · Optional
Load balance group selection. Use {"group_id": "..."} to route through a configured group.
`fallback_models` · list of strings · Optional
Backup models (ranked by priority) to try if the primary model fails.
`customer_credentials` · object · Optional
Per-customer LLM provider credentials. Keys are provider names, values are API keys.
`credential_override` · object · Optional
One-off credential overrides per provider. Overrides uploaded provider keys for this request only.
`cache_enabled` · boolean · Optional
Enable response caching. See Caching.
`cache_ttl` · double · Optional
Cache time-to-live in seconds.
`cache_options` · object · Optional
Cache behavior options. Properties: cache_by_customer, is_cached_by_model, omit_log.
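Putting the caching fields together, a request that caches responses for one hour, keyed per customer, could look like this (field names are from this reference; values are illustrative):

```python
# Sketch: enable the gateway cache for one hour, with per-customer entries.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "cache_enabled": True,
    "cache_ttl": 3600,  # seconds
    "cache_options": {
        "cache_by_customer": True,  # separate cache entries per customer_identifier
        "omit_log": False,          # keep cache hits in the logs
    },
    "customer_identifier": "user_123",
}
print(request_body["cache_ttl"])
```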
`prompt` · object · Optional
Prompt template config. Properties: `prompt_id` (required), `variables` (template variables), `version` (number, or `"latest"` for the draft), `echo` (return the rendered prompt), `override` (use `override_params`), `override_params` (OpenAI params to override), `schema_version` (`1` = legacy, `2` = prompt config wins). See [Prompt management](/docs/documentation/features/prompt-management/advanced).
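A request that renders a deployed prompt template with variables might be shaped as follows (the `prompt_id` and variable names are placeholders, not real IDs):

```python
# Sketch: call a deployed prompt template instead of sending raw messages.
request_body = {
    "model": "gpt-4o",  # per this reference, the prompt config can win when schema_version is 2
    "prompt": {
        "prompt_id": "prompt_abc123",  # hypothetical prompt ID
        "variables": {"customer_name": "Ada", "tone": "formal"},
        "version": "latest",           # current draft; or a version number
        "echo": True,                  # return the rendered prompt in the response
    },
}
print(request_body["prompt"]["prompt_id"])
```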
`retry_params` · object · Optional
Retry config. Properties: retry_enabled (boolean, required), num_retries (number), retry_after (seconds to wait).
`disable_log` · boolean · Optional
When true, omits input/output from the log. Metrics (tokens, cost, latency) are still recorded.
`model_name_map` · object · Optional
Azure deployment name mapping. Maps your custom Azure deployment names to standard model names.
`models` · list of strings · Optional
Model list for LLM router selection.
`exclude_providers` · list of strings · Optional
Providers to exclude from routing. All models under excluded providers are skipped.
`exclude_models` · list of strings · Optional
Specific models to exclude from routing.
`metadata` · object · Optional
Custom key-value metadata attached to the span.
`custom_identifier` · string · Optional
Indexed custom tag for fast querying.
`customer_identifier` · string · Optional · <=254 characters
End user identifier for analytics and budgets.
`customer_params` · object · Optional
Extended customer info. Properties: customer_identifier (required), group_identifier, name, email, period_budget, budget_duration (daily/weekly/monthly), total_budget, markup_percentage.
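For example, attaching extended customer info with a monthly budget could look like this (identifiers, amounts, and the email are illustrative):

```python
# Sketch: tie the request to an end user with analytics info and a spend cap.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hi"}],
    "customer_params": {
        "customer_identifier": "user_123",  # required within customer_params
        "name": "Ada Lovelace",
        "email": "ada@example.com",
        "period_budget": 10.0,              # spend cap per budget_duration
        "budget_duration": "monthly",       # daily | weekly | monthly
    },
}
print(request_body["customer_params"]["customer_identifier"])
```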
`request_breakdown` · boolean · Optional
Return a response metrics summary in the response body. For streaming, metrics appear in the final chunk.
`positive_feedback` · boolean · Optional
User feedback. true = liked, false = disliked.
`load_balance_models` · list of objects · Optional
Inline load balancing options. Each item can include model, weight, and optional credentials.
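The `weight` field is typically interpreted as a relative share of traffic. The gateway's actual selection algorithm is not documented here; this sketch shows one common interpretation (weighted random choice), with illustrative model names:

```python
import random

# Inline load-balance entries: weights 3:1 means ~75% / ~25% of traffic.
load_balance_models = [
    {"model": "gpt-4o", "weight": 3},
    {"model": "claude-3-5-sonnet", "weight": 1},
]

def pick(entries, rng=random.random):
    """Pick one entry's model, with probability proportional to its weight."""
    total = sum(e["weight"] for e in entries)
    r = rng() * total
    for e in entries:
        r -= e["weight"]
        if r <= 0:
            return e["model"]
    return entries[-1]["model"]

print(pick(load_balance_models))
```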
`thread_identifier` · string · Optional
Conversation thread ID. Spans with the same thread_identifier are grouped together.
`properties` · object · Optional
Typed metadata that preserves native types (numbers, booleans, nested objects), unlike `metadata`, which coerces values to strings.
`retries` · integer · Optional · Defaults to 0
Number of retries on failure.
`weight` · double · Optional
Load balancing weight.
`span_name` · string · Optional
Custom span name for tracing.
`respan_params` · object · Optional
Namespaced container for all Respan parameters. Alternative to passing them at top level.
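To close the loop, here is a request with every Respan-specific field namespaced under `respan_params` rather than at the top level (values and model names are illustrative):

```python
# Sketch: fully namespaced request — only OpenAI-standard fields stay top-level.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "respan_params": {
        "customer_identifier": "user_123",
        "metadata": {"feature": "chat"},
        "fallback_models": ["claude-3-5-sonnet", "gpt-4o-mini"],
        "retry_params": {"retry_enabled": True, "num_retries": 2},
    },
}
print(sorted(request_body["respan_params"]))
```

Namespacing avoids any chance of a Respan field colliding with a future OpenAI parameter of the same name.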
Response
Successful response for Create chat completion