Send a chat completion request through the Respan gateway. Supports 250+ models across OpenAI, Anthropic, Google, Azure, and more with automatic logging, fallbacks, caching, and prompt management.
Accepts all [OpenAI chat completion parameters](https://platform.openai.com/docs/apis/chat). Respan-specific parameters can be passed three ways:
1. **Top-level body fields** - add directly to the request body
2. **Nested under `respan_params`** - explicit namespacing to avoid conflicts
3. **Header `X-Data-Respan-Params`** - base64-encoded JSON header
Merge order: top-level body fields > `respan_params` > header.
Legacy compatibility:
- `keywordsai_params` is still accepted and merged into `respan_params`
- `X-Data-Keywordsai-Params` is still accepted and auto-renamed internally
When using the OpenAI SDK, pass Respan parameters via `extra_body`.
Request
This endpoint expects an object.
messageslist of objectsRequired
Array of messages in the conversation. Each message has role (system, user, assistant, tool) and content.
modelstringRequired
Model to use. See [Models](https://platform.respan.ai/platform/models) for available options.
streambooleanOptional
Stream back partial progress token by token as server-sent events.
toolslist of objectsOptional
Tools the model may call. Currently only functions are supported.
tool_choiceobjectOptional
Controls tool selection. "none" = no tools, "auto" = model decides, or specify a tool object.
frequency_penaltydoubleOptional
Penalizes tokens based on frequency in text so far (-2 to 2).
max_tokensdoubleOptional
Maximum tokens to generate.
temperaturedoubleOptionalDefaults to 1
Sampling temperature (0-2). Higher = more random.
ndoubleOptionalDefaults to 1
Number of completions to generate. Note: costs multiply with n.
logprobsbooleanOptional
Return log probabilities of output tokens.
echobooleanOptional
Echo back the prompt in addition to the completion
stoplist of stringsOptional
Stop sequences where generation halts.
presence_penaltydoubleOptional
Penalizes tokens already present in text (-2 to 2).
logit_biasobjectOptional
Used to modify the probability of tokens appearing in the response
response_formatobjectOptional
Output format. Set {"type": "json_schema", "json_schema": {...}} for structured output, or {"type": "json_object"} for JSON mode.
parallel_tool_callsbooleanOptional
Enable parallel function calling during tool use.
load_balance_groupobjectOptional
Load balance group selection. Use {"group_id": "..."} to route through a configured group.
fallback_modelslist of stringsOptional
Backup models (ranked by priority) if the primary model fails.
customer_credentialsobjectOptional
Per-customer LLM provider credentials. Keys are provider names, values are API keys.
credential_overrideobjectOptional
One-off credential overrides per provider. Overrides uploaded provider keys for this request only.
cache_enabledbooleanOptional
Enable response caching. See Caching.
cache_ttldoubleOptional
Cache time-to-live in seconds. Default: 30 days.
cache_optionsobjectOptional
Cache behavior options. Properties: cache_by_customer, is_cached_by_model, omit_log.
promptobjectOptional
Prompt template config. Properties: `prompt_id` (required), `variables` (template variables), `version` (number, or `"latest"` for draft), `echo` (return rendered prompt), `override` (use override_params), `override_params` (OpenAI params to override), `schema_version` (`1` = legacy, `2` = prompt config wins). See [Prompt management](/docs/documentation/features/prompt-management/advanced).
retry_paramsobjectOptional
Retry config. Properties: retry_enabled (boolean, required), num_retries (number), retry_after (seconds to wait).
disable_logbooleanOptional
When true, omits input/output from the log. Metrics (tokens, cost, latency) are still recorded.
model_name_mapobjectOptional
Azure deployment name mapping. Maps your custom Azure deployment names to standard model names.
modelslist of stringsOptional
Model list for LLM router selection.
exclude_providerslist of stringsOptional
Providers to exclude from routing. All models under excluded providers are skipped.
exclude_modelslist of stringsOptional
Specific models to exclude from routing.
metadataobjectOptional
Custom key-value metadata attached to the span.
custom_identifierstringOptional
Indexed custom tag for fast querying.
customer_identifierstringOptional<=254 characters
End user identifier for analytics and budgets.
customer_paramsobjectOptional
Extended customer info. Properties: customer_identifier (required), group_identifier, name, email, period_budget, budget_duration (daily/weekly/monthly), total_budget, markup_percentage.
request_breakdownbooleanOptional
Return response metrics summary in the response body. For streaming, metrics appear in the final chunk.
positive_feedbackbooleanOptional
User feedback. true = liked, false = disliked.
load_balance_modelslist of objectsOptional
Inline load balancing options. Each item can include model, weight, and optional credentials.
thread_identifierstringOptional
Conversation thread ID. Spans with the same thread_identifier are grouped together.
propertiesobjectOptional
Typed metadata preserving native types (numbers, booleans, nested objects). Unlike metadata which coerces to strings.
retriesintegerOptionalDefaults to 0
Number of retries on failure.
weightdoubleOptional
Load balancing weight.
span_namestringOptional
Custom span name for tracing.
respan_paramsobjectOptional
Namespaced container for all Respan parameters. Alternative to passing them at top level.
Response
Successful response for Create chat completion
idstring
Chat completion ID.
createdinteger
Unix timestamp for when the completion was created.
modelstring
Model used for the completion.