Create chat completion

Send a chat completion request through the Respan gateway. Supports 250+ models across OpenAI, Anthropic, Google, Azure, and more, with automatic logging, fallbacks, caching, and prompt management. Accepts all [OpenAI chat completion parameters](https://platform.openai.com/docs/api-reference/chat).

Respan-specific parameters can be passed three ways:

1. **Top-level body fields** - add directly to the request body
2. **Nested under `respan_params`** - explicit namespacing to avoid conflicts
3. **Header `X-Data-Respan-Params`** - base64-encoded JSON header

Merge order: top-level body fields > `respan_params` > header.

Legacy compatibility:

- `keywordsai_params` is still accepted and merged into `respan_params`
- `X-Data-Keywordsai-Params` is still accepted and auto-renamed internally

When using the OpenAI SDK, pass Respan parameters via `extra_body`.
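The merge order above can be sketched as follows. This is an illustrative model of the documented precedence, not the gateway's actual implementation; `merge_respan_params` and the subset of recognized top-level fields are invented for the example.

```python
import base64
import json

def merge_respan_params(body: dict, headers: dict) -> dict:
    """Illustrates the documented precedence:
    top-level body fields > respan_params > X-Data-Respan-Params header."""
    merged = {}
    header_value = headers.get("X-Data-Respan-Params")
    if header_value:
        # Lowest precedence: base64-encoded JSON header.
        merged.update(json.loads(base64.b64decode(header_value)))
    # Middle precedence: the nested respan_params object.
    merged.update(body.get("respan_params", {}))
    # Highest precedence: Respan fields passed at the top level of the body.
    respan_fields = {"cache_enabled", "cache_ttl", "metadata", "disable_log"}  # illustrative subset
    merged.update({k: v for k, v in body.items() if k in respan_fields})
    return merged

# Header says cache off with a 60s TTL; respan_params raises the TTL;
# the top-level field turns caching back on.
header = base64.b64encode(
    json.dumps({"cache_enabled": False, "cache_ttl": 60}).encode()
).decode()
params = merge_respan_params(
    {"model": "gpt-4o", "respan_params": {"cache_ttl": 300}, "cache_enabled": True},
    {"X-Data-Respan-Params": header},
)
# params == {"cache_enabled": True, "cache_ttl": 300}
```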

Headers

**Authorization** (string, Required)

Bearer token. Use `Bearer YOUR_API_KEY`.

**X-Data-Respan-Params** (string, Optional)

Base64-encoded JSON object of Respan parameters. The legacy `X-Data-Keywordsai-Params` header is still accepted.
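For example, the header value can be produced like this (the parameter values are placeholders):

```python
import base64
import json

respan_params = {"metadata": {"session": "abc123"}, "cache_enabled": True}

# Serialize to JSON, then base64-encode for the header.
header_value = base64.b64encode(
    json.dumps(respan_params).encode("utf-8")
).decode("ascii")

# Send as: X-Data-Respan-Params: <header_value>
```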

**X-Respan-Beta** (string, Optional)

Comma-separated beta feature flags. Available: `token-breakdown-2026-03-26`, `env-scoped-integrations-2026-03-28`.

Request

This endpoint expects an object.
**messages** (list of objects, Required)

Array of messages in the conversation. Each message has a `role` (`system`, `user`, `assistant`, or `tool`) and `content`.

**model** (string, Required)

Model to use. See [Models](https://platform.respan.ai/platform/models) for available options.

**stream** (boolean, Optional)

Stream back partial progress token by token as server-sent events.

**tools** (list of objects, Optional)

Tools the model may call. Currently only functions are supported.

**tool_choice** (object, Optional)

Controls tool selection. `"none"` = no tools, `"auto"` = model decides, or specify a tool object.
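A hypothetical request fragment using `tools` and `tool_choice` in the standard OpenAI function-calling shape; the `get_weather` function is invented for illustration:

```python
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Let the model decide whether to call the tool.
    "tool_choice": "auto",
}
```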

**frequency_penalty** (double, Optional)

Penalizes tokens based on their frequency in the text so far (-2 to 2).

**max_tokens** (double, Optional)

Maximum number of tokens to generate.

**temperature** (double, Optional, defaults to 1)

Sampling temperature (0-2). Higher values = more random output.

**n** (double, Optional, defaults to 1)

Number of completions to generate. Note: costs multiply with `n`.

**logprobs** (boolean, Optional)

Return log probabilities of the output tokens.

**echo** (boolean, Optional)

Echo back the prompt in addition to the completion.

**stop** (list of strings, Optional)

Stop sequences where generation halts.
**presence_penalty** (double, Optional)

Penalizes tokens already present in the text (-2 to 2).

**logit_bias** (object, Optional)

Modifies the probability of specified tokens appearing in the response.

**response_format** (object, Optional)

Output format. Set `{"type": "json_schema", "json_schema": {...}}` for structured output, or `{"type": "json_object"}` for JSON mode.
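For instance, a structured-output request fragment might look like this; the schema and its name are illustrative:

```python
request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Extract the city from: I live in Paris."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",  # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
}
```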

**parallel_tool_calls** (boolean, Optional)

Enable parallel function calling during tool use.

**load_balance_group** (object, Optional)

Load balance group selection. Use `{"group_id": "..."}` to route through a configured group.

**fallback_models** (list of strings, Optional)

Backup models (ranked by priority) to try if the primary model fails.
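A sketch of a request with fallbacks; the model identifiers are examples only, not a statement of which models the gateway supports:

```python
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    # Tried in order if the primary model fails.
    "fallback_models": ["claude-3-5-sonnet-20240620", "gemini/gemini-1.5-pro"],
}
```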

**customer_credentials** (object, Optional)

Per-customer LLM provider credentials. Keys are provider names, values are API keys.

**credential_override** (object, Optional)

One-off credential overrides per provider. Overrides uploaded provider keys for this request only.

**cache_enabled** (boolean, Optional)

Enable response caching. See Caching.

**cache_ttl** (double, Optional)

Cache time-to-live in seconds.

**cache_options** (object, Optional)

Cache behavior options. Properties: `cache_by_customer`, `is_cached_by_model`, `omit_log`.
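The three caching parameters might be combined like this (values are placeholders):

```python
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "cache_enabled": True,
    "cache_ttl": 600,  # seconds
    "cache_options": {"cache_by_customer": True, "omit_log": False},
}
```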

**prompt** (object, Optional)

Prompt template config. Properties: `prompt_id` (required), `variables` (template variables), `version` (number, or `"latest"` for the draft), `echo` (return the rendered prompt), `override` (use `override_params`), `override_params` (OpenAI params to override), `schema_version` (`1` = legacy, `2` = prompt config wins). See [Prompt management](/docs/documentation/features/prompt-management/advanced).
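A hypothetical request fragment using the `prompt` object; the `prompt_id` value and variable names are invented:

```python
request_body = {
    "prompt": {
        "prompt_id": "prompt_abc123",           # required; made-up ID
        "variables": {"customer_name": "Ada"},  # fills template variables
        "version": "latest",                    # or a version number
        "override": True,                       # apply override_params
        "override_params": {"temperature": 0.2},
    },
}
```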
**retry_params** (object, Optional)

Retry config. Properties: `retry_enabled` (boolean, required), `num_retries` (number), `retry_after` (seconds to wait between attempts).

**disable_log** (boolean, Optional)

When true, omits input/output from the log. Metrics (tokens, cost, latency) are still recorded.

**model_name_map** (object, Optional)

Azure deployment name mapping. Maps your custom Azure deployment names to standard model names.

**models** (list of strings, Optional)

Model list for LLM router selection.

**exclude_providers** (list of strings, Optional)

Providers to exclude from routing. All models under excluded providers are skipped.

**exclude_models** (list of strings, Optional)

Specific models to exclude from routing.
**metadata** (object, Optional)

Custom key-value metadata attached to the span.

**custom_identifier** (string, Optional)

Indexed custom tag for fast querying.

**customer_identifier** (string, Optional, <=254 characters)

End user identifier for analytics and budgets.
**customer_params** (object, Optional)

Extended customer info. Properties: `customer_identifier` (required), `group_identifier`, `name`, `email`, `period_budget`, `budget_duration` (`daily`/`weekly`/`monthly`), `total_budget`, `markup_percentage`.
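For example (all values are placeholders):

```python
request_body = {
    "customer_params": {
        "customer_identifier": "user_123",  # required
        "name": "Ada Lovelace",
        "email": "ada@example.com",
        "period_budget": 10.0,              # spend cap per period
        "budget_duration": "monthly",       # daily / weekly / monthly
    },
}
```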

**request_breakdown** (boolean, Optional)

Return a response metrics summary in the response body. For streaming, metrics appear in the final chunk.

**positive_feedback** (boolean, Optional)

User feedback. `true` = liked, `false` = disliked.

**load_balance_models** (list of objects, Optional)

Inline load balancing options. Each item can include `model`, `weight`, and optional `credentials`.

**thread_identifier** (string, Optional)

Conversation thread ID. Spans with the same `thread_identifier` are grouped together.

**properties** (object, Optional)

Typed metadata that preserves native types (numbers, booleans, nested objects), unlike `metadata`, which coerces values to strings.

**retries** (integer, Optional, defaults to 0)

Number of retries on failure.

**weight** (double, Optional)

Load balancing weight.

**span_name** (string, Optional)

Custom span name for tracing.

**respan_params** (object, Optional)

Namespaced container for all Respan parameters. An alternative to passing them at the top level.

Response

Successful response for Create chat completion.

**role** (string)

**content** (list of objects)

Errors

401
Unauthorized Error