Send a chat completion request through the Respan gateway. Supports 250+ models across OpenAI, Anthropic, Google, Azure, and more with automatic logging, fallbacks, caching, and prompt management.
Accepts all OpenAI chat completion parameters. Respan-specific parameters can be passed three ways:
respan_params - explicit namespacing to avoid conflictsX-Data-Respan-Params - base64-encoded JSON headerMerge order: top-level body fields > respan_params > header.
Legacy compatibility:
keywordsai_params is still accepted and merged into respan_paramsX-Data-Keywordsai-Params is still accepted and auto-renamed internallyWhen using the OpenAI SDK, pass Respan parameters via extra_body.
Bearer token. Use Bearer YOUR_API_KEY.
Base64-encoded JSON object of Respan parameters. Legacy X-Data-Keywordsai-Params is still accepted.
Pin the request to a specific provider without changing the model slug. Example: vertex_ai routes a claude-sonnet-4-5-20250929 request to Vertex AI Claude.
Comma-separated beta feature flags. Available: token-breakdown-2026-03-26, env-scoped-integrations-2026-03-28
Array of messages in the conversation. Each message has role (system, user, assistant, tool) and content.
Stream back partial progress token by token as server-sent events.
Controls tool selection. "none" = no tools, "auto" = model decides, or specify a tool object.
Penalizes tokens based on frequency in text so far (-2 to 2).
Sampling temperature (0-2). Higher = more random.
Number of completions to generate. Note: costs multiply with n.
Penalizes tokens already present in text (-2 to 2).
Output format. Set {"type": "json_schema", "json_schema": {...}} for structured output, or {"type": "json_object"} for JSON mode.
Load balance group selection. Use {"group_id": "..."} to route through a configured group.
Backup models (ranked by priority) if the primary model fails.
Per-customer LLM provider credentials. Keys are provider names, values are API keys.
One-off credential overrides per provider. Overrides uploaded provider keys for this request only.
Enable response caching. See Caching.
Cache time-to-live in seconds. Default: 30 days.
Cache behavior options. Properties: cache_by_customer, is_cached_by_model, omit_log.
Retry config. Properties: retry_enabled (boolean, required), num_retries (number), retry_after (seconds to wait).
When true, omits input/output from the log. Metrics (tokens, cost, latency) are still recorded.
Custom key-value metadata attached to the span.
Extended customer info. Properties: customer_identifier (required), group_identifier, name, email, period_budget, budget_duration (daily/weekly/monthly), total_budget, markup_percentage.
User feedback. true = liked, false = disliked.
Inline load balancing options. Each item can include model, weight, and optional credentials.
Conversation thread ID. Spans with the same thread_identifier are grouped together.
Typed metadata preserving native types (numbers, booleans, nested objects). Unlike metadata which coerces to strings.
Model to use. See Models for available options.
Prompt template config. Properties: prompt_id (required), variables (template variables), version (number, or "latest" for draft), echo (return rendered prompt), override (use override_params), override_params (OpenAI params to override), schema_version (1 = legacy, 2 = prompt config wins). See Prompt management.