Use Respan AI gateway as a proxy for coding agents
Set up Respan
- Sign up: Create an account at platform.respan.ai
- Create an API key: Generate one on the API keys page
- Add credits or a provider key: Add credits on the Credits page or connect your own provider key on the Integrations page
Overview
CLI coding agents like Claude Code, Codex CLI, Gemini CLI, and OpenCode talk to provider APIs through environment variables or TOML config. Point those config knobs at the Respan gateway instead of the upstream provider, and every request flows through Respan.
You unlock a lot with a single config change:
- One key for everyone. Issue a Respan key per developer instead of distributing OpenAI, Anthropic, or Google keys.
- Model switching. Try GPT-5.5, Claude 4.6, or Gemini 3 from the same CLI by changing one string.
- Fallbacks, retries, and caching. Turn them on through gateway parameters without touching the agent.
- Cost tracking per developer, project, or sprint. Every request is logged with metadata you control.
- Audit trail. Every prompt and response is captured, and revoking a key immediately cuts off a departing developer.
This cookbook covers the gateway setup. To also capture agent-level events such as thinking blocks, tool calls, and file edits, pair this with Trace CLI coding agents.
How it works
Each agent supports a “custom base URL” config. Respan exposes provider-compatible endpoints under one host, so the agent does not need to know it is talking to a gateway.
Authenticate with your RESPAN_API_KEY instead of the provider key. Respan looks up the upstream provider credentials from your account and forwards the request.
Use Respan AI gateway as a proxy for Claude Code
Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the environment or ~/.claude/settings.json.
Option 1: shell env vars
Add to .bashrc, .zshrc, or PowerShell $PROFILE:
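A sketch of the exports, assuming the gateway serves an Anthropic-compatible endpoint under api.respan.ai (the exact path below is an assumption; check your Respan dashboard for the real one):

```shell
# ANTHROPIC_AUTH_TOKEN takes precedence over ANTHROPIC_API_KEY, so clear it first
unset ANTHROPIC_AUTH_TOKEN

# Route Claude Code through the Respan gateway (endpoint path is an assumption)
export ANTHROPIC_BASE_URL="https://api.respan.ai/api/anthropic"
# Your Respan key, not an Anthropic key
export ANTHROPIC_API_KEY="$RESPAN_API_KEY"
```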
ANTHROPIC_AUTH_TOKEN takes precedence over ANTHROPIC_API_KEY when both are set, so unset it first.
Option 2: settings.json (persistent)
Use ~/.claude/settings.json for global config or .claude/settings.local.json for per-project:
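A minimal sketch of the settings file, using Claude Code's `env` block (the base URL path is an assumption):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.respan.ai/api/anthropic",
    "ANTHROPIC_API_KEY": "sk_your_respan_key",
    "ANTHROPIC_AUTH_TOKEN": ""
  }
}
```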
The empty ANTHROPIC_AUTH_TOKEN clears any inherited token from your shell or terminal app.
On first interactive launch Claude prompts: Detected a custom API key in your environment. Choose Yes. If you skipped it earlier, run /config, search for custom, and enable Use custom API key. Non-interactive claude -p and claude --print use ANTHROPIC_API_KEY automatically.
Switch Claude models
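Claude Code reads the model name from ANTHROPIC_MODEL, so switching models is one export (the model ID below is illustrative; use any ID the gateway exposes):

```shell
# Illustrative model ID -- the gateway maps it to the right upstream provider
export ANTHROPIC_MODEL="claude-4.6-sonnet"
```

Inside an interactive session, the /model command switches for the current session only.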
Use Respan AI gateway as a proxy for Codex CLI
Codex CLI reads provider config from ~/.codex/config.toml. Add a respan model provider entry and point model at it:
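A sketch of the TOML, assuming the gateway exposes an OpenAI-compatible path (the base_url path and model ID are assumptions):

```toml
# ~/.codex/config.toml
model = "gpt-5.5"            # illustrative; any model the gateway exposes
model_provider = "respan"

[model_providers.respan]
name = "Respan"
base_url = "https://api.respan.ai/api/openai/v1"   # assumed path -- check your dashboard
env_key = "RESPAN_API_KEY"
wire_api = "responses"
```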
Then export the key:
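For example, in .bashrc or .zshrc:

```shell
# The Respan key from the API keys page; env_key above points Codex at this variable
export RESPAN_API_KEY="sk_your_respan_key"
```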
env_key is the name of the environment variable holding your key, not the key itself. wire_api = "responses" tells Codex to speak the Responses API wire format rather than Chat Completions; the gateway supports both.
Switch Codex models
Edit the model = line in ~/.codex/config.toml:
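For example (model ID illustrative):

```toml
model = "claude-4.6-sonnet"   # any model the gateway exposes
```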
Use Respan AI gateway as a proxy for Gemini CLI
Gemini CLI reads GOOGLE_GEMINI_BASE_URL and GEMINI_API_KEY from the environment.
Gemini API endpoint
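A sketch of the exports, assuming the gateway serves a Gemini-compatible path (the path is an assumption):

```shell
# Route Gemini CLI through the Respan gateway (endpoint path is an assumption)
export GOOGLE_GEMINI_BASE_URL="https://api.respan.ai/api/gemini"
# Your Respan key, not a Google API key
export GEMINI_API_KEY="$RESPAN_API_KEY"
```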
Vertex AI endpoint
If your account is set up with a Google Cloud Vertex AI provider key, use:
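A sketch, assuming the gateway exposes a separate Vertex-compatible path (the path is an assumption, and your Gemini CLI version may need additional Vertex settings):

```shell
# Vertex AI routing via the gateway (endpoint path is an assumption)
export GOOGLE_GEMINI_BASE_URL="https://api.respan.ai/api/vertex"
export GEMINI_API_KEY="$RESPAN_API_KEY"
```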
Switch Gemini models
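Pass the model with the -m flag (model ID illustrative):

```shell
# One-shot prompt against a specific model through the gateway
gemini -m "gemini-3-pro" -p "Summarize this repo"
```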
Use Respan AI gateway as a proxy for OpenCode
OpenCode talks to OpenAI-compatible endpoints.
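One way to point it at the gateway is to override the openai provider's base URL in opencode.json (project root or global config). The exact schema can vary across OpenCode versions, and the baseURL path is an assumption:

```json
{
  "provider": {
    "openai": {
      "options": {
        "baseURL": "https://api.respan.ai/api/openai/v1",
        "apiKey": "{env:RESPAN_API_KEY}"
      }
    }
  }
}
```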
Then run with any model the gateway exposes (use the openai/ prefix):
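For example (model ID illustrative):

```shell
# Non-interactive run; note the openai/ provider prefix
opencode run -m "openai/gpt-5.5" "Explain the build pipeline"
```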
Switch OpenCode models
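Changing models is just a different -m string, keeping the openai/ prefix (model ID illustrative):

```shell
opencode run -m "openai/claude-4.6-sonnet" "Explain the build pipeline"
```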
Verify
Run a single prompt with each agent and confirm the request shows up in Logs:
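Sketches of one-shot invocations for each agent (prompts are illustrative; flags follow each CLI's current syntax):

```shell
claude -p "Say hello"                          # Claude Code, non-interactive
codex exec "Say hello"                         # Codex CLI, non-interactive
gemini -p "Say hello"                          # Gemini CLI
opencode run -m "openai/gpt-5.5" "Say hello"   # OpenCode (model ID illustrative)
```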
Each request displays the prompt, response, model, tokens, and cost. See the full model list for the 250+ models reachable through one gateway.
Reliability and cost features
Once requests flow through Respan, layer on gateway features without changing agent code. Set them per-key in the API keys settings, or per-request when you control the request body.
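As a rough sketch of the per-request style, a raw call through the gateway might attach feature parameters in the request body. The parameter names below (fallback_models, retries, cache) and the endpoint path are hypothetical; check the Respan gateway docs for the real names:

```shell
# Hypothetical gateway parameters -- names and path are assumptions
curl "https://api.respan.ai/api/openai/v1/chat/completions" \
  -H "Authorization: Bearer $RESPAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "ping"}],
    "fallback_models": ["claude-4.6-sonnet"],
    "retries": 2,
    "cache": true
  }'
```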
Tag requests for cost tracking
The CLI agents above pass requests verbatim, so the cleanest way to attribute usage is to scope the API key to a developer or team. Issue keys per-developer on the API keys page and group them by tags. The Users dashboard breaks down spend by key out of the box.
For per-session metadata such as a Jira ticket, branch, or sprint, pair this gateway setup with the Trace CLI coding agents cookbook. The respan integrate hook supports RESPAN_CUSTOMER_ID and RESPAN_METADATA env vars that attach the right tags without any agent-side support.
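For example, before starting the agent (the JSON shape of RESPAN_METADATA is an assumption; the env var names come from the Trace CLI coding agents cookbook):

```shell
export RESPAN_CUSTOMER_ID="dev-alice"
# Metadata value format is an assumption -- check the tracing cookbook
export RESPAN_METADATA='{"ticket":"PROJ-123","branch":"feature/login"}'
```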
Combine with full tracing
The gateway captures every LLM request, but agent-level events such as thinking blocks, tool calls, and file edits live inside the agent process and never hit the network. To capture those too, add the tracing hook on top of the gateway config:
- Follow this cookbook to point the agent at https://api.respan.ai/api/....
- Then run respan integrate <agent> from Trace CLI coding agents.
The hook produces a parent span and the gateway’s LLM-call spans nest underneath. You see one trace per agent turn with thinking, tools, and the underlying chat.completion calls all linked.
Troubleshooting
Claude Code: requests still go to api.anthropic.com
ANTHROPIC_AUTH_TOKEN (set by claude auth login or the OAuth flow) takes precedence over ANTHROPIC_API_KEY. Run unset ANTHROPIC_AUTH_TOKEN and restart your terminal, or set it to "" in settings.json as shown above.
In an interactive session, you may also need to re-approve the custom API key. Run /config, search for custom, and enable Use custom API key.
Codex CLI: 'unknown wire_api' error
wire_api = "responses" was added in a recent Codex CLI version. Update Codex with npm i -g @openai/codex and try again. Older versions only support wire_api = "chat", which is also accepted by the gateway but only routes Chat Completions.
Gemini CLI: 401 Unauthorized
The gateway authenticates with your Respan key in GEMINI_API_KEY, not your Google API key. Confirm that echo $GEMINI_API_KEY returns a value starting with sk_.
If you are using the Vertex AI endpoint, your Respan account must have a Vertex AI provider key connected.
OpenCode: 'model not found'
OpenCode requires the openai/ provider prefix. Use -m "openai/gpt-5.5", not -m "gpt-5.5".
Next steps
- Capture thinking, tool calls, and file edits with the Respan hook.
- Attribute LLM costs to teams, projects, and sprints.
- Switch providers with fallbacks and cost comparison.
- Fallbacks, load balancing, retries, and caching.