If you've been shipping LLM features in production for more than six months, you need a gateway. Provider outages happen. Cost guardrails matter. Switching models without redeploying app code is the difference between a 30-minute decision and a 30-day project. This is our honest list of the LLM gateways that can do that work in 2026, including our own.
A note on bias: we ship Respan, so we'd rank ourselves favorably. We've tried hard to be specific about what each tool is good at, including our own weaknesses.
Quick comparison
| Gateway | Best for | Self-host | Free tier | Pricing |
|---|---|---|---|---|
| Respan | Gateway + observability + evals + prompts in one | Enterprise | Yes | $$ |
| OpenRouter | Widest model catalog, simplest integration | No | Yes | $$ |
| LiteLLM | Open-source self-host, broad model support | Yes (OSS) | Yes | $ |
| Portkey | Managed gateway with strong governance | Enterprise only | Yes | $$$ |
| Cloudflare AI Gateway | Edge-routed, low-latency | No | Yes | $ |
| Helicone | Lightweight proxy with cost analytics | Yes (OSS) | Yes | $ |
| Bifrost | Lightweight self-hostable | Yes (OSS) | Yes | Free |
| Vercel AI Gateway | Vercel-native AI applications | No | Yes | $$ |
What to evaluate
Criteria that matter:
- Models supported: count + how fast new models are added
- OpenAI-compatible drop-in: most teams already have OpenAI-format code
- Provider fallback: automatic failover between providers (e.g., Anthropic → Bedrock)
- Caching: exact-match and/or semantic
- Rate limiting / budgets: per user, per feature, per dollar
- Cost guardrails and analytics: alerting on cost spikes, attribution by feature
- Observability integration: traces, eval scores attached
- Self-host: data residency requirements
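Several of these criteria, provider fallback especially, come down to a small control-flow pattern rather than anything exotic. Here is a minimal, stdlib-only sketch of ordered fallback; the provider callables are placeholders standing in for real SDK calls (a gateway does this for you server-side, with provider-specific error classification):

```python
from typing import Callable, Sequence


def call_with_fallback(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order; return the first successful response.

    `providers` are placeholder callables standing in for real provider SDK
    calls (e.g. Anthropic direct, then Bedrock). Production code would catch
    provider-specific error types and add timeouts, not bare Exception.
    """
    errors: list[Exception] = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


# Demo with stand-in providers: the primary fails, the fallback answers.
def primary(prompt: str) -> str:
    raise TimeoutError("primary provider: 529 overloaded")


def bedrock_fallback(prompt: str) -> str:
    return f"echo: {prompt}"


print(call_with_fallback([primary, bedrock_fallback], "ping"))  # echo: ping
```

The point of putting this behind a gateway instead of in app code is that the provider list becomes configuration, not a redeploy.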
1. Respan
Best for: Teams that want gateway + observability + evals + prompts in one platform.
The story: Most gateways listed below do gateway well. The structural question is whether your stack ends up with a gateway + a separate observability tool + a separate eval tool + a separate prompt management tool — four products, four invoices, four integrations. Respan is one platform that owns all four primitives.
Pros:
- 500+ models routable through unified OpenAI-compatible API
- Provider fallback configured per request
- Exact-match + semantic caching with TTL config
- Per-user / per-feature budgets and rate limits
- Full observability + evals + prompt management built in
- ~10ms added P95 latency overhead (measured)
Cons:
- Smaller community than OpenRouter on the gateway dimension specifically
- Less battle-tested at the "10-year incumbent" scale of Cloudflare
- Self-host on Enterprise only
Pricing: Free tier with generous limits. Pro and Enterprise tiers.
→ See Respan's gateway in product
2. OpenRouter
Best for: Widest model catalog and the simplest integration.
The story: OpenRouter's distinctive value is breadth: it supports more models than any other gateway, including obscure and experimental ones. The integration is dead simple: change your base URL and you have access to 300+ models.
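The base-URL swap looks like this in practice. A stdlib-only sketch that builds (but deliberately does not send) an OpenAI-format chat request against OpenRouter's endpoint; the model slug is illustrative:

```python
import json
import urllib.request

# The only change from calling OpenAI directly is this base URL.
OPENROUTER_BASE = "https://openrouter.ai/api/v1"


def build_chat_request(model: str, user_message: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request pointed at OpenRouter.

    Sending it (urllib.request.urlopen) is left out so the sketch stays
    offline; in real code you'd use the OpenAI SDK with base_url swapped.
    """
    body = json.dumps({
        "model": model,  # OpenRouter uses provider-prefixed slugs
        "messages": [{"role": "user", "content": user_message}],
    }).encode()
    return urllib.request.Request(
        f"{OPENROUTER_BASE}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("anthropic/claude-3.5-sonnet", "Hello", "sk-or-...")
print(req.full_url)  # https://openrouter.ai/api/v1/chat/completions
```

Because the request shape is the OpenAI format, the same function body works against any OpenAI-compatible gateway by changing the constant.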
Pros:
- Largest model catalog in the gateway market
- Simple OpenAI-compatible drop-in
- Provider fallback supported
- Strong community
Cons:
- No bundled observability or evals
- No semantic caching
- No prompt management
- No self-host
Pricing: Pay-per-use plus a small markup. Free tier exists.
3. LiteLLM
Best for: Open-source self-host with broad model support.
The story: LiteLLM is the OSS LLM gateway that supports 100+ models behind a single OpenAI-compatible API. Self-hostable, configurable, popular in environments where a cloud gateway isn't an option.
Pros:
- Open source, MIT-licensed, self-hostable
- Broad model support (100+)
- Active community, fast pace of new model integration
- Free if self-hosted
Cons:
- Self-hosting is real work
- No built-in observability beyond basic logging
- No managed cloud option without third-party hosting
Pricing: Open source free. Cloud-managed offerings via partners.
4. Portkey
Best for: Managed gateway with strong governance features.
The story: Portkey is the gateway pitched at enterprises with strict governance requirements. Audit logs, role-based access control, request signing, advanced policy enforcement.
Pros:
- Enterprise governance features (audit, RBAC, SSO)
- 250+ models supported
- Provider fallback, caching, cost guardrails
- Observability built in
Cons:
- Self-host on Enterprise only
- Pricing is opaque at the upper tiers
- Less developer-friendly than OpenRouter at the entry tier
Pricing: Tiered, with Enterprise pricing custom.
5. Cloudflare AI Gateway
Best for: Edge-routed AI applications with low latency.
The story: Cloudflare's gateway runs on their edge network, so requests are routed close to the user. Tight integration with the broader Cloudflare stack (Workers, R2, D1) makes it the obvious choice if you're already on Cloudflare.
Pros:
- Edge-routed for low latency
- Tight Cloudflare integration
- Caching is mature
- Generous free tier
Cons:
- Smaller model catalog than OpenRouter or Respan (50+ models)
- Most useful inside the Cloudflare ecosystem
- Less depth on observability than dedicated tools
- No self-host (Cloudflare-managed only)
Pricing: Pay-per-request with generous free tier.
6. Helicone
Best for: Teams that want a lightweight gateway with cost analytics and a proxy-style install.
The story: Helicone started as an observability tool and added gateway capabilities. The proxy mode is the easiest install of any gateway on this list (one base URL change).
Pros:
- Easiest install — proxy mode requires no SDK changes
- Strong cost analytics
- Open source self-host available
- Good free tier
Cons:
- Less depth on agent tracing (proxy can't see agent state)
- Smaller model catalog than OpenRouter or LiteLLM
- Prompt management is basic
Pricing: Generous free tier. Pro and Enterprise tiers are reasonably priced.
7. Bifrost
Best for: Teams wanting a lightweight self-hostable open-source gateway.
The story: Bifrost is a newer open-source gateway focused on simplicity and self-hostability. Smaller in scope than LiteLLM, easier to deploy, less feature surface.
Pros:
- Open source
- Lightweight, easy to deploy
- Active development
Cons:
- Smaller community
- Less feature-rich than LiteLLM or Respan
- Newer / less battle-tested
Pricing: Free open source.
8. Vercel AI Gateway
Best for: Teams already deeply on Vercel building AI-native applications.
The story: Vercel's gateway is designed for AI applications running on Vercel's platform. Tight integration with the Vercel AI SDK and Vercel's broader infra.
Pros:
- First-party for Vercel-deployed apps
- Tight Vercel AI SDK integration
- Edge-routed via Vercel network
Cons:
- Best inside the Vercel ecosystem; awkward outside
- Smaller model catalog than OpenRouter or Respan
- Newer product
Pricing: Tiered, bundled into Vercel's broader platform pricing.
How to choose
Quick decision framework:
- Want gateway + observability + evals + prompts in one? → Respan
- Need maximum model variety, OpenAI-compatible drop-in? → OpenRouter
- Need open-source self-host? → LiteLLM (most mature) or Bifrost (lighter)
- Need enterprise governance? → Portkey
- Already on Cloudflare? → Cloudflare AI Gateway
- Want fastest install with proxy? → Helicone
- Already on Vercel? → Vercel AI Gateway
Common mistakes
- Skipping a gateway "for now" — you'll need one within 6 months and migrating production traffic later is painful.
- No fallback configured — half a gateway. Configure provider fallback on day one.
- Semantic cache enabled by default — wrong cache hit ships stale answers. Start with exact-match, validate semantic before enabling.
- No per-feature cost guardrails — the first runaway agent drains your monthly budget in 2 hours.
- Treating the gateway as the place for routing logic — keep model-routing rules in your app, not buried in gateway config.
FAQ
Why do I need a gateway? Provider outages happen, cost guardrails matter, and switching models without app code changes is the difference between a 30-minute decision and a 30-day project. See our LLM Gateway pillar.
Can I just call providers directly? For toy projects, yes. For production, no — you'll lose to the first 25-minute Anthropic outage that takes your customer support agent offline.
Does a gateway add latency? A well-designed gateway adds 5-15ms P95 overhead. With caching enabled, the gateway often reduces median latency because cache hits return in single-digit milliseconds.
Which has the largest model catalog? Going by published counts: Respan routes to 500+ models, OpenRouter lists 300+, and Portkey supports 250+. OpenRouter is the go-to for obscure and experimental models.
Which has the best free tier? Cloudflare AI Gateway and Respan both have generous free tiers. Helicone is competitive.
Should I self-host or use cloud? Cloud for most teams (less ops burden). Self-host if you have data residency or compliance requirements that block cloud.
Can I switch gateways later? Yes — if the gateway is OpenAI-compatible (most are), you change a base URL. Lock-in risk is highest with proprietary SDKs and lowest with OpenAI-compatible interfaces.