LiteLLM is the open-source default for routing LLM calls across providers. It is well-built, free, and ubiquitous. So the obvious question whenever someone says "we should put a gateway in front of our LLM calls" is: do we need a real LLM gateway, or is LiteLLM enough?

The honest answer is "it depends, and the threshold is more concrete than people think." This piece is the comparison written from the perspective of teams that have run both. What LiteLLM gives you, where it stops, what a full LLM gateway adds, and the specific signals that tell you which one you actually need today.

TL;DR

LiteLLM is an open-source Python library plus optional proxy server. It translates OpenAI-format calls to about 100 provider SDKs and adds basic routing, retries, and rate limiting. Self-hosted, free, you own the operations.
A full LLM gateway (Respan, Cloudflare AI Gateway, Helicone, hosted LiteLLM tiers, OpenRouter at the marketplace end) adds observability, prompt management, structured caching, eval automations, per-customer cost tracking, and an SLA. Managed or self-hosted depending on the vendor.
LiteLLM is the right call when: your traffic is low to moderate, your team is comfortable running Python infrastructure, and your needs end at "call multiple providers with retries." Below ~5 engineers, LiteLLM is usually the right starting place.
A full gateway becomes the right call when: you need per-customer cost breakdowns, eval-driven routing, prompt management with versioning, or a managed SLA instead of an on-call rotation. Most teams hit this threshold somewhere between $5K/mo LLM spend and the first executive question about "what is the AI costing us per feature."

What LiteLLM actually is

Two things, packaged together.

The Python library (pip install litellm) is a translation layer. You call litellm.completion(...) with OpenAI-format arguments, it speaks to whichever provider you specified (Anthropic, Bedrock, Vertex, Mistral, Together, Groq, Cohere, ~100 others), and returns an OpenAI-shaped response. The value is that your application code does not need a different SDK per provider. This part is excellent and there is no reason to reinvent it.

The proxy server is a separate piece. You run it as a Docker container (or via litellm --proxy) and your application calls the proxy at an OpenAI-compatible endpoint instead of calling each provider directly. The proxy then adds:

Load balancing across multiple deployments of the same model
Retries and provider fallbacks
Virtual API keys with per-key budget and rate limits
Cost tracking and logs
Basic dashboards

The proxy is what people usually mean when they say "we use LiteLLM as our gateway."

What a full LLM gateway adds on top

The capabilities that matter once your LLM usage is real, not a prototype.

Observability that connects to traces. LiteLLM logs each call. A full gateway captures spans that link to the parent agent or RAG trace, so you can see the LLM call in context (which user, which feature, which retrieval step that produced its context, which downstream eval scored it). The difference is whether you can debug a bad answer in 5 minutes by reading the trace tree or whether you spend an hour joining logs by hand.

Prompt management with versioning. Prompts live in a registry, not in your source code. Edit a prompt, ship the new version without a redeploy, A/B test prompt variants on production traffic, roll back when a regression appears. LiteLLM does not ship this.

Online evals on production traffic. A small sampled fraction of traffic gets scored by an LLM judge automatically. Scores attach to the same traces. Drift fires an alert before users complain. LiteLLM does not ship this either.

Per-customer cost and rate tracking. "How much did customer X cost us this month across all features" is a single query, not a data-engineering project. LiteLLM has the data but does not expose the views you actually need to answer that question for a board meeting.

Structured caching that the gateway can use. Exact-match cache, semantic cache, provider prompt cache all wired in, with hit rate metrics in the same dashboard as your spend.

An SLA. Managed gateways come with uptime guarantees and a paging contact. Your self-hosted LiteLLM proxy has the uptime your team can give it.

For the full breakdown of which caches matter and when, see LLM cache layers.

The decision: when LiteLLM is enough

LiteLLM is enough when all of these are true:

Your LLM spend is under roughly $3K-5K per month.
Your team can run a Python service in production without it being a distraction.
You do not need per-customer cost breakdowns or a prompt registry for non-engineers.
Your observability needs are met by basic per-call logs.
You are not paying the price for an on-call rotation on your gateway specifically.

If all five hold, install LiteLLM, point your code at the proxy, and move on. The team behind it does good work and the integration surface is genuinely strong.

The decision: when a full gateway is the right call

A full gateway becomes the right call when any of these are true:

Cost is a board-level question. "What is the AI costing us per feature, per customer, per quarter" needs to be a query, not a CSV export.
You are running 10+ LLM features. Maintaining prompt versions in code across features starts costing engineering time and the right place for that is a prompt registry.
You have an eval and you want to run it on production traffic. Online evaluation needs the gateway to sample, score, and store results connected to the original traces. Not a LiteLLM feature.
You need per-customer rate limits or budgets that change daily. LiteLLM virtual keys handle the static case. Customer-tier-aware dynamic limits are a gateway feature.
Your platform team is small. The gateway is hot-path infrastructure. If you would rather not own the uptime, the managed gateway pays for itself.
You need observability that hooks into agent traces. Reading a 50-step agent loop in a trace tree is the difference between debugging in 5 minutes and giving up.

Hitting any one of these is usually enough to move. Hitting three is the green light. We see most teams cross the threshold somewhere between $5K/mo LLM spend and the first time someone needs to answer "which customer is causing the spike."

What Respan adds (since we are one of the gateways being compared)

Respan is a managed LLM engineering platform. The gateway is one of four products that share the same data model:

Gateway with 250+ models, OpenAI-compatible endpoint, passthrough mode (BYO provider keys, no margin) or routed mode (one Respan key for everything).
Tracing that captures the gateway calls as spans inside the parent agent or RAG workflow, with auto-instrumentation for OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, Mastra, Pydantic AI, and ~30 other frameworks.
Evals with online (sampled production traffic, asynchronous judging) and offline (golden-set experiments) modes. LLM judges, code evaluators, human review.
Prompt management with versioning, A/B deployment, and a playground.

You get the LiteLLM-style multi-provider routing plus the layers LiteLLM does not ship, all in one platform with one billing relationship. The free tier covers 10K traces per month and we process 80+ trillion tokens of customer traffic.

For the broader picture, see LLM gateway and the LLM cache layers deep-dive.

Honest weak spots for both

LiteLLM:

The Python proxy adds latency. For most workloads it is negligible (single-digit milliseconds). For high-throughput agent loops it can matter and a Go-based gateway like Bifrost beats it. See our Bifrost vs LiteLLM comparison.
Observability is per-call, not per-trace. If you are running agents, you will outgrow it.
The proxy is one more service for your platform team to babysit.

A full managed gateway (any vendor):

You are paying for a layer between you and the providers. Margins exist, even when they are invisible.
The data flows through a third party. For regulated workloads this is a real consideration (compliance certifications matter; check before signing).
You depend on the vendor's uptime, not just the providers' uptime. Pick a vendor with a track record.

A migration path that works

We see this pattern repeatedly:

Phase 1 (prototype, < $1K/mo): direct provider SDKs, no gateway. Move fast.
Phase 2 ($1K-3K/mo, 2-3 LLM features): install LiteLLM. Multi-provider, retries, basic logs.
Phase 3 ($3K-10K/mo, agent and RAG features in production): add a managed gateway (Respan or equivalent). Keep LiteLLM library calls in code if you want, or migrate to the managed gateway's SDK. Begin online evals.
Phase 4 ($10K+/mo, multi-tenant product): you are in gateway-as-strategic-infrastructure territory. Per-customer cost tracking, dynamic rate limits, and eval-driven routing are now first-order requirements.

There is no shame in being at Phase 2. There is also no glory in staying at Phase 2 once you are clearly past it.

FAQ

Is Respan a fork of LiteLLM? No. We use LiteLLM internally in some places where it is the right tool, but the Respan platform is its own data model, codebase, and integration surface.

Can I run a Respan gateway on-prem like LiteLLM? Self-hosting is available for enterprise customers. Most teams use the managed cloud.

Does LiteLLM have a hosted version? Yes. LiteLLM offers a paid hosted tier. The feature gap to a full managed gateway still applies (less prompt management, lighter eval automation, observability is per-call rather than trace-aware), but if you like the LiteLLM-shaped product and want managed hosting, it is a real option.

What if I just want unified billing across providers? LiteLLM proxy plus your own provider accounts gives you that on the self-hosted side. OpenRouter gives you that on the marketplace side. See our OpenRouter vs Vercel AI Gateway comparison for the marketplace tradeoff.

Can I keep LiteLLM and add a gateway alongside? Yes, and many teams do for a transition period. Your application code calls the gateway (Respan or other) and the gateway uses LiteLLM internally for the underlying provider hop. Most managed gateways handle this without you needing to keep LiteLLM running yourself.

What's the single signal that tells me I am past LiteLLM? Someone asked "how much did this customer cost us last month across all our LLM features" and the answer required engineering work. That is the moment.

Is OpenRouter a LiteLLM alternative? Different layer. OpenRouter is a hosted marketplace with their own billing relationship to providers. LiteLLM is OSS that you connect to your own provider accounts. They are not direct substitutes but they often come up in the same evaluation. See LiteLLM vs OpenRouter.

TL;DR

LiteLLM is an open-source Python library plus optional proxy server. It translates OpenAI-format calls to about 100 provider SDKs and adds basic routing, retries, and rate limiting. Self-hosted, free, you own the operations.
A full LLM gateway (Respan, Cloudflare AI Gateway, Helicone, hosted LiteLLM tiers, OpenRouter at the marketplace end) adds observability, prompt management, structured caching, eval automations, per-customer cost tracking, and an SLA. Managed or self-hosted depending on the vendor.
LiteLLM is the right call when: your traffic is low to moderate, your team is comfortable running Python infrastructure, and your needs end at "call multiple providers with retries." Below ~5 engineers, LiteLLM is usually the right starting place.
A full gateway becomes the right call when: you need per-customer cost breakdowns, eval-driven routing, prompt management with versioning, or a managed SLA instead of an on-call rotation. Most teams hit this threshold somewhere between $5K/mo LLM spend and the first executive question about "what is the AI costing us per feature."

What LiteLLM actually is

Two things, packaged together.

Load balancing across multiple deployments of the same model
Retries and provider fallbacks
Virtual API keys with per-key budget and rate limits
Cost tracking and logs
Basic dashboards

The proxy is what people usually mean when they say "we use LiteLLM as our gateway."

What a full LLM gateway adds on top

The capabilities that matter once your LLM usage is real, not a prototype.

Structured caching that the gateway can use. Exact-match cache, semantic cache, provider prompt cache all wired in, with hit rate metrics in the same dashboard as your spend.

An SLA. Managed gateways come with uptime guarantees and a paging contact. Your self-hosted LiteLLM proxy has the uptime your team can give it.

For the full breakdown of which caches matter and when, see LLM cache layers.

The decision: when LiteLLM is enough

LiteLLM is enough when all of these are true:

Your LLM spend is under roughly $3K-5K per month.
Your team can run a Python service in production without it being a distraction.
You do not need per-customer cost breakdowns or a prompt registry for non-engineers.
Your observability needs are met by basic per-call logs.
You are not paying the price for an on-call rotation on your gateway specifically.

If all five hold, install LiteLLM, point your code at the proxy, and move on. The team behind it does good work and the integration surface is genuinely strong.

The decision: when a full gateway is the right call

A full gateway becomes the right call when any of these are true:

Cost is a board-level question. "What is the AI costing us per feature, per customer, per quarter" needs to be a query, not a CSV export.
You are running 10+ LLM features. Maintaining prompt versions in code across features starts costing engineering time and the right place for that is a prompt registry.
You have an eval and you want to run it on production traffic. Online evaluation needs the gateway to sample, score, and store results connected to the original traces. Not a LiteLLM feature.
You need per-customer rate limits or budgets that change daily. LiteLLM virtual keys handle the static case. Customer-tier-aware dynamic limits are a gateway feature.
Your platform team is small. The gateway is hot-path infrastructure. If you would rather not own the uptime, the managed gateway pays for itself.
You need observability that hooks into agent traces. Reading a 50-step agent loop in a trace tree is the difference between debugging in 5 minutes and giving up.

What Respan adds (since we are one of the gateways being compared)

Respan is a managed LLM engineering platform. The gateway is one of four products that share the same data model:

Gateway with 250+ models, OpenAI-compatible endpoint, passthrough mode (BYO provider keys, no margin) or routed mode (one Respan key for everything).
Tracing that captures the gateway calls as spans inside the parent agent or RAG workflow, with auto-instrumentation for OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, Mastra, Pydantic AI, and ~30 other frameworks.
Evals with online (sampled production traffic, asynchronous judging) and offline (golden-set experiments) modes. LLM judges, code evaluators, human review.
Prompt management with versioning, A/B deployment, and a playground.

For the broader picture, see LLM gateway and the LLM cache layers deep-dive.

Honest weak spots for both

LiteLLM:

The Python proxy adds latency. For most workloads it is negligible (single-digit milliseconds). For high-throughput agent loops it can matter and a Go-based gateway like Bifrost beats it. See our Bifrost vs LiteLLM comparison.
Observability is per-call, not per-trace. If you are running agents, you will outgrow it.
The proxy is one more service for your platform team to babysit.

A full managed gateway (any vendor):

You are paying for a layer between you and the providers. Margins exist, even when they are invisible.
The data flows through a third party. For regulated workloads this is a real consideration (compliance certifications matter; check before signing).
You depend on the vendor's uptime, not just the providers' uptime. Pick a vendor with a track record.

A migration path that works

We see this pattern repeatedly:

Phase 1 (prototype, < $1K/mo): direct provider SDKs, no gateway. Move fast.
Phase 2 ($1K-3K/mo, 2-3 LLM features): install LiteLLM. Multi-provider, retries, basic logs.
Phase 3 ($3K-10K/mo, agent and RAG features in production): add a managed gateway (Respan or equivalent). Keep LiteLLM library calls in code if you want, or migrate to the managed gateway's SDK. Begin online evals.
Phase 4 ($10K+/mo, multi-tenant product): you are in gateway-as-strategic-infrastructure territory. Per-customer cost tracking, dynamic rate limits, and eval-driven routing are now first-order requirements.

There is no shame in being at Phase 2. There is also no glory in staying at Phase 2 once you are clearly past it.

FAQ

Is Respan a fork of LiteLLM? No. We use LiteLLM internally in some places where it is the right tool, but the Respan platform is its own data model, codebase, and integration surface.

Can I run a Respan gateway on-prem like LiteLLM? Self-hosting is available for enterprise customers. Most teams use the managed cloud.

LLM Gateway vs LiteLLM

TL;DR

What LiteLLM actually is

What a full LLM gateway adds on top

The decision: when LiteLLM is enough

The decision: when a full gateway is the right call

What Respan adds (since we are one of the gateways being compared)

Honest weak spots for both

A migration path that works

FAQ

Related articles

9 Best LLM Gateways in 2026

OpenAI vs Anthropic Pricing

Anthropic API vs AWS Bedrock Claude (2026): Which to Use

Built for AI agents.
Break less.
Ship more.

LLM Gateway vs LiteLLM

TL;DR

What LiteLLM actually is

What a full LLM gateway adds on top

The decision: when LiteLLM is enough

The decision: when a full gateway is the right call

What Respan adds (since we are one of the gateways being compared)

Honest weak spots for both

A migration path that works

FAQ

Related articles

9 Best LLM Gateways in 2026

OpenAI vs Anthropic Pricing

Anthropic API vs AWS Bedrock Claude (2026): Which to Use

Built for AI agents.
Break less.
Ship more.

Related articles

Best of
9 Best LLM Gateways in 2026
Best LLM gateways in 2026: Respan, OpenRouter, LiteLLM, Portkey, Cloudflare AI Gateway, Helicone, Bifrost, Vercel AI Gateway, TrueFoundry. Pricing, features, and when each is the right pick.
Frank Chen · May 10, 2026

Comparison
OpenAI vs Anthropic Pricing
OpenAI vs Anthropic API pricing as of May 2026. GPT-5.5/5.4 vs Opus 4.7 / Sonnet 4.6 / Haiku 4.5. Real cost math on RAG, agents, classification, plus the tokenizer trap.
Frank Chen · May 23, 2026

Comparison
Anthropic API vs AWS Bedrock Claude (2026): Which to Use
Anthropic API vs AWS Bedrock Claude compared: model freshness, pricing, IAM/VPC, BAA, latency, and a multi-cloud failover pattern through an LLM gateway.
Frank Chen · May 11, 2026

LLM Gateway vs LiteLLM

TL;DR

What LiteLLM actually is

What a full LLM gateway adds on top

The decision: when LiteLLM is enough

The decision: when a full gateway is the right call

What Respan adds (since we are one of the gateways being compared)

Honest weak spots for both

A migration path that works

FAQ

Related

Related articles

9 Best LLM Gateways in 2026

OpenAI vs Anthropic Pricing

Anthropic API vs AWS Bedrock Claude (2026): Which to Use

Built for AI agents. Break less. Ship more.

LLM Gateway vs LiteLLM

TL;DR

What LiteLLM actually is

What a full LLM gateway adds on top

The decision: when LiteLLM is enough

The decision: when a full gateway is the right call

What Respan adds (since we are one of the gateways being compared)

Honest weak spots for both

A migration path that works

FAQ

Related

Related articles

9 Best LLM Gateways in 2026

OpenAI vs Anthropic Pricing

Anthropic API vs AWS Bedrock Claude (2026): Which to Use

Built for AI agents. Break less. Ship more.

Built for AI agents.
Break less.
Ship more.

Built for AI agents.
Break less.
Ship more.