The headline token prices are the part everyone reads first. They are also the part that misleads. Once you factor in cached input, tokenizer differences, and the actual shape of your traffic, "which is cheaper, GPT or Claude" is rarely the simple comparison the marketing pages make it out to be. This piece is the engineering-side answer as of May 2026, with current pricing pulled from both providers' docs and the cost math worked out for the four workloads that account for most real spend.
Prices below are verified against Anthropic's pricing page and OpenAI's API pricing on May 23, 2026. They change. Verify before you commit a budget.
TL;DR
- Flagship workhorse tier: Sonnet 4.6 ($3 in / $15 out) and GPT-5.4 ($2.50 in / $15 out) are roughly the same money. OpenAI is ~17% cheaper on input, identical on output. Both pin at $0.30-$0.25 / MTok on cached reads.
- Top tier: Opus 4.7 ($5/$25) is genuinely cheaper than GPT-5.5 ($5/$30) and dramatically cheaper than the "pro" reasoning variants from either side ($30/$180 each).
- Cheap tier: GPT-5.4-mini ($0.75/$4.50) edges Claude Haiku 4.5 ($1/$5) by ~10% on input. GPT-5.4-nano ($0.20/$1.25) is in a separate, lower-capability tier with no Claude equivalent.
- The hidden cost nobody mentions: Opus 4.7's new tokenizer uses up to 35% more tokens for the same English text. The headline price advantage over GPT-5.5 narrows or disappears for input-heavy workloads.
- Caching is a wash on price, different on ergonomics. Both at 90% off cached reads. Claude requires explicit
cache_control; OpenAI caches automatically. Claude has 1-hour cache option, OpenAI does not. - Real-world: For a typical RAG workload with prompt caching enabled, both providers come out under $0.05 per call. The provider choice should rest on model capability and ecosystem fit, not the 5-15% pricing delta.
Current pricing tables (May 2026)
Anthropic Claude 4.x
| Model | Input $/MTok | Output $/MTok | 5-min cache write | 1-hour cache write | Cache read |
|---|---|---|---|---|---|
| Opus 4.7 | $5 | $25 | $6.25 (1.25x) | $10 (2x) | $0.50 (0.1x) |
| Sonnet 4.6 | $3 | $15 | $3.75 | $6 | $0.30 |
| Haiku 4.5 | $1 | $5 | $1.25 | $2 | $0.10 |
Batch API: 50% off all rates. 1M-token context standard on Opus 4.7, Opus 4.6, and Sonnet 4.6 at the same per-token rate. Fast mode (beta) on Opus 4.6/4.7: 6x base rates ($30 / $150) for significantly higher tokens-per-second.
OpenAI GPT-5.x
| Model | Input $/MTok | Output $/MTok | Cached input $/MTok |
|---|---|---|---|
| GPT-5.5 | $5 | $30 | $0.50 |
| GPT-5.5-pro | $30 | $180 | (no cache discount published) |
| GPT-5.4 | $2.50 | $15 | $0.25 |
| GPT-5.4-mini | $0.75 | $4.50 | $0.075 |
| GPT-5.4-nano | $0.20 | $1.25 | $0.02 |
| GPT-5.4-pro | $30 | $180 | (no cache discount published) |
| GPT-5.2 (legacy flagship) | $1.75 | $14 | (90% off cached) |
Batch API: 50% off. Regional data residency: +10% surcharge. Cached input automatic, no explicit cache marker required.
Head to head by tier
Top tier (frontier reasoning):
- Claude Opus 4.7 at $5 / $25 / $0.50 cache read.
- GPT-5.5 at $5 / $30 / $0.50 cache read.
- Both at the "pro" extreme: $30 / $180. Same ballpark, used only when you need the long-horizon reasoning premium.
For most flagship use cases Opus 4.7 is cheaper on output, but see the tokenizer note below before assuming that translates to a real-world bill advantage.
Workhorse tier (where most production traffic lives):
- Claude Sonnet 4.6 at $3 / $15 / $0.30.
- GPT-5.4 at $2.50 / $15 / $0.25.
GPT-5.4 is 17% cheaper on input and ~17% cheaper on cache reads. Identical on output. For input-heavy workloads (long system prompts, RAG with large context blocks) this matters. For output-heavy workloads (code generation, long-form writing) the providers are functionally equivalent on price.
Cheap tier (classification, routing, short-form):
- Claude Haiku 4.5 at $1 / $5 / $0.10.
- GPT-5.4-mini at $0.75 / $4.50 / $0.075.
- GPT-5.4-nano at $0.20 / $1.25 / $0.02.
GPT-5.4-mini undercuts Haiku 4.5 by ~10% across the board. GPT-5.4-nano is in a separate capability tier; Anthropic does not currently ship a model in that price-capability bucket. For high-volume routing or extraction where capability headroom is large, nano is genuinely cheap.
The tokenizer trap
The Anthropic docs include a note most engineers miss on first read: Opus 4.7's tokenizer can use up to 35% more tokens for the same fixed English text compared to earlier models. That changes the headline price math.
Consider a 50K-token document fed to both Opus 4.7 and GPT-5.5 for a one-shot question:
- Headline price suggests Opus 4.7 input is $0.25 vs GPT-5.5 input at $0.25. Tie on input.
- Real-world: if Opus 4.7's tokenizer counts that same document as ~67K tokens, the input cost becomes $0.335 vs GPT-5.5's $0.25. Opus 4.7 is now ~34% more expensive on input despite the same posted rate.
The output side is less affected (output token counts are determined by the model's response, not the input encoding). But on input-heavy workloads (RAG, document analysis, code review) the tokenizer overhead can erase or reverse the apparent Opus 4.7 price advantage over GPT-5.5.
The fix is to measure on your actual data before committing to a model based on the posted rate. The tokenizer overhead varies by content (less on code, more on long-form English).
Caching: same destination, different roads
Both providers offer roughly 90% off cached reads. The difference is ergonomics.
Anthropic prompt caching:
- Explicit
cache_controlmarkers on up to 4 content blocks per request. - Two TTLs: 5 minutes (1.25x write multiplier) and 1 hour (2x write multiplier).
- You decide what gets cached. Precise, more setup.
- Cache reads do not count against ITPM rate limits.
OpenAI cached input:
- Automatic. No
cache_control, no markers. The API decides what to cache based on observed prefix repetition. - Single implicit TTL (not officially documented but typically minutes).
- Less control, less setup, less precision.
For workloads with a long stable prefix (RAG context, system prompts, conversation history) both providers deliver the same approximate cost reduction. For workloads where the cacheable region is non-obvious or shifts often, Anthropic's explicit control wins. For workloads where simplicity matters more than precision, OpenAI's automatic caching wins.
For the full mechanics including the cache layers above the provider cache (exact-match, semantic), see LLM cache layers. For the deep dive on Claude's prompt caching specifically, see Claude prompt caching pricing.
Cost math on four real workloads
Workload 1: Single-turn chat (1K input, 500 output)
- Sonnet 4.6: $0.003 in + $0.0075 out = $0.0105
- GPT-5.4: $0.0025 in + $0.0075 out = $0.0100
- Haiku 4.5: $0.001 in + $0.0025 out = $0.0035
- GPT-5.4-mini: $0.00075 in + $0.00225 out = $0.003
GPT-5.4-mini and Haiku 4.5 are within 17% of each other. GPT-5.4 and Sonnet 4.6 within 5%. At this scale the pricing delta is rounding error compared to your salary cost evaluating which one.
Workload 2: RAG with 50K retrieved context, 500-token answer, cache enabled (90% cache hit on prefix)
- Sonnet 4.6: 45K × $0.30/MTok + 5K × $3/MTok + 500 × $15/MTok = $0.036
- GPT-5.4: 45K × $0.25/MTok + 5K × $2.50/MTok + 500 × $15/MTok = $0.031
GPT-5.4 wins by ~14%. Same order of magnitude. For most production budgets, this delta does not justify a provider switch on its own.
Workload 3: 20-step agent loop, 4K avg input per step, 500 avg output per step, no caching
- Sonnet 4.6: 80K × $3/MTok + 10K × $15/MTok = $0.24 + $0.15 = $0.39 per session
- GPT-5.4: 80K × $2.50/MTok + 10K × $15/MTok = $0.20 + $0.15 = $0.35 per session
10% delta. At 100K agent sessions per month, that's $4K vs $35K total, depending on session count. The bigger lever in both cases is enabling provider prompt caching, which would cut input cost by 70-80%, dwarfing the provider-pricing difference.
Workload 4: High-volume classification (100K calls/day, 500 input, 50 output)
- Haiku 4.5: 500 × $1/MTok + 50 × $5/MTok = $0.00075 per call × 100K/day × 30 days = $2,250/month
- GPT-5.4-mini: 500 × $0.75/MTok + 50 × $4.50/MTok = $0.000600 per call × 3M = $1,800/month
- GPT-5.4-nano: 500 × $0.20/MTok + 50 × $1.25/MTok = $0.000163 per call × 3M = $488/month
For classification, GPT-5.4-nano is dramatically cheaper if its capability is sufficient for the task. This is the one workload where the provider choice can move the bill 4x.
A decision tree based on pricing alone
If pricing is your dominant constraint, the answers cluster cleanly.
- High-volume classification / extraction / routing: GPT-5.4-nano is in a separate price tier. Test it. If it works, the bill savings are large.
- Input-heavy workloads (long RAG context, document analysis): GPT-5.4 is ~15% cheaper than Sonnet 4.6 and avoids the Opus 4.7 tokenizer overhead. Use cached input aggressively on either provider.
- Output-heavy workloads (code generation, long-form writing): Pricing is roughly identical. Pick by capability and ecosystem fit, not pricing.
- Frontier reasoning: Opus 4.7 has a posted-price advantage over GPT-5.5 on output ($25 vs $30) but check the tokenizer impact on your real data before committing.
- Batch workloads (overnight evals, bulk processing): 50% off on both. Same comparison logic applies.
In practice, very few teams should pick provider based on pricing alone. The capability differences and ecosystem fit (tool calling, multimodal, MCP support, framework integrations) usually matter more. But if pricing IS the question, the answers above are the ones the math gives.
Where Respan fits
Most teams running production LLM apps use both providers. The right way to do that without doubling your infrastructure cost is to put a gateway in front and route per-call based on workload type, cost ceiling, or feature.
Respan ships a unified LLM gateway across 250+ models including the full Claude 4.x family, GPT-5 family, plus open models and self-hosted endpoints. One API key, one billing relationship, one trace tree showing exactly which model handled which call and what it cost. See LLM gateway for the architecture or LLM gateway vs LiteLLM for the gateway-tier comparison. For per-customer cost breakdowns and budgets, the users feature wires customer identifiers through every trace.
FAQ
Is GPT-5.4 or Sonnet 4.6 the "better" workhorse model? On pricing, GPT-5.4 is ~17% cheaper on input. On capability the two are close enough that the right answer depends on your eval set. Run a head-to-head on 100 of your hardest production prompts before committing.
Why is Opus 4.7 cheaper than the old Opus 4.1? Anthropic dropped Opus pricing significantly when they shipped Opus 4.5 in late 2025 ($5/$25 vs the old $15/$75). Opus 4.6 and 4.7 inherit the new pricing. The legacy Opus 4 and 4.1 are deprecated; do not use them for new work.
Should I use the 1-hour cache TTL on Claude? Only when the same prefix is hit more than 5-7 times over an hour. The 2x write multiplier needs that many reads to break even versus the 5-minute (1.25x) cache. See the math in Claude prompt caching pricing.
Is OpenAI's automatic caching as good as Anthropic's explicit caching? For workloads with an obvious long-stable prefix, yes. The savings end up similar. For workloads where you would want to cache something in the middle of a request, Anthropic's explicit cache_control gives you control OpenAI does not.
What about GPT-5.5-pro vs Opus 4.7 for the hardest tasks? GPT-5.5-pro ($30/$180) is much more expensive than Opus 4.7 ($5/$25). Use the pro tier only for the specific subset of tasks where it beats Opus 4.7 by enough to justify the 6x output cost. Most teams find Opus 4.7 is sufficient for their hard cases.
How do the batch APIs compare? Both providers offer 50% off batch processing. The comparison logic is identical to the real-time pricing. If you can tolerate non-realtime latency (eval runs, document processing pipelines), use batch for either provider.
Does the regional data residency premium matter? For most teams in the US, no. If you have compliance requirements for EU or other regions, both providers charge ~10% extra. Compare the delta against the cost of bring-your-own infrastructure on Azure OpenAI or AWS Bedrock for the comparison case.
What's the single change that saves the most money? Enable prompt caching on either provider. A 70-80% cut on input tokens is bigger than any provider-switching delta you can realistically achieve.