You can call Claude through Anthropic's own API, or through AWS Bedrock's Claude models. Same weights, same outputs, different front door. The choice usually comes down to four things: how quickly you want new models, who controls billing, what your security team needs, and where your latency tail sits.
This guide walks through the practical differences as of May 2026, when each side wins, the pricing parity question, and the multi-cloud failover pattern that most production teams end up running through a gateway anyway.
For the broader question of why a proxy layer ends up in front of all of this, see our LLM gateway pillar.
TL;DR
- Same model weights. Sonnet 4.6 on Anthropic and Sonnet 4.6 on Bedrock produce the same outputs given the same input.
- Anthropic direct wins on: day-one access to new models, simpler billing, slightly lower latency in US East, and earliest access to new features (extended thinking, computer use, fast mode), which ship there first.
- Bedrock wins on: AWS-native IAM, VPC PrivateLink, regional data residency, BAA included with AWS, single AWS invoice, AWS committed-use discounts.
- Pricing is at parity at list. Bedrock occasionally adds a small markup, and AWS's enterprise discount programs (EDPs) can swing the comparison.
- The right answer for most production teams is both, with a gateway routing primary to Anthropic direct and failing over to Bedrock when Anthropic returns 529s.
The basic difference
Anthropic ships Claude as a first-party API at api.anthropic.com. AWS ships the same models inside Bedrock as managed Foundation Models, accessible via the AWS SDK or Bedrock Runtime API. The model weights and the inference behavior are identical. What differs is everything around the call: authentication, billing, networking, model availability dates, and the rate-limit story.
Google Cloud Vertex AI also hosts Claude under a similar arrangement. The Bedrock comparison below mostly applies to Vertex as well, swapping AWS primitives for their GCP equivalents.
Model freshness
Anthropic launches new Claude models on api.anthropic.com first. Bedrock catches up usually within days to a few weeks, sometimes longer for region rollouts. As of May 2026:
- Opus 4.7: launched on Anthropic direct, Bedrock availability varies by region.
- Sonnet 4.6, Haiku 4.5: broadly available across both.
- Older snapshots remain available on both for stability-conscious workloads.
If you want to ship new-model improvements the day they release, Anthropic direct is the obvious answer. If you want to wait until a model has been stable in production for a few weeks before adopting it, Bedrock's slight lag is a feature, not a bug.
Pricing
At list price, Bedrock and Anthropic direct charge essentially the same per-token rates. As of May 2026:
| Model | Anthropic Input | Anthropic Output | Bedrock Input | Bedrock Output |
|---|---|---|---|---|
| Opus 4.7 | $5/MTok | $25/MTok | $5/MTok | $25/MTok |
| Sonnet 4.6 | $3/MTok | $15/MTok | $3/MTok | $15/MTok |
| Haiku 4.5 | $1/MTok | $5/MTok | $1/MTok | $5/MTok |
Always verify on each provider's pricing page before committing. The non-token differences that move the bill:
- AWS committed-use discounts. If you have a Bedrock commitment as part of an AWS EDP, Bedrock can be cheaper net of the discount.
- Cross-region transfer charges. Calling Bedrock in `us-east-1` from a `us-west-2` Lambda adds data transfer fees.
- Anthropic prompt caching pricing applies on both. The 5-minute and 1-hour caches work on Bedrock too, with the same approximate read multiplier.
- Batch API discount (50% off) is available on both for asynchronous workloads.
For most workloads at moderate scale, list-price parity holds and the decision rests on the non-pricing factors below.
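To make the parity concrete, here is a back-of-the-envelope cost function using the list prices from the table above (verify against each provider's pricing page before relying on these numbers):

```python
# Monthly cost sketch at the list prices quoted above.
# Prices are per million tokens (MTok) and identical at list on both sides.

PRICES = {  # (input $/MTok, output $/MTok)
    "opus-4.7": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float,
                 batch: bool = False) -> float:
    """Dollar cost for a month's volume, given in millions of tokens.
    The Batch API discount (50% off, available on both) applies when batch=True."""
    inp, out = PRICES[model]
    cost = input_mtok * inp + output_mtok * out
    return cost * 0.5 if batch else cost

# Example: 200M input / 40M output tokens on Sonnet 4.6.
print(monthly_cost("sonnet-4.6", 200, 40))        # 200*3 + 40*15 = 1200.0
print(monthly_cost("sonnet-4.6", 200, 40, True))  # batched: 600.0
```

The interesting comparisons start once an AWS EDP discount is applied to the Bedrock side of the same calculation.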
Security and compliance
This is where Bedrock has its clearest advantages.
- IAM. Bedrock calls authenticate via AWS IAM roles. You attach a permissions policy, no separate API key to rotate. Anthropic direct uses bearer tokens that have to be managed in your own secrets system.
- VPC PrivateLink. Bedrock can be accessed without traversing the public internet. Anthropic direct goes over the public internet (TLS, but still).
- Regional data residency. Bedrock surfaces clear region selection (`us-east-1`, `eu-central-1`, etc.). Anthropic direct is moving toward explicit `inference_geo` tags, but the surface is smaller.
- BAA. AWS includes Bedrock under its AWS BAA for HIPAA. Anthropic offers a BAA for Claude on its enterprise plan, but the path through AWS is shorter if your team is already AWS-native.
- Audit logging. Bedrock writes to CloudTrail by default. Anthropic provides usage logs but you typically need a gateway or your own logging layer for fine-grained per-call audit.
If your security team gates LLM usage on "must be in our VPC" or "must be HIPAA-covered through our AWS BAA," Bedrock is the path of least resistance.
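To illustrate the IAM point: a minimal permissions policy scoping a role to invoke a single Claude model in one region might look like the following sketch (the model ARN is illustrative; check the actual foundation-model ARNs in your account before use):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6-v1:0"
    }
  ]
}
```

There is no key to rotate here: any principal that can assume the role can call the model, and every call lands in CloudTrail.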
Latency
Anthropic direct is generally a touch faster than Bedrock for US East workloads on first-token latency, mostly because Bedrock adds a thin layer of AWS infrastructure between you and Anthropic's inference. The difference is usually 30-80ms on first token, less on output throughput once streaming starts.
A few notes:
- The difference is small relative to the model's own inference time. For an Opus call generating 1000 output tokens, a 50ms first-token delta is a rounding error.
- Bedrock latency varies more by region. `us-east-1` is closest to parity; `eu-central-1` and APAC regions sit further from Anthropic's primary inference footprint.
- Both providers offer regional inference; pick the region closest to your callers.
Rate limits
Two completely different stories.
Anthropic direct uses Build Tiers 1-4 with per-model RPM, ITPM, and OTPM limits, and tiers advance automatically based on cumulative credit purchases. Custom limits above Tier 4 require talking to sales. See our Anthropic API rate limits guide.
Bedrock uses AWS service quotas, set per-region and per-account. Quotas are requested through the AWS Service Quotas console. There is no automatic tier advancement; you file a quota increase ticket. The AWS side is more bureaucratic but the ceilings can go higher for enterprise accounts.
The practical implication: Anthropic direct is easier to scale up to mid-size workloads quickly. Bedrock requires planning quota increases ahead of launches, but can host very large enterprise workloads at high limits.
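Whichever side you call, rate-limit errors (429 on Anthropic, `ThrottlingException` on Bedrock) want the same client-side treatment: exponential backoff with jitter. A minimal provider-agnostic sketch, where `RateLimited` and the `call` parameter are illustrative stand-ins for your SDK's actual error type and request function:

```python
import random
import time

class RateLimited(Exception):
    """Stand-in for a 429 / ThrottlingException from either provider."""

def with_backoff(call, max_attempts=5, base=0.5, sleep=time.sleep):
    """Retry `call` on RateLimited with exponential backoff plus jitter.
    `sleep` is injectable so tests can run without actually waiting."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base * (2 ** attempt) * (1 + random.random()))

# Example: a fake call that is throttled twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))  # -> ok, after two retries
```

Backoff papers over transient throttling; it does not substitute for requesting a Bedrock quota increase ahead of a launch.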
Code: calling each side
Anthropic direct (Python)
```python
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
```
Bedrock (Python)
```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}
resp = bedrock.invoke_model(
    modelId="anthropic.claude-sonnet-4-6-v1:0",
    body=json.dumps(body),
)
result = json.loads(resp["body"].read())
```
Same response shape (Anthropic's Messages format wrapped in Bedrock's envelope). Authentication is the big surface difference: Anthropic uses an API key in the header, Bedrock uses AWS SigV4 from your IAM credentials.
Anthropic direct (TypeScript)
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const resp = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});
```
Bedrock (TypeScript)
```typescript
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });
const command = new InvokeModelCommand({
  modelId: "anthropic.claude-sonnet-4-6-v1:0",
  body: JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  }),
});
const resp = await client.send(command);
const result = JSON.parse(new TextDecoder().decode(resp.body));
```
Same essential shape, different SDK and auth model.
The multi-cloud failover pattern
Most teams that hit production scale do not pick one or the other. They run both, behind a gateway, with Anthropic direct as primary and Bedrock as failover (or vice versa, depending on the team's AWS posture).
Why both:
- Resilience against 529s. When Anthropic is overloaded, Bedrock often is not. Same model, different capacity.
- Region diversity. EU-based callers can route to
eu-central-1Bedrock; US callers to Anthropic direct. - Cost optimization. AWS committed-use discounts apply to Bedrock traffic only. Route the steady baseline to Bedrock under commitment, route the spiky overflow to Anthropic direct on metered pricing.
Through a gateway, all of this is a config change:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.respan.ai/v1",
    api_key=os.environ["RESPAN_API_KEY"],
)
resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",  # gateway routes per config
    messages=[{"role": "user", "content": "Hello"}],
)
```
The gateway tries Anthropic first, falls over to Bedrock on 529 or timeout, and emits one trace per call regardless of which provider answered. See Best LLM Gateways in 2026 for a comparison of options.
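If you are not ready for a gateway, the failover logic itself is easy to hand-roll: try the primary, catch the overload error, call the secondary. A minimal sketch with injected callables, where `Overloaded` is an illustrative stand-in for Anthropic's 529 `overloaded_error` and the stubs stand in for the two SDK calls shown earlier:

```python
class Overloaded(Exception):
    """Stand-in for Anthropic's 529 overloaded_error."""

def call_with_failover(primary, fallback):
    """Try the primary provider; fall back to the secondary on overload.
    Returns (provider_name, result) so callers can log who answered."""
    try:
        return ("primary", primary())
    except Overloaded:
        return ("fallback", fallback())

# Example with stubs: primary is overloaded, the fallback answers.
def anthropic_stub():
    raise Overloaded()

def bedrock_stub():
    return "Hello from Bedrock"

print(call_with_failover(anthropic_stub, bedrock_stub))
# -> ('fallback', 'Hello from Bedrock')
```

A gateway earns its keep beyond this sketch by adding health tracking, unified tracing, and per-provider cost attribution.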
When each one wins outright
Pick Anthropic direct if:
- You ship new models the day they release.
- You want the simplest possible billing (one Anthropic invoice).
- You want first access to features like Fast mode, computer use, and extended thinking.
- Your stack is multi-cloud or non-AWS.
Pick Bedrock if:
- You are AWS-native and want one bill, one IAM model, one audit trail.
- HIPAA or sector compliance is satisfied through your AWS BAA.
- You need VPC PrivateLink or strict regional residency.
- You have AWS EDP commits that apply.
Run both behind a gateway if:
- You are past month 6 of production.
- You ever saw a 529 outage that blocked users.
- You have callers in multiple regions.
- You want to evaluate models on a small slice of traffic before broad rollout.
FAQ
Are Claude responses identical between Anthropic direct and Bedrock? For the same model version and deterministic settings, yes. Same weights, same outputs.
Is Bedrock more expensive? At list, prices match. Bedrock can be cheaper after AWS commitment discounts. Cross-region transfer and slightly different cache pricing can shift the bill modestly either way.
When do new Claude models hit Bedrock? Usually within days to weeks of the Anthropic direct launch. Region availability rolls out progressively.
Does prompt caching work on Bedrock? Yes. Bedrock supports Anthropic's prompt caching with the same approximate cost ratios. See Claude prompt caching.
Is Bedrock HIPAA-covered? Bedrock is in scope for AWS's BAA program. Anthropic offers a BAA directly on enterprise plans. Both paths are viable; Bedrock is the simpler one if you already have an AWS BAA.
Can I run an active-active setup across both? Yes, through a gateway. Most teams split traffic 70/30 or 90/10 between primary and secondary, then flip on 529s or quota exhaustion.
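A 90/10 active-active split can be as simple as a weighted coin flip per request. A sketch (real gateways also track provider health and flip the weights on error spikes):

```python
import random

def pick_provider(rng, primary_weight=0.9):
    """Route ~90% of requests to the primary provider, the rest to the secondary."""
    return "anthropic" if rng.random() < primary_weight else "bedrock"

# Seeded for reproducibility; the split converges on 90/10 over many draws.
rng = random.Random(0)
draws = [pick_provider(rng) for _ in range(10_000)]
print(draws.count("anthropic") / len(draws))  # close to 0.9
```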
Which has better rate limits? Anthropic direct's Build Tier 4 maxes at 10M ITPM on Opus. Bedrock can go higher with quota tickets but requires planning. For mid-scale, Anthropic is easier; for very large enterprise, Bedrock has more headroom.