OpenAI API credits are the prepaid balance that funds every call you make to the OpenAI API. Unlike the consumer ChatGPT subscription, the API is pay-as-you-go: you load credits into your developer account, every request burns a few cents off the balance, and when it hits zero your calls start returning 429 errors with an insufficient_quota code.
If you are running production traffic on GPT-5.5 or GPT-5.4, the difference between a naive setup and one that uses prompt caching plus the batch API is roughly 4x on your monthly bill. That is the difference between $4,000 and $1,000 for the same workload. This guide walks through how credits actually work, how to get them, and the two levers that matter most for cost.
For the broader cost-control story (multi-provider fallback, per-feature budgets, usage attribution), see our LLM Gateway pillar.
TL;DR
- OpenAI billing is prepaid. You load credits, the API spends them, you set auto-recharge if you want to avoid running dry.
- New accounts created after mid-2025 receive zero automatic free credits. The old $5 trial is gone for most regions; promotional credits exist but are not automatic.
- Two discounts stack and matter: prompt caching (up to 90 percent off cached input tokens) and the batch API (50 percent off across the board with a 24-hour SLA).
- Combined, cached batch input on GPT-5.4 costs about $0.625 per million tokens versus a list price of $2.50. That is roughly 75 percent off.
- The fastest way to drain credits is a runaway agent in a retry loop. Set a hard monthly budget cap in the platform settings on day one.
How OpenAI API billing works in 2026
OpenAI runs prepaid billing. There is no monthly invoice and no overdraft. You buy credits up front, your usage burns them down, and when the balance hits zero the API stops accepting requests.
You have two top-up modes:
- Manual prepaid. You click "Add to credit balance" and load a fixed amount ($10 minimum, $1,000 maximum per top-up depending on your tier). Credits expire 12 months after purchase if unused.
- Auto-recharge. You set a threshold ($X remaining) and a top-up amount ($Y added). Whenever your balance dips below the threshold, OpenAI charges your card and adds the credit. This is the only safe option for production traffic.
The credit balance is shared across every model, every endpoint, every team member in the organization. There is no per-project sub-balance from OpenAI directly; if you need that, you handle it in a gateway or in your own accounting layer.
Note: a "credit" is just a dollar (or your local currency equivalent if you are billed outside USD). There is no virtual unit. $100 in credits equals $100 of API usage at whatever per-token rates apply to the models you call.
Free credits for new accounts
This is the section that gets out of date fastest, so be aware: as of May 2026, OpenAI does not automatically grant free credits to new developer accounts. The historical $5 trial credit was phased out for most new signups in mid-2025.
What still exists:
- Researcher Access Program. Academics can apply for credit grants. Approvals take a few weeks and are not guaranteed.
- Promotional partnerships. Startup programs (YC, Microsoft for Startups, accelerators) occasionally bundle OpenAI credits. Check your accelerator's perks dashboard.
- Regional promotions. OpenAI runs occasional new-account credits in specific markets. They are not announced in advance.
Bottom line: budget for paying from day one. Do not architect your project around the assumption of free credits. Always check the live pricing page at the time of signup for the current free-credit status, because this changes.
Applying promo and partner credits
If you do receive credit codes (from a hackathon, a startup program, a research grant), you apply them in the OpenAI platform billing UI under "Credit grants" or "Apply code". A few rules:
- Promo credits are spent before prepaid credits. The platform draws them down first automatically.
- They have an expiration date attached to the grant. You see it in the billing UI; it is usually 3 to 12 months.
- They are organization-scoped. Apply them on the org that will actually do the spending.
- They are not transferable between organizations. Pick the right org first.
If your grant expires in two weeks and you have not used it, that is a "use it or lose it" moment. Either run your eval suites against the latest models or burn it on a one-time bulk classification job.
Monitoring usage so you don't get surprised
There are three layers of monitoring, and you should run all three:
- OpenAI Usage dashboard. Built-in, updates with about an hour lag, shows tokens and cost per model. Good for sanity checks, bad for real-time alerting.
- Webhook + email alerts. Set a "soft" usage limit (a dollar threshold that triggers an email) and a "hard" monthly budget cap (which stops API calls). These live in the limits settings. Set both.
- Per-call attribution. The Usage dashboard tells you "you spent $4,000 on GPT-5.5 this month." It does not tell you which feature, user, or prompt drove that. You need either a gateway or your own logging layer.
For attribution, a gateway is the natural choke point. See What is an LLM Gateway and Best LLM Gateways in 2026 for the framing. The short version: every request flows through the gateway, the gateway tags it with user, feature, environment, and writes the cost row to your logs. You can answer "which user is the top spender" in a SQL query instead of a multi-week guessing game.
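If you build that logging layer yourself rather than routing through a gateway, the core is small: wrap the API call, compute the cost from the usage object, and write one tagged row per request. A minimal sketch, with placeholder tag names and assumed per-token prices you should replace from the live pricing page:

import time
from openai import OpenAI

client = OpenAI()

# Assumed prices per million tokens; the output price here is a placeholder.
PRICE_PER_MILLION = {"gpt-5.4": {"input": 2.50, "output": 10.00}}

def tracked_completion(messages, model, user_id, feature):
    response = client.chat.completions.create(model=model, messages=messages)
    usage = response.usage
    prices = PRICE_PER_MILLION[model]
    cost = (usage.prompt_tokens * prices["input"]
            + usage.completion_tokens * prices["output"]) / 1_000_000
    # One cost row per request; swap the print for your logging or warehouse layer.
    print({"ts": time.time(), "user": user_id, "feature": feature, "model": model,
           "prompt_tokens": usage.prompt_tokens,
           "completion_tokens": usage.completion_tokens,
           "cost_usd": round(cost, 6)})
    return response

Swap the print for an insert into your warehouse and the top-spender question becomes a single GROUP BY.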
Stretching credits: prompt caching
Prompt caching is the single biggest lever in 2026 for cutting OpenAI bills. The mechanic: when you send a request, OpenAI hashes the prefix of your input. If the same prefix was seen recently (within roughly 5 to 10 minutes of idle, longer with steady traffic), OpenAI serves those tokens from cache at a steep discount instead of recomputing.
The discount is large. On GPT-5.4 standard pricing, input tokens drop from $2.50 per million to $0.25 per million when cached, which is 90 percent off. On GPT-5.5, the absolute discount per token is bigger because the list price is higher.
Caching is automatic. You do not call a "cache" endpoint. To benefit, you structure your prompts so the long, stable part comes first and the variable user input comes last:
from openai import OpenAI
client = OpenAI()
# Long stable prefix at the top: system prompt, schemas, examples, RAG context.
SYSTEM = open("system_prompt.md").read() # 8k tokens of stable instructions
SCHEMA = open("output_schema.json").read() # 2k tokens of schema
# Placeholder user input; in production this is whatever the request carries.
user_query = "How do I rotate my API key?"

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": SYSTEM + "\n\n" + SCHEMA},
        # Variable user input at the bottom.
        {"role": "user", "content": user_query},
    ],
)

# Inspect cache hits in the usage object.
print(response.usage.prompt_tokens_details.cached_tokens)

Two things to watch:
- Order matters. Put stable content first, variable content last. If you concatenate user input into the middle of the system prompt, you bust the cache on every call.
- Minimum prefix length. Caching kicks in above a minimum prefix size (currently 1024 tokens at the time of writing, verify on the live docs). Below that, no discount.
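To confirm the structure is actually paying off, track what fraction of your prompt tokens come back as cache hits. A minimal sketch, assuming response objects from the chat completions call above:

def cached_fraction(response):
    # Share of prompt tokens served from cache on this call (0.0 means no hit).
    usage = response.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached = details.cached_tokens if details else 0
    return cached / max(usage.prompt_tokens, 1)

If this hovers near zero on steady traffic, your "stable" prefix is probably not byte-for-byte identical between calls.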
Stretching credits: the batch API
The batch API is the second lever, and it stacks with caching. The deal: you submit a JSONL file of requests, OpenAI processes them within 24 hours, and you pay 50 percent of the standard rate for every model.
When to use it:
- Eval runs over a dataset of thousands of examples.
- Backfill jobs (classify all historical support tickets, re-embed a documentation corpus, etc.).
- Daily report generation that does not need to be live.
- Synthetic data generation for fine-tuning.
When not to use it:
- Anything user-facing. The 24-hour window means you cannot stream a response back to a chat UI.
- Anything with a tight deadline. Batches often complete faster than 24 hours, but there is no SLA shorter than that.
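Each line of the JSONL file is one self-contained request with a custom_id you choose, which OpenAI echoes back in the results. A minimal sketch of building the file; the ticket records and system prompt are placeholders:

import json

tickets = [{"id": "t-001", "text": "I was double charged last month."}]  # Placeholder data.
SYSTEM = "Classify the support ticket as billing, bug, feature_request, or other."

with open("requests.jsonl", "w") as f:
    for ticket in tickets:
        request = {
            "custom_id": ticket["id"],  # Echoed back so you can join results to inputs.
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5.4",
                "messages": [
                    {"role": "system", "content": SYSTEM},
                    {"role": "user", "content": ticket["text"]},
                ],
            },
        }
        f.write(json.dumps(request) + "\n")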
Submission looks like this:
import time
from openai import OpenAI

client = OpenAI()

# 1. Upload your JSONL file of requests.
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# 2. Submit the batch.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)

# 3. Poll for completion, then download results.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

if batch.status == "completed":
    results = client.files.content(batch.output_file_id)
    open("results.jsonl", "w").write(results.text)

Combine batch + caching on a stable-prefix workload and you can hit roughly 25 percent of list price. That is a 4x cost reduction for the exact same model output.
Other ways to stretch credits
A few smaller levers, in order of impact:
- Pick the right model. GPT-5.4-mini is materially cheaper than GPT-5.4 and handles a large fraction of production traffic with no measurable quality loss. GPT-5.4-nano is cheaper still for classification and simple extraction. Verify cost per task on your own evals, not vibes. See How to Evaluate an LLM.
- Trim system prompts. A 5,000-token system prompt that could be 1,500 tokens is paying 3.3x more per call, every call. Audit yours quarterly.
- Use a gateway with semantic cache for FAQ-style queries. Exact-match cache hits return in single-digit milliseconds at zero cost. For repeated user queries (support, docs, common tools), this can absorb 20 to 40 percent of traffic before it hits OpenAI; a minimal exact-match sketch follows this list.
- Provider fallback on failure. If OpenAI rate-limits you, falling back to Azure OpenAI or another provider keeps the request flowing without burning credits on blind retries. See our LLM Gateway pillar.
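For the FAQ-style caching above, the simplest version is an exact-match layer in front of the API call. A minimal in-process sketch; a real deployment would put this in Redis or in the gateway rather than a Python dict:

import hashlib
import json
from openai import OpenAI

client = OpenAI()
_cache = {}  # Placeholder store; use Redis or the gateway cache in production.

def cached_completion(messages, model="gpt-5.4"):
    # Exact-match key over the full request; any byte-level change misses the cache.
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    if key in _cache:
        return _cache[key]  # Served locally: zero tokens billed.
    response = client.chat.completions.create(model=model, messages=messages)
    _cache[key] = response
    return response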
Common gotchas
- Runaway agents. An agent loop that fails to terminate (bad stop condition, retry on the wrong error) can burn $1,000 in an afternoon. Always set a per-run token budget and a hard monthly cap; a minimal budget-guard sketch follows this list.
- Forgetting credits expire. Unused prepaid credits expire 12 months after purchase. Do not over-stock.
- Credits applied to the wrong organization. Promo codes are not transferable. Check the org switcher before redeeming.
- No auto-recharge in production. If you forget to top up, your API returns 429 with insufficient_quota and your product breaks. Auto-recharge is not optional for production.
- Treating the Usage dashboard as ground truth. It lags by an hour or so. For real-time numbers, log your own usage rows per request via a gateway or middleware.
- Mixing dev and prod on one org. A noisy eval run from dev can blow through your prod budget. Use separate orgs (each with its own balance) for dev, staging, and prod.
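For the runaway-agent gotcha, the cheapest insurance is a budget check you enforce yourself on every step of the loop. A minimal sketch; the budget number and the shape of the budget object are placeholders:

from openai import OpenAI

client = OpenAI()
MAX_TOKENS_PER_RUN = 200_000  # Placeholder; tune per feature and model.

def guarded_call(messages, budget, model="gpt-5.4"):
    # budget is a dict shared across one agent run, e.g. {"spent": 0}.
    if budget["spent"] >= MAX_TOKENS_PER_RUN:
        raise RuntimeError(f"Per-run token budget exhausted ({budget['spent']} tokens)")
    response = client.chat.completions.create(model=model, messages=messages)
    budget["spent"] += response.usage.total_tokens
    return response

Every step of the agent loop goes through guarded_call with the same budget dict, so a loop that fails to terminate dies at the budget instead of at your credit balance.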
FAQ
Do OpenAI API credits expire? Yes. Prepaid credits expire 12 months after purchase. Promo and grant credits have their own expiration shown in the billing UI, usually 3 to 12 months.
Can I get a refund on unused credits? Generally no for promo credits. For prepaid credits, OpenAI has historically only refunded under specific circumstances (account closure, dispute). Treat credits as non-refundable when you load them.
What happens when my balance hits zero? The API returns 429 with an insufficient_quota error code. New requests fail immediately. Existing streaming responses already in flight complete. Top up to resume.
Is ChatGPT Plus the same as API credits? No. ChatGPT Plus ($20/month) is a consumer subscription for the chat.openai.com web product. It does not grant any API credits. The API is billed separately and prepaid.
Can I use a single set of credits across organizations? No. Each OpenAI organization has its own credit balance and billing. There is no transfer.
How do I cap spend at a hard dollar amount? Set a "Monthly budget" in your organization's limits page. When usage hits it, the API returns 429 until next month. Combine with a "Usage alert" at 50 or 80 percent of budget for early warning.
Why is my bill higher than my Usage dashboard shows? The Usage dashboard lags by roughly an hour. Reconcile with the Invoices page for billed totals. Also check: do you have other endpoints (embeddings, audio, image) you are forgetting?