Two ways to coerce JSON out of the OpenAI API exist today: the newer `response_format: { type: "json_schema", strict: true }` (Structured Outputs) and the older `response_format: { type: "json_object" }` (JSON mode). They sound similar; they are not. One guarantees your output matches a schema. The other guarantees only that the output parses as JSON.
If you are building agents, data extraction pipelines, tool-use systems, or anything where downstream code expects fields by name, you want Structured Outputs. JSON mode is the legacy fallback, useful in a narrow set of cases. This guide covers the difference, the gotchas, and concrete code in Python and TypeScript.
TL;DR
- Structured Outputs (`json_schema`, strict mode): model output is guaranteed to match your JSON Schema. 100 percent schema match on supported models. Use this by default.
- JSON mode (`json_object`): model output is guaranteed to be valid JSON, with no schema enforcement. Legacy. Use only for older models or when you cannot write a schema.
- Structured Outputs supports refusals as a first-class field. If the model declines for safety reasons, you get `response.choices[0].message.refusal` instead of malformed content.
- Pydantic (Python) and Zod (TypeScript) integrate directly. Both SDKs auto-derive the JSON Schema and parse the response back into a typed object.
- Supported on GPT-5.5, GPT-5.4, GPT-5.4-mini, GPT-5.4-nano, GPT-5.2-Codex. Not all old models support strict mode; check before you migrate.
The two formats, side by side
| Feature | JSON mode (json_object) | Structured Outputs (json_schema, strict) |
|---|---|---|
| Output is valid JSON | Yes | Yes |
| Matches a specific schema | No | Yes (100 percent on supported models) |
| Schema validation by OpenAI | None | Constrained decoding |
| Pydantic/Zod auto-parsing | Manual | Built into SDK |
| Refusal field | No | Yes |
| First request latency | Standard | Slightly higher on first call per schema (compilation) |
| Subsequent requests | Standard | Standard (schema is cached) |
| Recommended in 2026 | No (legacy) | Yes (default) |
The key mental model: JSON mode is "the model tries to output JSON, please verify yourself." Structured Outputs is "OpenAI's decoder physically cannot emit tokens that violate your schema." That guarantee is enforced at the sampling layer, not by a post-hoc check.
Structured Outputs: the modern default
Here is a minimal example in Python. We define a Pydantic model and pass it directly to the SDK's `.parse()` helper, which derives the JSON Schema, sets strict mode, and parses the response into the Pydantic instance.
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Ticket(BaseModel):
    title: str
    severity: str  # "low" | "medium" | "high" | "critical"
    affected_service: str
    summary: str

response = client.chat.completions.parse(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "Extract a ticket from the user's bug report."},
        {"role": "user", "content": "The checkout page returns a 500 on iOS. Started 2 hours ago."},
    ],
    response_format=Ticket,
)

ticket: Ticket = response.choices[0].message.parsed
print(ticket.severity, ticket.title)
```

The response is a fully typed `Ticket` instance. No `json.loads`, no `try/except`, no validation step. If the model refuses, `message.parsed` is `None` and `message.refusal` contains the reason.
In TypeScript with Zod:
```typescript
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const client = new OpenAI();

const Ticket = z.object({
  title: z.string(),
  severity: z.enum(["low", "medium", "high", "critical"]),
  affected_service: z.string(),
  summary: z.string(),
});

const response = await client.chat.completions.parse({
  model: "gpt-5.4",
  messages: [
    { role: "system", content: "Extract a ticket from the user's bug report." },
    { role: "user", content: "Checkout page returns 500 on iOS. Started 2 hours ago." },
  ],
  response_format: zodResponseFormat(Ticket, "ticket"),
});

const ticket = response.choices[0].message.parsed;
console.log(ticket?.severity, ticket?.title);
```

Same guarantees: typed output, refusal handling, no manual parsing.
What "strict" actually enforces
When you set `strict: true` in a `json_schema` response format, the constraints OpenAI enforces are:

- Required fields are present. Every property in your schema's `required` array must appear in the output.
- No additional properties. The schema implicitly sets `additionalProperties: false`. The model cannot invent extra keys (see the raw-payload sketch after this list).
- Types match. A `number` field will be a number, not the string "42".
- Enums match. A field with `enum: ["low", "medium", "high"]` will be one of those three values, never anything else.
- Nesting is exact. Nested objects and arrays follow the same rules at every level.
Caveats to know:

- All fields must be in `required`. Strict mode does not support optional fields directly. To make a field optional in the practical sense, mark it required and union the type with `null` (`"type": ["string", "null"]`, sketched below).
- Top-level type must be `object`. You cannot return a raw array; wrap it (`{ "items": [...] }`).
- Maximum depth of around 5 nesting levels and a few thousand object properties total. For most schemas this is not a constraint, but very large schemas may need to be split.
- Schema compilation happens on first use. That first request is slightly slower (a few hundred ms). Subsequent requests with the same schema reuse the compiled grammar.
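In Pydantic terms, the null-union and wrapper patterns from the caveats above look roughly like this; a sketch, with illustrative model and field names.

```python
from typing import Optional

from pydantic import BaseModel

class Incident(BaseModel):
    title: str
    # "Optional" here means required-but-nullable: the SDK derives a
    # ["string", "null"] union, so the model must emit the key, possibly null.
    assignee: Optional[str]

class IncidentList(BaseModel):
    # Strict mode rejects top-level arrays, so wrap the list in an object.
    items: list[Incident]
```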
JSON mode: the legacy fallback
JSON mode predates Structured Outputs. It enforces only "the output parses as JSON." The model still chooses field names and types based on your prompt; nothing stops it from inventing a key, omitting a required field, or returning a number as a string.
```python
import json

response = client.chat.completions.create(
    model="gpt-5.4",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Return a JSON object with keys: title, severity, summary."},
        {"role": "user", "content": "Checkout page returns 500 on iOS."},
    ],
)

data = json.loads(response.choices[0].message.content)
# Now hope `data` has the keys you asked for.
```

Why this still exists:
- Older model versions. Some pinned model snapshots predate Structured Outputs and only support JSON mode.
- Schema-free use cases. "Return a JSON object summarizing this article" with no fixed shape.
- Migration in progress. Code that has not been updated yet.
For anything new in 2026, prefer Structured Outputs. JSON mode is the curl of structured generation: it does what it says, no more, and you own the validation.
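To make "you own the validation" concrete, this is roughly the defensive code JSON mode pushes into every call site; a sketch, with the key set and enum invented for illustration.

```python
import json

REQUIRED_KEYS = {"title", "severity", "summary"}
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}

def parse_json_mode_output(content: str) -> dict:
    # JSON mode guarantees this parses; it guarantees nothing about the shape.
    data = json.loads(content)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model omitted keys: {missing}")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {data['severity']!r}")
    return data
```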
Refusals: the field you cannot ignore
Structured Outputs introduces a `refusal` field on the assistant message. If the model declines a request for safety reasons, `message.parsed` is `None` and `message.refusal` holds a string explanation. Your code must check both.
```python
msg = response.choices[0].message
if msg.refusal:
    log.warning("Model refused", reason=msg.refusal, request_id=response.id)
    return fallback_response()
ticket: Ticket = msg.parsed
```

If you skip the refusal check and just access `.parsed`, you get `None` and a confusing `AttributeError` two function calls later. The refusal path is rare for benign data-extraction prompts but real for anything close to a content-policy boundary.
Performance and cost
Structured Outputs and JSON mode bill the same way: standard per-token rates, no surcharge for either. The cost story is identical.
Latency:
- First call for a new schema: Structured Outputs adds a one-time compilation step. Typically 100 to 500 ms of overhead.
- Subsequent calls (same schema): identical to a normal call. The grammar is cached.
- JSON mode: no compilation step, but you pay the overhead in your code (validation, retries on bad output).
If you make 1,000 requests against the same schema, the per-request overhead from Structured Outputs is negligible. If you generate a brand-new schema for every single request, the compilation overhead adds up. In practice, schemas are stable per feature, so this is rarely a problem.
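If you want to observe the one-time compilation cost yourself, a quick sketch; network jitter will dominate any single run, so average several. It assumes the `Ticket` model and `client` from earlier.

```python
import time

def timed_parse() -> float:
    start = time.perf_counter()
    client.chat.completions.parse(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Checkout page returns 500 on iOS."}],
        response_format=Ticket,
    )
    return time.perf_counter() - start

print(f"first call:  {timed_parse():.2f}s")  # includes schema compilation
print(f"second call: {timed_parse():.2f}s")  # compiled grammar is cached
```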
When to use which
Use Structured Outputs (`json_schema`, strict) for:
- Tool calling. (Function-calling tool definitions use the same strict-schema mechanism under the hood.)
- Data extraction (entity extraction, form parsing, document understanding).
- Agent state transitions where you need a typed action object.
- Eval pipelines where the judge model returns a typed verdict. See What is Prompt Evaluation.
- Anything that flows into a downstream typed system (DB insert, GraphQL mutation, queue message).
Use JSON mode (`json_object`) for:
- One-off summarization where you do not want to write a schema.
- Free-form research-style outputs where the shape genuinely varies.
- Working around a model snapshot that does not support strict mode (rare in 2026).
If you are unsure, default to Structured Outputs. It is strictly more capable than JSON mode for any case where you know what fields you want.
Model support
As of May 2026, strict Structured Outputs is supported on:
- GPT-5.5 (flagship)
- GPT-5.4
- GPT-5.4-mini
- GPT-5.4-nano
- GPT-5.2-Codex (with limitations on tool-use schemas)
Older snapshots of GPT-4-class models still get JSON mode but not strict Structured Outputs. Verify on the model's docs page before pinning.
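If your code selects models dynamically, one defensive pattern is to attempt strict mode and degrade to JSON mode on a rejection. A sketch, assuming the `Ticket` model and `client` from earlier; catching `BadRequestError` broadly is a simplification.

```python
import openai

def extract_ticket(model: str, messages: list[dict]) -> Ticket:
    try:
        resp = client.chat.completions.parse(
            model=model,
            messages=messages,
            response_format=Ticket,
        )
        return resp.choices[0].message.parsed
    except openai.BadRequestError:
        # Likely a snapshot that predates strict mode: fall back to JSON mode
        # and validate by hand with the same Pydantic model.
        resp = client.chat.completions.create(
            model=model,
            messages=messages,
            response_format={"type": "json_object"},
        )
        return Ticket.model_validate_json(resp.choices[0].message.content)
```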
Validating outputs anyway
A common question: "If Structured Outputs guarantees the schema, do I still need Pydantic or Zod validation?" Yes, for two reasons:
- Defense in depth. Bugs happen on the OpenAI side too. A small extra check costs almost nothing.
- Refinement constraints. JSON Schema can express types and enums but not semantic rules ("end_date must be after start_date", "amount must be positive when type is 'credit'"). Encode those in Pydantic validators or Zod refinements and run them after parsing.
```python
from pydantic import BaseModel, model_validator

class Booking(BaseModel):
    start: str  # ISO-8601 dates, so lexicographic comparison is safe
    end: str

    @model_validator(mode="after")
    def end_after_start(self):
        if self.end <= self.start:
            raise ValueError("end must be after start")
        return self
```

If validation fails, treat it as a model error: log, optionally retry once, and surface the failure as a metric. This becomes one of the standard signals in an LLM observability stack.
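A minimal retry-once wrapper for that failure path might look like the sketch below. It assumes the SDK surfaces Pydantic's `ValidationError` when an `after` validator rejects the parsed values; check how your SDK version actually reports it.

```python
from pydantic import ValidationError

def book_with_retry(messages: list[dict], retries: int = 1) -> Booking:
    for attempt in range(retries + 1):
        try:
            resp = client.chat.completions.parse(
                model="gpt-5.4",
                messages=messages,
                response_format=Booking,
            )
        except ValidationError:
            # Schema matched, but a semantic rule (end_after_start) rejected
            # the values. Log a metric here, retry once, then give up.
            if attempt == retries:
                raise
            continue
        msg = resp.choices[0].message
        if msg.refusal:
            raise RuntimeError(f"model refused: {msg.refusal}")
        return msg.parsed
    raise RuntimeError("unreachable")
```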
Migrating from JSON mode to Structured Outputs
The migration is small and almost always worth it:
- Define your output shape as a Pydantic or Zod model.
- Replace `response_format={"type": "json_object"}` with the SDK's typed parse helper.
- Delete your manual `json.loads` and field-validation code.
- Add a refusal check.
- Run your eval suite against both implementations. Confirm parity or improvement.
In our own migration of the eval-judging pipeline, we cut "malformed output" retries from roughly 2 percent of calls to zero, and removed about 60 lines of defensive parsing code per call site. Total migration time was a few hours.
FAQ
Does Structured Outputs work with streaming?
Yes. The SDK exposes `stream=True` for parse helpers and emits partial parsed objects as tokens arrive.
Does it work with tool calling?
Yes. Tool definitions already use strict JSON Schema under the hood. Setting `strict: true` on a tool definition gives the same guarantees as on a top-level response.
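For the raw API, a strict tool definition looks like this; the tool name and parameters are illustrative.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "file_ticket",
        "description": "File a bug ticket in the tracker.",
        "strict": True,  # same constrained decoding as response_format
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "severity": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "critical"],
                },
            },
            "required": ["title", "severity"],
            "additionalProperties": False,
        },
    },
}]
```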
Can I use a recursive schema?
Yes, with restrictions. You can define subschemas and reference them, including recursively, via `$ref`. Watch the depth limit.
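A self-referencing Pydantic model is the easiest way to get there; a sketch, where the SDK serializes the recursion through `$defs`/`$ref`:

```python
from pydantic import BaseModel

class Comment(BaseModel):
    author: str
    body: str
    # The self-reference becomes a $ref into $defs in the derived schema.
    # Deep trees can still hit the nesting limit, so keep replies shallow.
    replies: list["Comment"]
```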
Will it hurt quality versus free-form output?
Empirically, no for well-scoped extraction tasks. Some open prompts where the model needs to "think" before structuring may benefit from a two-step approach: free-form reasoning first, then a separate strict-schema call to format it.
Does it support arrays at the top level?
No. Wrap them in an object: `{ "items": [...] }`. This is a hard constraint of the API.
What happens if my schema is invalid?
The API returns a 400 with a schema error on the first request. The schema is validated at submission time, before any tokens are generated.
Is there a token overhead for strict mode?
The schema is sent as part of the request, so very large schemas add input tokens. For most schemas (a few hundred tokens) the overhead is negligible.