Two ways to coerce JSON out of the OpenAI API exist today: the newer `response_format: { type: "json_schema", strict: true }` (Structured Outputs) and the older `response_format: { type: "json_object" }` (JSON mode). They sound similar; they are not. One guarantees your output matches a schema. The other guarantees only that the output parses as JSON.
If you are building agents, data extraction pipelines, tool-use systems, or anything where downstream code expects fields by name, you want Structured Outputs. JSON mode is the legacy fallback, useful in a narrow set of cases. This guide covers the difference, the gotchas, and concrete code in Python and TypeScript.
TL;DR
- Structured Outputs (`json_schema`, strict mode): model output is guaranteed to match your JSON Schema. 100 percent schema match on supported models. Use this by default.
- JSON mode (`json_object`): model output is guaranteed to be valid JSON, with no schema enforcement. Legacy. Use only for older models or when you cannot write a schema.
- Structured Outputs supports refusals as a first-class field. If the model declines for safety reasons, you get `response.choices[0].message.refusal` instead of malformed content.
- Pydantic (Python) and Zod (TypeScript) integrate directly. Both SDKs auto-derive the JSON Schema and parse the response back into a typed object.
- Supported on GPT-5.5, GPT-5.4, GPT-5.4-mini, GPT-5.4-nano, GPT-5.2-Codex. Not all old models support strict mode; check before you migrate.
The two formats, side by side
| Feature | JSON mode (json_object) | Structured Outputs (json_schema, strict) |
|---|---|---|
| Output is valid JSON | Yes | Yes |
| Matches a specific schema | No | Yes (100 percent on supported models) |
| Schema validation by OpenAI | None | Constrained decoding |
| Pydantic/Zod auto-parsing | Manual | Built into SDK |
| Refusal field | No | Yes |
| First request latency | Standard | Slightly higher on first call per schema (compilation) |
| Subsequent requests | Standard | Standard (schema is cached) |
| Recommended in 2026 | No (legacy) | Yes (default) |
The key mental model: JSON mode is "the model tries to output JSON, please verify yourself." Structured Outputs is "OpenAI's decoder physically cannot emit tokens that violate your schema." That guarantee is enforced at the sampling layer, not by a post-hoc check.
Structured Outputs: the modern default
Here is a minimal example in Python. We define a Pydantic model and pass it directly to the SDK's `.parse()` helper, which derives the JSON Schema, sets strict mode, and parses the response into the Pydantic instance.
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Ticket(BaseModel):
    title: str
    severity: str  # "low" | "medium" | "high" | "critical"
    affected_service: str
    summary: str

response = client.chat.completions.parse(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "Extract a ticket from the user's bug report."},
        {"role": "user", "content": "The checkout page returns a 500 on iOS. Started 2 hours ago."},
    ],
    response_format=Ticket,
)

ticket: Ticket = response.choices[0].message.parsed
print(ticket.severity, ticket.title)
```

The response is a fully typed `Ticket` instance. No `json.loads`, no `try/except`, no validation step. If the model refuses, `message.parsed` is `None` and `message.refusal` contains the reason.
In TypeScript with Zod:
```typescript
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const client = new OpenAI();

const Ticket = z.object({
  title: z.string(),
  severity: z.enum(["low", "medium", "high", "critical"]),
  affected_service: z.string(),
  summary: z.string(),
});

const response = await client.chat.completions.parse({
  model: "gpt-5.4",
  messages: [
    { role: "system", content: "Extract a ticket from the user's bug report." },
    { role: "user", content: "Checkout page returns 500 on iOS. Started 2 hours ago." },
  ],
  response_format: zodResponseFormat(Ticket, "ticket"),
});

const ticket = response.choices[0].message.parsed;
console.log(ticket?.severity, ticket?.title);
```

Same guarantees: typed output, refusal handling, no manual parsing.
What "strict" actually enforces
When you set `strict: true` in a `json_schema` response format, the constraints OpenAI enforces are:

- Required fields are present. Every property in your schema's `required` array must appear in the output.
- No additional properties. The schema implicitly sets `additionalProperties: false`. The model cannot invent extra keys (see the raw-payload sketch after this list).
- Types match. A `number` field will be a number, not the string "42".
- Enums match. A field with `enum: ["low", "medium", "high"]` will be one of those three values, never anything else.
- Nesting is exact. Nested objects and arrays follow the same rules at every level.
Caveats to know:

- All fields must be in `required`. Strict mode does not support optional fields directly. To make a field optional in the practical sense, mark it required and union the type with `null` (`"type": ["string", "null"]`, sketched below).
- Top-level type must be `object`. You cannot return a raw array; wrap it (`{ "items": [...] }`).
- Maximum depth of around 5 nesting levels and a few thousand object properties total. For most schemas this is not a constraint, but very large schemas may need to be split.
- Schema compilation happens on first use. That first request is slightly slower (a few hundred ms). Subsequent requests with the same schema reuse the compiled grammar.
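In Pydantic terms, the null-union and wrapper patterns from the caveats above look roughly like this; a sketch, with illustrative model and field names.

```python
from typing import Optional

from pydantic import BaseModel

class Incident(BaseModel):
    title: str
    # "Optional" here means required-but-nullable: the SDK derives a
    # ["string", "null"] union, so the model must emit the key, possibly null.
    assignee: Optional[str]

class IncidentList(BaseModel):
    # Strict mode rejects top-level arrays, so wrap the list in an object.
    items: list[Incident]
```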
JSON mode: the legacy fallback
JSON mode predates Structured Outputs. It enforces only "the output parses as JSON." The model still chooses field names and types based on your prompt; nothing stops it from inventing a key, omitting a required field, or returning a number as a string.
```python
import json

response = client.chat.completions.create(
    model="gpt-5.4",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Return a JSON object with keys: title, severity, summary."},
        {"role": "user", "content": "Checkout page returns 500 on iOS."},
    ],
)

data = json.loads(response.choices[0].message.content)
# Now hope `data` has the keys you asked for.
```

Why this still exists:
- Older model versions. Some pinned model snapshots predate Structured Outputs and only support JSON mode.
- Schema-free use cases. "Return a JSON object summarizing this article" with no fixed shape.
- Migration in progress. Code that has not been updated yet.
For anything new in 2026, prefer Structured Outputs. JSON mode is the curl of structured generation: it does what it says, no more, and you own the validation.
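To make "you own the validation" concrete, this is roughly the defensive code JSON mode pushes into every call site; a sketch, with the key set and enum invented for illustration.

```python
import json

REQUIRED_KEYS = {"title", "severity", "summary"}
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}

def parse_json_mode_output(content: str) -> dict:
    # JSON mode guarantees this parses; it guarantees nothing about the shape.
    data = json.loads(content)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model omitted keys: {missing}")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {data['severity']!r}")
    return data
```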
Refusals: the field you cannot ignore
Structured Outputs introduces a `refusal` field on the assistant message. If the model declines a request for safety reasons, `message.parsed` is `None` and `message.refusal` holds a string explanation. Your code must check both.
```python
msg = response.choices[0].message
if msg.refusal:
    log.warning("Model refused", reason=msg.refusal, request_id=response.id)
    return fallback_response()
ticket: Ticket = msg.parsed
```

If you skip the refusal check and just access `.parsed`, you get `None` and a confusing `AttributeError` two function calls later. The refusal path is rare for benign data-extraction prompts but real for anything close to a content-policy boundary.
Performance and cost
Structured Outputs and JSON mode bill the same way: standard per-token rates, no surcharge for either. The cost story is identical.
Latency:
- First call for a new schema: Structured Outputs adds a one-time compilation step. Typically 100 to 500 ms of overhead.
- Subsequent calls (same schema): identical to a normal call. The grammar is cached.
- JSON mode: no compilation step, but you pay the overhead in your code (validation, retries on bad output).
If you make 1,000 requests against the same schema, the per-request overhead from Structured Outputs is negligible. If you generate a brand-new schema for every single request, the compilation overhead adds up. In practice, schemas are stable per feature, so this is rarely a problem.
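If you want to observe the one-time compilation cost yourself, a quick sketch; network jitter will dominate any single run, so average several. It assumes the `Ticket` model and `client` from earlier.

```python
import time

def timed_parse() -> float:
    start = time.perf_counter()
    client.chat.completions.parse(
        model="gpt-5.4",
        messages=[{"role": "user", "content": "Checkout page returns 500 on iOS."}],
        response_format=Ticket,
    )
    return time.perf_counter() - start

print(f"first call:  {timed_parse():.2f}s")  # includes schema compilation
print(f"second call: {timed_parse():.2f}s")  # compiled grammar is cached
```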
When to use which
Use Structured Outputs (`json_schema`, strict) for:
- Tool calling. (Function-calling tool definitions use the same strict-schema mechanism under the hood.)
- Data extraction (entity extraction, form parsing, document understanding).
- Agent state transitions where you need a typed action object.
- Eval pipelines where the judge model returns a typed verdict. See What is Prompt Evaluation.
- Anything that flows into a downstream typed system (DB insert, GraphQL mutation, queue message).
Use JSON mode (`json_object`) for:
- One-off summarization where you do not want to write a schema.
- Free-form research-style outputs where the shape genuinely varies.
- Working around a model snapshot that does not support strict mode (rare in 2026).
If you are unsure, default to Structured Outputs. It is strictly more capable than JSON mode for any case where you know what fields you want.
Model support
As of May 2026, strict Structured Outputs is supported on:
- GPT-5.5 (flagship)
- GPT-5.4
- GPT-5.4-mini
- GPT-5.4-nano
- GPT-5.2-Codex (with limitations on tool-use schemas)
Older snapshots of GPT-4-class models still get JSON mode but not strict Structured Outputs. Verify on the model's docs page before pinning.
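If your code selects models dynamically, one defensive pattern is to attempt strict mode and degrade to JSON mode on a rejection. A sketch, assuming the `Ticket` model and `client` from earlier; catching `BadRequestError` broadly is a simplification.

```python
import openai

def extract_ticket(model: str, messages: list[dict]) -> Ticket:
    try:
        resp = client.chat.completions.parse(
            model=model,
            messages=messages,
            response_format=Ticket,
        )
        return resp.choices[0].message.parsed
    except openai.BadRequestError:
        # Likely a snapshot that predates strict mode: fall back to JSON mode
        # and validate by hand with the same Pydantic model.
        resp = client.chat.completions.create(
            model=model,
            messages=messages,
            response_format={"type": "json_object"},
        )
        return Ticket.model_validate_json(resp.choices[0].message.content)
```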
Validating outputs anyway
A common question: "If Structured Outputs guarantees the schema, do I still need Pydantic or Zod validation?" Yes, for two reasons:
- Defense in depth. Bugs happen on the OpenAI side too. A small extra check costs almost nothing.
- Refinement constraints. JSON Schema can express types and enums but not semantic rules ("end_date must be after start_date", "amount must be positive when type is 'credit'"). Encode those in Pydantic validators or Zod refinements and run them after parsing.
```python
from pydantic import BaseModel, model_validator

class Booking(BaseModel):
    start: str  # ISO-8601 dates, so lexicographic comparison is safe
    end: str

    @model_validator(mode="after")
    def end_after_start(self):
        if self.end <= self.start:
            raise ValueError("end must be after start")
        return self
```

If validation fails, treat it as a model error: log, optionally retry once, and surface the failure as a metric. This becomes one of the standard signals in an LLM observability stack.
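A minimal retry-once wrapper for that failure path might look like the sketch below. It assumes the SDK surfaces Pydantic's `ValidationError` when an `after` validator rejects the parsed values; check how your SDK version actually reports it.

```python
from pydantic import ValidationError

def book_with_retry(messages: list[dict], retries: int = 1) -> Booking:
    for attempt in range(retries + 1):
        try:
            resp = client.chat.completions.parse(
                model="gpt-5.4",
                messages=messages,
                response_format=Booking,
            )
        except ValidationError:
            # Schema matched, but a semantic rule (end_after_start) rejected
            # the values. Log a metric here, retry once, then give up.
            if attempt == retries:
                raise
            continue
        msg = resp.choices[0].message
        if msg.refusal:
            raise RuntimeError(f"model refused: {msg.refusal}")
        return msg.parsed
    raise RuntimeError("unreachable")
```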
Migrating from JSON mode to Structured Outputs
The migration is small and almost always worth it:
- Define your output shape as a Pydantic or Zod model.
- Replace `response_format={"type": "json_object"}` with the SDK's typed parse helper.
- Delete your manual `json.loads` and field-validation code.
- Add a refusal check.
- Run your eval suite against both implementations. Confirm parity or improvement.
In our own migration of the eval-judging pipeline, we cut "malformed output" retries from roughly 2 percent of calls to zero, and removed about 60 lines of defensive parsing code per call site. Total migration time was a few hours.
FAQ
Does Structured Outputs work with streaming?
Yes. The SDK exposes `stream=True` for parse helpers and emits partial parsed objects as tokens arrive.
Does it work with tool calling?
Yes. Tool definitions already use strict JSON Schema under the hood. Setting `strict: true` on a tool definition gives the same guarantees as on a top-level response.
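For the raw API, a strict tool definition looks like this; the tool name and parameters are illustrative.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "file_ticket",
        "description": "File a bug ticket in the tracker.",
        "strict": True,  # same constrained decoding as response_format
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "severity": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "critical"],
                },
            },
            "required": ["title", "severity"],
            "additionalProperties": False,
        },
    },
}]
```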
Can I use a recursive schema?
Yes, with restrictions. You can define subschemas and reference them, including recursively, via `$ref`. Watch the depth limit.
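A self-referencing Pydantic model is the easiest way to get there; a sketch, where the SDK serializes the recursion through `$defs`/`$ref`:

```python
from pydantic import BaseModel

class Comment(BaseModel):
    author: str
    body: str
    # The self-reference becomes a $ref into $defs in the derived schema.
    # Deep trees can still hit the nesting limit, so keep replies shallow.
    replies: list["Comment"]
```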
Will it hurt quality versus free-form output?
Empirically, no for well-scoped extraction tasks. Some open prompts where the model needs to "think" before structuring may benefit from a two-step approach: free-form reasoning first, then a separate strict-schema call to format it.
Does it support arrays at the top level?
No. Wrap them in an object: `{ "items": [...] }`. This is a hard constraint of the API.
What happens if my schema is invalid?
The API returns a 400 with a schema error on the first request. The schema is validated at submission time, before any tokens are generated.
Is there a token overhead for strict mode?
The schema is sent as part of the request, so very large schemas add input tokens. For most schemas (a few hundred tokens) the overhead is negligible.