Code Interpreter is OpenAI's hosted Python sandbox. The model writes Python, OpenAI runs it on a managed VM, the model reads the output and continues the conversation. You see it inside ChatGPT under the "Analyze data" button; via the API you get the same primitive as a tool on an Assistant. In 2026 it remains one of the most underused parts of the OpenAI platform, mostly because the Assistants API was always more verbose than the Chat Completions API and engineers default to what they know.
The honest answer is that Code Interpreter is excellent for a narrow set of jobs: ad-hoc data analysis, file format conversion, chart generation, math the model would otherwise get wrong, and any task where letting the model "try, see the error, try again" outperforms one-shot generation. It is overpriced and overkill for tasks that do not need a sandbox at all, like JSON manipulation or string transforms that the model can do directly.
This guide covers what Code Interpreter can and cannot do, how to spin one up via the API, the pricing model, and when you should run Python in your own sandbox instead. For broader context on agent tooling, see best AI agent frameworks and best LLM gateways.
TL;DR
- What it is: a Python sandbox the model can call as a tool, with internet disabled, files mountable, charts and tables renderable.
- Where to use it: the Assistants API today, with Code Interpreter exposed as a built-in tool. The same capability is surfaced inside ChatGPT for end users.
- Pricing: $0.03 per session. A session is one hour of activity within a thread; concurrent threads spawn separate sessions.
- Sweet spot: data analysis on user-uploaded CSV/Excel, file format conversions, chart generation, structured-output validation, math-heavy reasoning.
- Skip it when: you need internet access from the sandbox, you need GPU compute, you need persistent state across sessions, or the task is pure text generation.
- DIY alternative: running Python in your own sandbox (Modal, E2B, Daytona) gives you internet, GPUs, and custom environments; you give up the model's built-in self-correction loop.
What Code Interpreter actually does
Under the hood, Code Interpreter is a Python environment with a pinned set of libraries: pandas, numpy, matplotlib, scipy, scikit-learn, Pillow, openpyxl, PyPDF2, and friends. The model emits Python, the runtime executes it, stdout and any generated files come back to the model as tool output. The model can iterate: read an error, fix the code, run again.
What this means in practice:
- It can read files. Upload a CSV, ask "what's the average revenue by month with seasonality removed," get an answer plus a chart.
- It can write files. Generate a PNG, a CSV, an Excel file, return it to the user.
- It can do math without hallucinating. Numerical integration, statistical tests, regression: actually correct because actual Python.
- It cannot reach the internet. No requests.get, no API calls, no pip install. The environment is locked.
- It cannot run GPU code. No PyTorch CUDA, no Triton. CPU only.
- It cannot persist across sessions. Files live for the session, then are garbage collected.
The sandbox itself is roughly equivalent to a small VM with no network. The model's strength is that it can self-correct: if pandas throws a KeyError, the model reads the error and tries again. That iteration loop is what makes Code Interpreter useful beyond what a one-shot Python generator can do.
Spinning up an Assistant with Code Interpreter
The setup is a handful of calls: create an assistant, create a thread, add a message, then run it and read the result.
```python
from openai import OpenAI

client = OpenAI()

# 1. Create the assistant (once, store the ID)
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions=(
        "You analyze uploaded data files. "
        "Use Python to compute results. Always show your work."
    ),
    model="gpt-5.4",
    tools=[{"type": "code_interpreter"}],
)

# 2. Create a thread for this conversation
thread = client.beta.threads.create()

# 3. Send a message
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Compute the standard deviation of [3, 1, 4, 1, 5, 9, 2, 6, 5, 3].",
)

# 4. Run and wait
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# 5. Read messages
messages = client.beta.threads.messages.list(thread_id=thread.id)
for m in reversed(list(messages)):
    print(m.role, m.content)
```
In TypeScript the shape is identical:
```typescript
import OpenAI from "openai";

const client = new OpenAI();

const assistant = await client.beta.assistants.create({
  name: "Data Analyst",
  model: "gpt-5.4",
  tools: [{ type: "code_interpreter" }],
});

const thread = await client.beta.threads.create();

await client.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Compute the std dev of [3, 1, 4, 1, 5, 9, 2, 6, 5, 3].",
});

const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

const messages = await client.beta.threads.messages.list(thread.id);
```
The Assistants API is event-driven by default, but create_and_poll keeps the code linear, which is what you want for examples and most batch jobs. For interactive UIs, use the streaming variant.
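If you do go interactive, the Python SDK ships a streaming helper for runs. A minimal sketch, reusing the thread and assistant from above; text_deltas is the SDK's convenience iterator over assistant message tokens:

```python
# Stream the run instead of polling (openai-python streaming helper).
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
) as stream:
    for text in stream.text_deltas:  # assistant tokens as they arrive
        print(text, end="", flush=True)
print()
```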
Uploading files
The killer feature is letting the model read files. Upload with purpose="assistants", then attach to a message.
```python
# Upload a CSV
file = client.files.create(
    file=open("sales.csv", "rb"),
    purpose="assistants",
)

# Attach it to a message (tool_resources on the assistant also works)
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What's the month-over-month growth rate in this file?",
    attachments=[
        {"file_id": file.id, "tools": [{"type": "code_interpreter"}]}
    ],
)
```
Files mounted this way show up at /mnt/data/ inside the sandbox. The model picks the right path automatically.
The opposite direction also works: when the model produces a file (a chart, a converted Excel), it shows up as a file attachment on the assistant's message, and you can download via the Files API.
```python
for m in client.beta.threads.messages.list(thread_id=thread.id):
    for c in m.content:
        if c.type == "image_file":
            file_id = c.image_file.file_id
            content = client.files.content(file_id).read()
            with open(f"{file_id}.png", "wb") as f:
                f.write(content)
```
Pricing: how a session works
Code Interpreter is billed at $0.03 per session. The pricing model is the one most teams get wrong, so it is worth being explicit.
A session is created the first time Code Interpreter is invoked inside a thread. It stays alive for one hour after the last invocation. If the same thread has another Code Interpreter call inside that hour, it reuses the session: no extra charge. If you spin up a new thread for each user message, every message that uses Code Interpreter is a separate session.
Concurrency matters too. Two threads running Code Interpreter at the same time are two sessions, even on the same assistant.
The cost implications:
- Pattern A: long-lived thread per user conversation. A user asks 10 questions over an hour, only the first triggers a session start. You pay $0.03 for the whole conversation.
- Pattern B: new thread per request. Every call that uses Code Interpreter is $0.03 even if the user makes 50 requests in a minute. At $0.03 * 50 = $1.50 per user-minute, this gets expensive fast.
The right architectural decision is almost always Pattern A: keep the thread alive for the user's session, reuse it across messages. The Assistants API is designed for this.
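As a sketch, Pattern A is just a per-user thread cache. thread_store here is a hypothetical in-memory map; swap in Redis or whatever backs your sessions:

```python
# Pattern A: one long-lived thread per user, reused across messages.
thread_store: dict[str, str] = {}  # hypothetical cache: user_id -> thread_id

def ask(user_id: str, question: str) -> None:
    # Create the thread only on the user's first message.
    if user_id not in thread_store:
        thread_store[user_id] = client.beta.threads.create().id
    thread_id = thread_store[user_id]
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=question
    )
    client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=assistant.id
    )

# Ten Code Interpreter questions inside an hour on one thread: one session, $0.03.
```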
On top of the per-session fee, you still pay normal token costs for the model. A typical Code Interpreter conversation with GPT-5.4 will be dominated by token cost, not session cost, unless you spawn many short sessions.
What it is good for
The honest list of cases where Code Interpreter is the right tool:
Ad-hoc data analysis on user uploads. "Here is my CSV, tell me what is interesting." Pandas can do this; the model can write the pandas. No DIY required.
File format conversion. PDF to CSV, Excel to JSON, image cropping. The model knows the libraries, the sandbox has them installed.
Chart generation. Matplotlib charts inline in a chat UI, no separate plotting service.
Math the model would otherwise get wrong. Anything involving large numbers, statistical tests, numerical integration, or precision-sensitive arithmetic. The model writes the code instead of guessing the answer.
Schema validation and data cleaning. "Validate that every row in this CSV has a valid email, list the bad ones."
Structured output verification. When the model produces JSON, it can run a JSON Schema check inside the sandbox before returning. Catches more issues than a regex.
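The check the model runs for that looks something like the snippet below. This assumes the pinned library set includes jsonschema, which is worth verifying for your account; if it is missing, the model can fall back to hand-rolled checks.

```python
# What the model might execute inside the sandbox to verify its own JSON.
# Assumes jsonschema is in the pinned library set (verify before relying on it).
import json
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

candidate = json.loads('{"name": "Ada", "age": 36}')
try:
    validate(instance=candidate, schema=schema)
    print("valid")
except ValidationError as err:
    print("invalid:", err.message)
```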
What it is not good for
Anything that needs internet. No web scraping, no API calls, no fetching live data. Pre-download the data and upload it as a file (see the sketch after this list), or use a different tool.
Anything that needs GPUs. Image generation with diffusion models, fine-tuning, audio synthesis. Not the sandbox's job.
Anything stateful across users. No shared database, no caching layer. Sessions are per-thread and short-lived.
Anything latency-sensitive. Spinning up a session adds noticeable latency on first invocation. For interactive single-shot queries, the overhead is real.
Anything you need to debug. The sandbox logs and traces are limited. If the model's Python is producing wrong answers, you need a strong observability story to figure out why.
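The pre-download workaround is mechanical: fetch the data on your own infrastructure, then upload it like any other file. A sketch, with a hypothetical source URL and the client from the earlier examples:

```python
import requests  # runs on your machine; the sandbox itself has no network

resp = requests.get("https://example.com/latest.csv")  # hypothetical source
resp.raise_for_status()
with open("latest.csv", "wb") as f:
    f.write(resp.content)

# Now hand it to the sandbox the normal way.
file = client.files.create(file=open("latest.csv", "rb"), purpose="assistants")
```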
DIY: running Python in your own sandbox
The alternative is to run Python yourself in a sandbox you control. Modal, E2B, Daytona, Fly.io Machines, even a Firecracker VM if you like pain. You give the model a run_python(code: str) tool that calls into your sandbox.
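Wiring that up under Chat Completions is a standard function-calling tool. A minimal sketch; execute_in_sandbox is a hypothetical adapter for whichever provider you choose:

```python
from openai import OpenAI

client = OpenAI()

# Tool schema the model sees: one function that takes Python source.
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute Python in an isolated sandbox; returns stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python source to run"}
            },
            "required": ["code"],
        },
    },
}]

def execute_in_sandbox(code: str) -> str:
    """Hypothetical adapter: call Modal, E2B, Daytona, etc. and return the output."""
    raise NotImplementedError
```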
The trade-offs:
| Aspect | Code Interpreter | DIY sandbox |
|---|---|---|
| Setup | Three lines of code | Provision, secure, monitor |
| Cost | $0.03 per session + tokens | Compute time + tokens |
| Internet access | No | Yes (you control) |
| GPU support | No | Yes if you provision them |
| Custom libraries | Pinned set only | Whatever you install |
| Self-correction loop | Built-in | You wire it up |
| Sandbox security | OpenAI's problem | Your problem |
For most teams getting started, Code Interpreter is the right call: it works in an afternoon. For teams that need internet, custom libraries, or GPU compute, a DIY sandbox is unavoidable. E2B and Modal are the easiest paths in 2026; both expose a "spin up a Python kernel, run code, get output" API that maps cleanly onto a tool definition.
If you go DIY, the part you have to build yourself is the iteration loop. Code Interpreter handles "error, retry, fix" automatically. In your own setup, you need to feed stderr back to the model and let it decide whether to retry or give up.
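A sketch of that loop, reusing the run_python tool and execute_in_sandbox adapter from above. The key detail is that failures go back to the model as tool output instead of raising, so it can fix the code and retry:

```python
import json

messages = [{"role": "user", "content": "Std dev of [3, 1, 4, 1, 5, 9, 2, 6]?"}]

for _ in range(4):  # bound the retries so a confused model cannot loop forever
    resp = client.chat.completions.create(
        model="gpt-5.4", messages=messages, tools=tools
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final answer, no more code to run
        break
    messages.append(msg)
    for call in msg.tool_calls:
        code = json.loads(call.function.arguments)["code"]
        try:
            output = execute_in_sandbox(code)
        except Exception as e:  # surface sandbox failures instead of crashing
            output = f"ERROR: {e}"
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": output,  # stderr included, so the model can self-correct
        })
```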
Observability for Code Interpreter sessions
Every Code Interpreter call should produce a span in your trace. You want to capture: the user prompt, the Python the model wrote, the stdout/stderr from the sandbox, any files produced, total tokens, session cost. Without this, debugging a Code Interpreter failure is guesswork.
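If you are rolling your own instrumentation, a bare OpenTelemetry span that captures those fields looks roughly like this; the attribute names are our own convention, not a standard:

```python
from opentelemetry import trace

tracer = trace.get_tracer("code-interpreter")

with tracer.start_as_current_span("code_interpreter.run") as span:
    span.set_attribute("ci.user_prompt", question)  # question: the user's message
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant.id
    )
    span.set_attribute("ci.run_status", run.status)
    if run.usage:  # populated once the run reaches a terminal state
        span.set_attribute("ci.total_tokens", run.usage.total_tokens)
```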
If you instrument with a tracing SDK like Respan's, the Assistants API spans capture all of this automatically. See LLM Tracing for the architecture, and LLM Observability for the platform view. The trace UI shows the model's Python alongside the runtime output so you can see exactly what went wrong.
FAQ
Is Code Interpreter the same in ChatGPT and the API? The sandbox is the same. The API gives you programmatic access via the Assistants API; ChatGPT exposes it as a user-facing button.
Can I use Code Interpreter with Chat Completions instead of Assistants? Not directly. Code Interpreter is currently exposed as an Assistants API tool. If you want the same capability under Chat Completions, you build your own Python sandbox tool.
How is Code Interpreter different from function calling? Function calling lets the model call your code. Code Interpreter lets the model write and run new code in a sandbox. Different abstractions, often combined: function calling for known APIs, Code Interpreter for ad-hoc compute.
Does Code Interpreter have file size limits? Yes. Individual file uploads are capped, and the total storage in the sandbox is limited. For multi-gigabyte files, chunk and stream.
Can I install custom Python packages? No. The library set is pinned by OpenAI. If you need a specific package, run your own sandbox.
What languages does it support besides Python? Python is the only first-class language. The model can technically write code that shells out, but the runtime is locked down enough that you should treat Code Interpreter as Python-only.
Is the sandbox safe for user input? The sandbox is isolated from your infrastructure. It is not isolated from the data you upload to it. Do not upload secrets to Code Interpreter; the model will read them and may include them in responses.