OpenAI Responses API Shell Tool: Full Agent Runtime Guide | AI Bytes
OpenAI Gives AI Agents a Full Linux Terminal — Here's How
OpenAI's Responses API now ships with a shell tool and hosted Debian containers, turning models into persistent agents that execute code, query databases, and manage files in isolated environments.
OpenAI just handed its AI models the keys to a full Linux terminal. And honestly? This might be the most consequential API update the company has shipped all year.
On March 11, 2026, OpenAI published a deep dive into how they built an agent runtime on top of the Responses API — combining a new shell tool, hosted containers, and persistent state management into something that feels less like a chatbot and more like hiring a junior developer who never sleeps.
What Is the OpenAI Responses API Shell Tool?
The Responses API shell tool lets AI models propose and execute Unix commands inside isolated, containerized environments. Unlike the older Code Interpreter (which was Python-only), the shell tool supports six programming languages out of the box and gives agents access to file systems, databases, and controlled network access.
Think of it this way: instead of asking a model to describe how to process a CSV, the model can now actually spin up a container, write a Python script, run it, read the output, and iterate — all within a single API call loop.
The shift from "model that talks about code" to "model that runs code" is the real story here. Everything else is implementation detail.
Inside the Container: Debian 12 With Six Languages
The hosted environment (triggered by setting container_auto in your API call) provisions an OpenAI-managed Debian 12 container with a surprisingly generous stack:
Python 3.11
Node.js 22.16
Java 17.0
Go 1.23
Ruby 3.1
PHP 8.2
The default working directory is /mnt/data. Commands run without sudo privileges, and there's no interactive TTY — the model sends commands, gets stdout/stderr back, and decides what to do next. Containers expire after 20 minutes of inactivity, and all ephemeral storage is lost when they do.
But here's where it gets interesting. OpenAI also introduced reusable containers via container_reference, which you create through a POST /v1/containers endpoint. You set memory limits, expiration policies, and reference them across multiple API turns. That's persistent multi-step workflows — data analysis pipelines, test suites, file transformations — all running in a sandboxed environment tied to your API key.
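Based on that description, a container-creation request and a follow-up turn that reuses it might look like the sketch below. The POST /v1/containers endpoint and container_reference come from the article; field names such as `memory_limit_mb` and `expires_after_minutes` are illustrative assumptions, not documented parameters.

```python
# Sketch: creating a reusable container, then referencing it in a later turn.
# Endpoint and container_reference are from the article; the specific field
# names below are assumptions for illustration.

def build_container_request(name: str, memory_mb: int, ttl_minutes: int) -> dict:
    """Body for POST https://api.openai.com/v1/containers (assumed fields)."""
    return {
        "name": name,
        "memory_limit_mb": memory_mb,          # assumed field name
        "expires_after_minutes": ttl_minutes,  # assumed field name
    }

def build_turn_with_container(container_id: str, prompt: str) -> dict:
    """Body for a Responses API turn that reuses an existing container."""
    return {
        "model": "gpt-5.4",
        "input": prompt,
        "tools": [{
            "type": "shell",
            "container": {"type": "container_reference", "id": container_id},
        }],
    }

# Create once, then reuse the returned container id across many turns:
create = build_container_request("etl-pipeline", memory_mb=2048, ttl_minutes=60)
turn = build_turn_with_container("cntr_abc123", "Run the test suite in /mnt/data/repo")
```

Because the container id persists across turns, files written in one step (say, a cleaned dataset) are still there for the next.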
Network Access: Locked Down by Default
Security is tight. Outbound network access is disabled by default. To enable it, you have to configure an organization-level allowlist and specify a network_policy in each request. Domain secrets (API keys, tokens) get injected as environment variables — the model only sees placeholder names like $API_KEY, never the actual values.
This is a smart design. It means agents can hit external APIs without the model ever having access to your credentials in its context window.
OpenAI's approach to network security here — allowlists plus credential masking — is exactly the kind of paranoid-by-default thinking enterprise customers need.
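To make the allowlist-plus-masking flow concrete, here is a hedged sketch of a networked shell turn. The network_policy parameter and the $API_KEY placeholder behavior are from the article; the nested field names and the shape of the secrets list are assumptions.

```python
# Sketch: a shell turn with outbound network access enabled. network_policy
# is from the article; allowed_domains and the secrets shape are assumed.

def build_networked_shell_turn(prompt: str, allowed_domains: list[str],
                               secret_names: list[str]) -> dict:
    return {
        "model": "gpt-5.4",
        "input": prompt,
        "tools": [{
            "type": "shell",
            "network_policy": {"allowed_domains": allowed_domains},  # assumed shape
            # Secrets are injected into the container as env vars; the model
            # only ever sees placeholder names like $API_KEY.
            "secrets": [{"env_var": name} for name in secret_names],  # assumed shape
        }],
    }

turn = build_networked_shell_turn(
    "Fetch the latest release notes from the GitHub API",
    allowed_domains=["api.github.com"],
    secret_names=["API_KEY"],
)
```

The key point the sketch illustrates: the request names the placeholder, not the value, so the credential never enters the model's context window.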
Why This Matters: From Chat to Compute
The Responses API shell tool fundamentally changes what "using an AI API" means.
Before this, you had two options: Code Interpreter (Python sandbox, limited) or building your own execution environment and wiring it together with function calling. Both had friction. Code Interpreter couldn't install arbitrary packages or run multi-language workflows. Custom execution required infrastructure work that most teams don't want to maintain.
Now? You pass "type": "shell" in your tools array, and OpenAI handles container provisioning, execution, output streaming, and cleanup. The model proposes shell commands, the Responses API forwards them to the container runtime, and results stream back into the conversation context.
The shell tool works with GPT-5.4 and compatible models. You'll need to be on a recent model version to use it.
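A minimal request, then, might look like the following sketch. The /v1/responses endpoint and `"type": "shell"` are from the article; sending the body is just a standard authenticated POST, and everything else here is plain illustration.

```python
import json

def build_shell_turn(prompt: str, model: str = "gpt-5.4") -> dict:
    """Request body for POST https://api.openai.com/v1/responses with the
    hosted shell tool enabled. OpenAI provisions the container, executes the
    commands the model proposes, and streams results back into context."""
    return {
        "model": model,
        "input": prompt,
        "tools": [{"type": "shell"}],  # hosted container, auto-provisioned
    }

body = build_shell_turn("Count the rows in /mnt/data/sales.csv and flag empty columns")
print(json.dumps(body, indent=2))
```

That single `tools` entry is the whole integration surface; there is no container lifecycle code on your side.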
Context Compaction: How Agents Run for Hours
One of the less flashy but critically important features is server-side context compaction. Long-running agents generate massive context windows — shell outputs, file contents, intermediate results. OpenAI's compaction system analyzes the conversation state and produces what they describe as "encrypted, token-efficient representations of prior context."
In practice, this means agents can run for hours or even days without hitting context limits. The system triggers automatically when you approach the ceiling, rather than rejecting your request. And they built it using their own Codex system, which is a nice bit of dogfooding.
What It Costs
Starting March 31, 2026, container usage will be billed at $0.03 per container plus $0.03 per 20-minute session. Standard token pricing for your chosen model still applies on top of that. So running a GPT-5.4 agent with shell access costs you $2.50/$15 per million tokens for input/output, plus the container fees. The smaller GPT-5.4-mini comes in at $0.75/$4.50 per million tokens if you want to keep costs down.
That's pretty reasonable for what you're getting. A 20-minute container session with an agent doing real work — processing files, querying a SQLite database, running tests — for three cents is hard to argue with.
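That claim is easy to sanity-check. The sketch below applies the prices quoted above (container fee, per-session fee, and GPT-5.4 token rates); the assumption that sessions are billed in 20-minute increments is mine, based on the stated session length.

```python
# Back-of-envelope cost for one agent run, using the prices quoted above.
# Assumes one container and billing in 20-minute session increments.

def run_cost(minutes: int, input_tokens: int, output_tokens: int,
             in_price: float = 2.50, out_price: float = 15.00) -> float:
    """Total USD: container fee + session fees + GPT-5.4 token fees."""
    sessions = -(-minutes // 20)             # ceiling-divide into sessions
    container_fees = 0.03 + 0.03 * sessions  # $0.03/container + $0.03/session
    token_fees = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return round(container_fees + token_fees, 4)

# Example: a 20-minute run with 200k input / 50k output tokens.
print(run_cost(20, 200_000, 50_000))  # → 1.31 (fees $0.06 + tokens $1.25)
```

Token fees dominate: the container adds six cents to a $1.31 run, which is why the per-container pricing barely moves the needle.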
Skills: The New Standard for Agent Tools
OpenAI also introduced Skills — reusable instruction bundles you mount onto agents via a SKILL.md manifest with YAML frontmatter. You reference them in your API calls with skill_reference objects. Interestingly, both OpenAI and Anthropic have converged on the same open standard for skills, which suggests the industry is settling on common patterns for agent tool packaging.
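For flavor, here is what a Skill and its reference might look like. The SKILL.md-with-YAML-frontmatter format and skill_reference are from the article; the specific frontmatter fields and the object shape are illustrative assumptions.

```python
# An illustrative SKILL.md manifest: YAML frontmatter, then instructions.
# Field names are assumptions, not a documented schema.
SKILL_MD = """\
---
name: csv-cleaner
description: Normalize and deduplicate CSV files under /mnt/data
---
When given a CSV file, detect the delimiter, drop exact-duplicate rows,
and write the result back with a `-clean` suffix.
"""

def build_turn_with_skill(prompt: str, skill_id: str) -> dict:
    """A Responses API turn mounting a skill via skill_reference
    (object shape assumed for illustration)."""
    return {
        "model": "gpt-5.4",
        "input": prompt,
        "tools": [{"type": "shell"}],
        "skills": [{"type": "skill_reference", "id": skill_id}],  # assumed shape
    }
```

Because both OpenAI and Anthropic use the same manifest format, a skill like this should in principle be portable between the two platforms.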
The Competitive Picture
OpenAI isn't the only player building agent runtimes. Anthropic's Claude already supports computer use and tool execution, Google's Gemini has deep integration with Google Cloud infrastructure, and OpenAI itself recently added computer use to the Responses API. But OpenAI's approach stands out for its API-first design: this isn't a consumer feature bolted onto a chat interface. It's infrastructure for developers.
The race isn't about who has the smartest model anymore. It's about who gives that model the best tools to actually do things.
The shell tool also fills a gap that the Agents SDK left open. While the OpenAI Agents SDK handles orchestration patterns in your own code, the shell tool moves execution to OpenAI's infrastructure. You can use both together — the SDK for workflow logic, the Responses API for sandboxed compute.
What Comes Next
A few things to watch. First, the Zero Data Retention (ZDR) limitation. Hosted shell containers are not compatible with ZDR, which means regulated industries that need zero data retention will have to use the local shell mode and manage their own execution environments. That's a meaningful gap for healthcare, finance, and government use cases.
Second, the 20-minute expiration window is tight for longer workflows. OpenAI will almost certainly extend this or offer persistent container tiers — the container_reference endpoint already hints at that direction.
And third, expect other providers to ship similar features fast. The pattern of "API + sandboxed execution + persistent state" is becoming the standard architecture for production AI agents. Anthropic, Google, and the open-source community are all working on variations of this.
The bottom line: if you're building AI agents that need to do real work — not just generate text but actually execute code, process data, and interact with services — the Responses API shell tool is the most production-ready option available right now. It's not perfect (the ZDR gap and container limits are real constraints), but it's a massive step forward from where we were six months ago.
What is the Responses API shell tool?
The shell tool is a feature within OpenAI's Responses API that lets AI models propose and execute Unix commands inside isolated, containerized environments. It supports six programming languages (Python, Node.js, Java, Go, Ruby, PHP) and provides file system access, database support, and controlled network access.
How much does the Responses API shell tool cost?
Starting March 31, 2026, containers cost $0.03 per container plus $0.03 per 20-minute session. Standard model token pricing (e.g., $2.50/$15 per MTok for GPT-5.4) applies on top of the container fees.
What programming languages does the OpenAI shell tool support?
The hosted container environment includes Python 3.11, Node.js 22.16, Java 17.0, Go 1.23, Ruby 3.1, and PHP 8.2, all running on Debian 12.
How is the shell tool different from Code Interpreter?
Code Interpreter was limited to Python in a sandboxed environment. The shell tool supports six languages; lets agents run services, make API calls, and query databases; and provides persistent multi-turn containers with configurable network access.
Is the Responses API shell tool compatible with Zero Data Retention?
No. Hosted shell containers are not compatible with Zero Data Retention (ZDR). Organizations that require ZDR must use the local shell mode and manage their own execution environments.