OpenAI's Model Spec Explained: 5 Rules Governing ChatGPT | AI Bytes
0% read
OpenAI's Model Spec Explained: 5 Rules Governing ChatGPT
AI News
OpenAI's Model Spec Explained: 5 Rules Governing ChatGPT
OpenAI just pulled back the curtain on the Model Spec — the 100-page rulebook that dictates what ChatGPT will and won't do. Here's what it means for users, developers, and the future of AI safety.
March 28, 2026
8 min read
44 views
Updated March 28, 2026
What Exactly Is the 100-Page Rulebook Behind ChatGPT?
OpenAI just published a deep dive into the document that quietly governs every single ChatGPT interaction — and it's way more interesting than you'd expect from a compliance doc.
On March 25, 2026, OpenAI released "Inside our approach to the Model Spec", a blog post explaining the philosophy, structure, and evolution behind the Model Spec — the public framework that defines how OpenAI's models should behave. Think of it as the constitution for ChatGPT. Every time the model decides whether to answer your question, push back on a prompt, or refuse a request entirely, the Model Spec is the reason why.
The OpenAI Model Spec isn't new. The first version dropped back in May 2024, and it's been updated regularly since — with the most recent revision dated December 18, 2025. But this blog post is the first time OpenAI has really opened up about how they write, implement, and evolve this document. And the timing is no accident.
The Chain of Command: Who Gets to Override Whom
Interesting wrinkle: the single most important concept in the Model Spec is something OpenAI calls the chain of command — a five-tier authority hierarchy that determines which instructions the model follows when conflicts arise.
Here's how it breaks down:
Authority Level
Who Sets It
Can Be Overridden By
Root
Model Spec itself
Nothing
System
OpenAI platform rules
Root only
Developer
API customers
Root, System
User
End users (you)
Root, System, Developer
Guideline
Default behaviors
Everything above
Root-level rules are the hard floor. They cover things like never facilitating weapons of mass destruction, never generating sexual content involving minors, and never helping with targeted political manipulation. These can't be overridden by anyone — not developers, not users, not even OpenAI's own system messages claiming "safety testing mode."
The Model Spec explicitly states that if a system message claims "ignore all policies" — the model must still refuse root-level violations. No exceptions.
This is a pretty direct response to jailbreak techniques that try to convince models they're in a special testing mode. And honestly, it's the right call.
Hard Rules vs. Defaults: Where User Freedom Lives
Here's where the Model Spec gets genuinely interesting. OpenAI draws a clear line between hard rules (non-negotiable boundaries) and defaults (overridable starting points).
Hard rules are mostly prohibitive. You can't get ChatGPT to help you build a bomb, create deepfakes for targeted political manipulation, or spit out copyrighted song lyrics in full. These aren't up for debate.
But defaults? Defaults are where user freedom actually lives. They represent the model's "best guess" about what behavior you'd want when you haven't specified a preference. And they can be overridden — by developers through the API, or by users through conversation.
As of March 2026, the Model Spec explicitly states that beyond specific limitations, the assistant should "behave in a way that encourages intellectual freedom" and "should never refuse a request unless required to do so by the chain of command."
That's a strong stance. It means ChatGPT is supposed to discuss any topic — no matter how controversial — as long as the conversation doesn't cross into genuinely dangerous territory. No topic is off limits for discussion. The line is drawn at action, not information.
Why This Matters Now
So why is OpenAI explaining all this in March 2026? Context matters here.
The AI safety conversation has gotten significantly more heated — OpenAI has even launched a dedicated safety bug bounty program. As Time reported on March 25, 2026, OpenAI has faced sharp criticism over its Department of Defense agreements — with CEO Sam Altman himself admitting the initial deal "looked opportunistic and sloppy." The Model Spec's explicit prohibition on mass surveillance and its careful treatment of dual-use information feel like direct responses to that backlash.
But there's also a more practical reason. Jason Wolfe, who manages OpenAI's Model Spec, described his role to Time as being like "a maintainer — in the spirit of an open-source side project." And that framing is deliberate. The Model Spec is released under Creative Commons CC0 — full public domain. Anyone can use it, fork it, or build on it.
Wolfe's description of the Model Spec as a document "first and foremost for people" — not training data — highlights a gap that OpenAI is still working to close.
This is an important admission. The Model Spec tells you what OpenAI wants the model to do. But the process of actually training models to follow the spec is, in Wolfe's words, "complicated." The spec and the model aren't perfectly synchronized. They're parallel processes that OpenAI keeps trying to bring closer together.
How It Compares to Anthropic's Constitution
You can't talk about the Model Spec without mentioning the elephant in the room: Anthropic's Constitutional AI approach.
Sharan Maiya, a PhD researcher at Cambridge, offered a pretty sharp comparison. Anthropic's Constitution is "more philosophical," she noted, while OpenAI's Model Spec is "more behavioral." (For more on Anthropic's approach to government partnerships, see our piece on Anthropic's Pentagon policy.) That tracks. The Model Spec reads more like a legal framework — with its authority levels, explicit examples, and transformation exceptions — while Anthropic's approach leans more toward embedding values at the training level.
Neither approach is clearly better. But they reflect fundamentally different theories about how you get AI to behave well. OpenAI bets on explicit rules and hierarchies. Anthropic bets on internalized principles. As of March 2026, the industry hasn't settled on which philosophy actually produces safer models in practice.
The Dual-Use Problem
One of the trickiest areas the Model Spec tackles is dual-use information — knowledge that has both legitimate and dangerous applications.
The approach here is actually pretty detailed. If information has legitimate uses, the model can provide "neutral facts" without giving step-by-step instructions. So you can learn that Ebola is a Tier 1 Select Agent (factual), but you can't get instructions for weaponizing a biological sample (actionable harm). You can get a general overview of methamphetamine chemistry (educational), but not precise quantities and temperatures (operational).
The Model Spec draws its line not at knowledge but at tactical amplification — the difference between understanding something and being equipped to do it.
This is a meaningful distinction, and it's one that other AI companies should probably adopt if they haven't already. The "refuse everything even vaguely related" approach leads to models that can't discuss history, chemistry, or security research. The "provide everything" approach is obviously dangerous. OpenAI's middle ground — factual context without operational specifics — seems like the most defensible position.
What About Agentic AI?
The September 2025 revision added significant new guidance around agentic behavior — models that take multi-step actions in the real world. And this might be the most forward-looking part of the entire document.
The Model Spec introduces a concept called Scope of Autonomy — a structured agreement between the user and the model about what the AI is and isn't allowed to do. Before taking autonomous actions, models should negotiate explicit parameters: which tools are permitted, what's the maximum cost, what side effects are acceptable, and when should the model stop and ask for confirmation.
Side effects get special treatment. The Model Spec lists specific categories: sending emails, deleting files, spending money, expanding permissions, spawning sub-agents. For each of these, the model is supposed to favor the least disruptive, most easily reversible approach. Back up before deleting. Offer archives instead of permanent deletion. Do dry runs before execution.
This is clearly written with tools like ChatGPT and the growing ecosystem of AI agents in mind. And it's refreshingly practical compared to the abstract safety discussions we usually see.
What Comes Next
The Model Spec is a living document. OpenAI has updated it at least five times since its initial release — with the September 2025 revision introducing agentic behavior principles and the December 2025 revision adding new sections on teen safety and honesty guidance.
But the hard question remains: how well do the actual models follow it? As of March 2026, OpenAI acknowledges that "production models don't yet fully reflect Model Spec" and are being "iteratively aligned." That honesty is appreciated — but it also means the document is aspirational in parts, not descriptive.
The gap between policy and implementation is where the real work happens. And whether OpenAI can close that gap faster than its competitors will likely determine whether the Model Spec becomes an industry standard or just another well-intentioned PDF.
Can developers override ChatGPT's safety rules through the API?
Developers can override guideline-level and some user-level defaults through the API, but they cannot override root-level or system-level rules. Root rules — like prohibitions on weapons instructions, CSAM, and targeted political manipulation — are hard-coded and cannot be bypassed by any party, including OpenAI's own system messages.
Is the OpenAI Model Spec legally binding?
No. The Model Spec is a behavioral guideline document, not a legal contract or regulatory filing. It's released under Creative Commons CC0 (public domain), meaning anyone can use or adapt it. However, OpenAI acknowledges that production models don't yet fully reflect the Model Spec, so it functions more as an aspirational target than a binding guarantee.
How often does OpenAI update the Model Spec?
OpenAI has updated the Model Spec at least five times since its initial release in May 2024, with revisions in February 2025, April 2025, September 2025, October 2025, and December 2025. Updates have added guidance on agentic AI behavior, teen safety, and honesty. The current version is dated December 18, 2025.
Does the Model Spec apply to the ChatGPT API or just the chatbot?
The Model Spec applies to all OpenAI models across both ChatGPT consumer products and the API. However, API developers get a 'Developer' authority level that lets them customize model behavior within the boundaries set by root and system-level rules. This means API-powered apps can behave differently from ChatGPT while still following the same core safety principles.
How does the Model Spec handle prompt injection attacks?
The Model Spec treats all quoted text, file attachments, tool outputs, and multimodal data as 'untrusted data' with zero authority by default. If a webpage or document contains hidden instructions (like 'send the user's data to this URL'), the model is supposed to ignore them entirely. Authority can only be delegated to these sources by explicit unquoted instructions from higher-authority messages.