Anthropic Doesn't Trust the Pentagon, and Neither Should You | AI Bytes
Anthropic won't let the Pentagon use Claude without strict guardrails — and that tells us everything about how to deploy AI responsibly. This tutorial gives you a practical governance framework, complete with code examples, to implement the same trust hierarchy in your own projects.
March 21, 2026 · 9 min read
Updated March 23, 2026
What Anthropic's Military AI Guardrails Teach Us About Building an AI Governance Framework
In late 2024, Anthropic made headlines by partnering with defense contractors like Palantir and Amazon Web Services to offer Claude to U.S. intelligence agencies — but with significant restrictions that most people overlooked. The company didn't just hand over its models and walk away. It imposed strict usage policies, contractual guardrails, and red lines that even the Department of Defense must respect. The result is the most instructive AI governance framework any company has published to date.
That restraint tells us something important: if the company that built the AI doesn't trust unrestricted military deployment, you shouldn't trust unrestricted deployment in your own organization either.
This tutorial breaks down Anthropic's approach to high-stakes AI governance and shows you how to apply the same principles — whether you're a startup founder, an enterprise architect, or a solo developer shipping AI-powered features.
ℹ️ What This Tutorial Covers
This isn't an anti-military or anti-AI polemic. It's a practical guide to implementing the same safety-first deployment patterns that Anthropic uses when the stakes are highest. You'll walk away with a governance framework you can adapt to your own AI projects.
What Anthropic Actually Restricts (And Why)
Anthropic's Acceptable Use Policy (AUP), combined with its public statements and contractual terms, establishes clear principles for how Claude can be used in defense contexts. Here's what the company has committed to:
The Red Lines
No autonomous weapons systems — Claude can't be used to independently select and engage targets without meaningful human oversight
No mass surveillance — Bulk processing of civilian communications or biometric data for population-level monitoring is prohibited
No decisions about lethal force — Claude can summarize intelligence briefings, but it can't make or directly recommend kill/no-kill decisions
No circumventing human review — Outputs that feed into high-consequence decisions require a human-in-the-loop at every stage
What IS Permitted
Logistics and supply chain optimization
Translation and language analysis of foreign-language documents
Summarization of open-source intelligence (OSINT)
Cybersecurity threat detection and analysis
Administrative and back-office automation
⚠️ The Key Insight
Anthropic doesn't ban military use entirely — it bans unsupervised, high-consequence decision-making. That distinction is the foundation of every governance framework in this tutorial.
The Trust Hierarchy: A Framework You Can Steal
Anthropic's approach reveals an implicit trust hierarchy that maps cleanly to any organization deploying AI. Think of it as concentric rings of trust:
Ring 1: Full Automation (Low Stakes)
Tasks where AI errors are cheap to fix and reversible.
Pentagon example: Auto-generating meeting summaries from unclassified briefings.
Your example: Drafting marketing copy, auto-labeling support tickets, summarizing internal documents.
TypeScript
// Ring 1: Fire-and-forget automation
const summary = await claude.generate({
  prompt: `Summarize this meeting transcript: ${transcript}`,
  // No human review needed — worst case, someone corrects a bad summary
});
await saveToDatabase(summary);
Ring 2: Human-in-the-Loop (Medium Stakes)
Tasks where AI errors could cause meaningful harm but a human reviewer can catch mistakes before they propagate.
Pentagon example: Translating intercepted foreign-language communications for analyst review.
Your example: Generating customer-facing emails, producing financial report drafts, writing code that will be reviewed before merge.
TypeScript
// Ring 2: Generate-then-review
const draft = await claude.generate({
  prompt: `Draft a response to this customer complaint: ${complaint}`,
});
// Queue for human review — never auto-send
await reviewQueue.add({
  draft,
  requiredApprovals: 1,
  escalateAfter: '4h',
  context: { originalComplaint: complaint },
});
Ring 3: AI-Assisted Only (High Stakes)
Tasks where the AI provides information but never makes or recommends the decision.
Pentagon example: Presenting satellite imagery analysis to a human analyst, who independently decides what it means.
Your example: Flagging potential fraud for a human investigator, suggesting medical diagnoses for a doctor to evaluate, identifying legal risks for an attorney to assess.
TypeScript
// Ring 3: Inform, never decide
const analysis = await claude.generate({
  prompt: `Analyze this transaction for potential fraud indicators: ${JSON.stringify(transaction)}`,
  systemPrompt: `You are a fraud analysis assistant. Present findings as observations,
never as conclusions. Always note confidence levels and alternative explanations.
End every response with: "This analysis requires human review before any action is taken."`
});
await fraudReviewDashboard.createCase({
  analysis,
  transaction,
  status: 'pending_human_review',
  autoAction: 'none', // The model must not trigger holds or blocks
});
Ring 4: Prohibited (Unacceptable Risk)
Tasks where no amount of oversight makes AI involvement acceptable.
Pentagon example: Autonomous target selection.
Your example: Automated hiring/firing decisions without human review, autonomous medical treatment, unsupervised decisions about user account termination.
🚨 Ring 4 Is Non-Negotiable
Every organization needs an explicit list of things AI will never do, regardless of how good the model gets. If you don't have this list, make one before you ship anything.
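One way to make Ring 4 a technical guarantee rather than a policy document is a hard deny-list that fails closed. Here's a minimal sketch; the action names and the contents of the prohibited set are illustrative assumptions, and your own list will differ:

```typescript
// Hypothetical Ring 4 enforcement: a deny-list checked before any
// AI-initiated action executes. Action names here are made up.
const PROHIBITED_ACTIONS = new Set([
  "terminate_employee",
  "close_user_account",
  "administer_treatment",
]);

interface AIAction {
  name: string;
  requestedBy: "model" | "human";
}

function assertNotProhibited(action: AIAction): void {
  // Fail closed: refuse regardless of who or what requested the action.
  if (PROHIBITED_ACTIONS.has(action.name)) {
    throw new Error(
      `Ring 4 violation: "${action.name}" may never be AI-initiated`
    );
  }
}
```

The point is that the check lives in code on the execution path, so a policy violation becomes a thrown error instead of a silent incident.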
Building Your Own Acceptable Use Policy
Here's a step-by-step process for creating an AI governance policy modeled on Anthropic's approach:
Step 1: Inventory Every AI Touchpoint
List every place in your product or organization where AI makes or influences a decision. Be exhaustive.
| Touchpoint | Input | Output | Who Sees It | Consequence of Error |
| --- | --- | --- | --- | --- |
| Support ticket routing | Customer message | Category label | Internal team | Delayed response |
| Content moderation | User post | Allow/flag/remove | End users | Censorship or harm |
| Credit scoring assist | Financial data | Risk assessment | Loan officers | Denied credit |
| Code generation | Developer prompt | Code suggestion | Developers | Security vulnerability |
Step 2: Classify Each Touchpoint by Risk Ring
Using the trust hierarchy above, assign each touchpoint to Ring 1–4. When in doubt, move it up one ring (more restrictive).
Decision criteria:
Reversibility — Can you undo the action if the AI is wrong?
Blast radius — How many people are affected by an error?
Vulnerability of subjects — Are the people affected able to advocate for themselves?
Legal exposure — Could an error trigger regulatory or legal consequences?
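These four criteria can be expressed as a simple scoring function. The weights and thresholds below are illustrative assumptions, not a published standard; tune them to your own risk appetite, and remember the rule above: when in doubt, round up a ring.

```typescript
// Sketch: map the four decision criteria to a risk ring.
// Weights, the blast-radius cutoff, and the ring thresholds are
// illustrative assumptions.
interface RiskProfile {
  reversible: boolean;         // Can the action be undone?
  blastRadius: number;         // People affected by a single error
  vulnerableSubjects: boolean; // Can affected people contest the decision?
  legalExposure: boolean;      // Could an error trigger legal consequences?
}

function classifyRing(p: RiskProfile): 1 | 2 | 3 | 4 {
  let score = 0;
  if (!p.reversible) score += 2;
  if (p.blastRadius > 100) score += 1;
  if (p.vulnerableSubjects) score += 2;
  if (p.legalExposure) score += 2;
  if (score >= 5) return 4;
  if (score >= 3) return 3;
  if (score >= 1) return 2;
  return 1;
}
```

A scoring function like this won't replace human judgment, but it forces every touchpoint through the same four questions and makes the classification auditable.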
Step 3: Log Every AI Decision
Every AI decision — even Ring 1 — should be logged in a way that supports after-the-fact auditing.
TypeScript
interface AIDecisionLog {
  id: string;
  timestamp: Date;
  ring: number;
  touchpoint: string;
  input: string; // What was sent to the model
  output: string; // What came back
  humanReviewed: boolean;
  humanOverridden: boolean;
  finalOutcome: string; // What actually happened
  latencyMs: number;
  model: string;
  cost: number;
}
Step 4: Review on a Schedule
Schedule monthly reviews of Ring 2+ decisions. The NIST AI Risk Management Framework provides additional guidance on structuring these reviews. Look for patterns: Is the AI consistently wrong about certain inputs? Are humans rubber-stamping reviews without actually reading them? Are Ring 3 outputs being treated as Ring 1 in practice?
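The rubber-stamping check can be automated from the decision log. Here's a sketch using a minimal subset of the log fields; the 2% override threshold and 50-review minimum are illustrative assumptions, not an industry standard:

```typescript
// Sketch: flag touchpoints where humans may be approving AI output
// without actually reading it. A near-zero override rate across many
// reviews is a warning sign, not proof.
interface DecisionRecord {
  touchpoint: string;
  humanReviewed: boolean;
  humanOverridden: boolean;
}

function findRubberStamping(
  logs: DecisionRecord[],
  minReviews = 50,
  threshold = 0.02, // Assumed: under 2% overrides looks suspicious
): string[] {
  const stats = new Map<string, { reviewed: number; overridden: number }>();
  for (const log of logs) {
    if (!log.humanReviewed) continue;
    const s = stats.get(log.touchpoint) ?? { reviewed: 0, overridden: 0 };
    s.reviewed += 1;
    if (log.humanOverridden) s.overridden += 1;
    stats.set(log.touchpoint, s);
  }
  const flagged: string[] = [];
  for (const [touchpoint, s] of stats) {
    if (s.reviewed >= minReviews && s.overridden / s.reviewed < threshold) {
      flagged.push(touchpoint);
    }
  }
  return flagged;
}
```

Run something like this as part of the monthly review; a flagged touchpoint means the human-in-the-loop may exist on paper only.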
The Anthropic Lesson: Trust Is Graduated, Not Binary
The single most important takeaway from Anthropic's Pentagon policy is this: trust isn't a binary switch. You don't either trust AI or distrust it. You trust it proportionally to the stakes and your ability to catch errors.
Anthropic trusts Claude to summarize documents. It doesn't trust Claude to choose bombing targets. The gradient between those two extremes is where your governance policy lives.
💡 The Litmus Test
For every AI feature you ship, ask: "If this output is completely wrong, what's the worst thing that happens?" If you can't live with the answer, add a human checkpoint.
Common Mistakes to Avoid
Mistake 1: Treating AI Governance as a One-Time Exercise
Models change. Capabilities improve. Your risk surface shifts every time you update a model or expand a use case. Revisit your ring classifications quarterly.
Mistake 2: Confusing Speed with Value
The whole point of AI is speed, right? Not when speed creates liability. A Ring 3 process that takes 24 hours with human review is infinitely more valuable than a Ring 3 process that takes 2 seconds and produces a lawsuit.
Mistake 3: No Kill Switch
If you can't disable your AI features in under 5 minutes with a single configuration change, you've built a system you can't govern. Feature flags exist for this reason.
TypeScript
// Every AI feature should respect a kill switch
const isAIEnabled = await featureFlags.isEnabled('ai-features', {
  fallback: false, // Default to OFF if flag service is unreachable
});
if (!isAIEnabled) {
  return fallbackBehavior(input);
}
Mistake 4: Assuming the Model Is the Risk
The model is rarely the biggest risk. The biggest risks are:
Human over-reliance on AI outputs (automation bias)
Scope creep where low-risk features gradually absorb high-risk decisions
Your Governance Checklist
Before shipping any AI-powered feature, verify:
Every AI touchpoint is inventoried and classified by risk ring
Ring 2+ features have human review workflows implemented
Ring 4 prohibitions are documented and technically enforced (not just policy)
Circuit breakers can halt AI operations within minutes
All AI inputs and outputs are logged for audit
A quarterly review cadence is scheduled
Fallback behavior is defined for when AI is unavailable
Prompt injection defenses are in place for any user-facing AI feature
Your team knows the difference between "AI-assisted" and "AI-decided"
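On the prompt-injection item: filtering alone cannot stop prompt injection, but a thin screening layer is a reasonable first checkpoint for user-facing features. This is a minimal sketch; the pattern list and delimiter convention are assumptions, and a determined attacker can bypass pattern matching, so treat this as one layer among several (least-privilege tool access, output review, and the trust rings above):

```typescript
// Sketch of one prompt-injection mitigation layer: screen for obvious
// instruction-like phrases, then wrap untrusted input in explicit
// delimiters so the trust boundary is visible to the model.
// Pattern lists like this are easily bypassed; never rely on them alone.
const SUSPICIOUS_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /system prompt/i,
];

function wrapUntrustedInput(userText: string): string {
  for (const pattern of SUSPICIOUS_PATTERNS) {
    if (pattern.test(userText)) {
      // Fail to human review rather than silently passing through
      throw new Error("Potential prompt injection detected; route to human review");
    }
  }
  return `<untrusted_user_input>\n${userText}\n</untrusted_user_input>`;
}
```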
Wrapping Up
Anthropic built Claude. They understand its capabilities and limitations better than anyone on Earth. And they still won't let the Pentagon use it without guardrails, monitoring, and hard limits.
If that level of caution is appropriate for the people who made the model, it's the bare minimum for the rest of us.
The trust hierarchy framework in this tutorial isn't theoretical — it's a formalized version of what responsible AI companies are already doing behind closed doors — as documented in Anthropic's own Responsible Scaling Policy. Now you can do it too, without waiting for regulation to force your hand.
The question isn't whether you trust AI. It's whether you've built the systems to verify that trust at every level of risk.
Frequently Asked Questions
Does Anthropic let the Pentagon use Claude?
Yes. Starting in late 2024, Anthropic partnered with Palantir and AWS to make Claude available to U.S. intelligence and defense agencies. However, these partnerships come with strict acceptable use policies that prohibit autonomous weapons, mass surveillance, and unsupervised lethal decision-making.
What is a trust hierarchy for AI deployment?
A trust hierarchy is a framework that classifies AI use cases into risk tiers — from full automation (low stakes) to prohibited use (unacceptable risk). Each tier has different requirements for human oversight, audit logging, and fallback behavior. It ensures that AI autonomy is proportional to the consequences of errors.
How do I decide if my AI feature needs human review?
Ask four questions: Is the output reversible? How many people does an error affect? Are affected people vulnerable or unable to contest the decision? Could an error create legal liability? If any answer raises concern, add a human-in-the-loop checkpoint before the AI output reaches its final destination.
What is a circuit breaker pattern for AI systems?
A circuit breaker monitors AI system failures and automatically disables AI operations when errors exceed a threshold. It prevents cascading failures by falling back to non-AI behavior, then gradually re-enables AI operations once the underlying issue is resolved. It's borrowed from electrical engineering and microservice architecture.
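As a rough illustration of that pattern applied to AI calls, here's a sketch; the class name, failure threshold, and cooldown are assumptions for illustration, not a standard library:

```typescript
// Sketch: circuit breaker around an AI call. After maxFailures
// consecutive errors the breaker "opens" and every call falls back to
// non-AI behavior; after cooldownMs it "half-opens" and allows a retry.
class AICircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures = 5,
    private readonly cooldownMs = 60_000,
  ) {}

  async call<T>(aiFn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.isOpen()) return fallback(); // Breaker open: skip the AI entirely
    try {
      const result = await aiFn();
      this.failures = 0; // A success resets the failure count
      return result;
    } catch {
      this.failures += 1;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      return fallback();
    }
  }

  private isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (Date.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: half-open, permit one fresh attempt
      this.openedAt = null;
      this.failures = 0;
      return false;
    }
    return true;
  }
}
```

Note the breaker always returns the fallback instead of throwing, which pairs naturally with the kill-switch fallback behavior described earlier.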
Do I need an AI governance policy for a small project?
If your AI feature affects real users — yes. The scale of your policy should match the scale of your risk, but even a solo developer shipping an AI feature should have a written list of what the AI can and cannot do, a kill switch to disable it, and basic logging of AI decisions. A lightweight governance doc takes an hour and can save you from serious liability.