Is Grok 4.20 backward compatible with Grok 4.3 API calls?

Mostly yes. Existing chat completion requests generally work without changes. The Grok 4.20 model family exposes separate reasoning and non-reasoning variants (for example, `grok-4.20-reasoning` and `grok-4.20-non-reasoning`), so if you were toggling reasoning behavior on 4.3, you may want to select the appropriate 4.20 variant at the model level. Always confirm parameter compatibility against the current [xAI docs](https://docs.x.ai/) before migrating.

Can I use Grok 4.20 through OpenRouter or Poe instead of the xAI API directly?

Yes, both OpenRouter and Poe added Grok 4.20 within weeks of launch. OpenRouter tends to charge a small markup over xAI direct rates but gives you unified billing across models. Poe is better for individual consumer use, not production API traffic.

Does Grok 4.20 support fine-tuning or custom training?

Not publicly as of July 2026. xAI has hinted at a fine-tuning API in their roadmap but has not shipped it. If you need custom model behavior, you're limited to prompt engineering and RAG. Anthropic and OpenAI both offer more mature customization paths.

What happens to Grok 4.3 after 4.20 launched?

Grok 4.3 remains in active support with no announced deprecation date, but xAI historically deprecates older models within 12 to 18 months of a major successor. Plan for a migration window sometime in 2027 if you're building new infrastructure on 4.3 today.

How does Grok 4.20 handle rate limits compared to 4.3?

Rate limits vary by tier and by whether you're using the reasoning or non-reasoning variant. The xAI docs list base tier RPM allowances directly on the [models page](https://docs.x.ai/docs/models). If you're planning burst traffic, check the current limits for your specific model alias and consider requesting a limit increase before migrating production workloads.

Grok 4.3 vs Grok 4.20: 5 Real Differences That Matter

xAI has a naming problem. Grok 4.20 isn't a minor patch on 4.3, and the version number makes that genuinely confusing for anyone building on the API. So let's clear it up.

The short answer: Grok 4.20 iterates on Grok 4.3 with a rebuilt agentic tool loop and adaptive reasoning behavior. Both models list the same 1 million token context window in the xAI docs and are priced identically at the base tier, so this is not a raw capability jump — it is a behavior and reasoning-stack refinement.

This Grok 4.3 vs Grok 4.20 comparison walks through what actually changed under the hood, where the benchmarks moved, and which one you should be pointing your production traffic at as of mid-2026.

Quick verdict: which Grok should you use?

If you're building fresh on xAI and choosing between the two, start on Grok 4.20 — the xAI docs treat it as the newer entry alongside Grok 4.3. If you're already on 4.3, there's no urgent migration pressure: both models share the same 1M context window and the same base pricing.

Bar chart comparing Grok 4.3 and Grok 4.20 base pricing and context window from xAI docs

If you run a high-volume chatbot with short turns, or you're on a tight latency budget for the first-token response, Grok 4.3 is still a reasonable pick. Both models share the same base pricing per the xAI docs, so the choice comes down to behavior rather than cost.

And if you're just kicking the tires on xAI for the first time, either model is a reasonable starting point since the base tier is identical.

At-a-glance comparison

Feature	Grok 4.3	Grok 4.20
Context window	1M tokens	1M tokens
Reasoning mode	Configurable	Reasoning and non-reasoning variants
Tool calling	Function calling, structured outputs	Function calling, structured outputs, multi-agent variant
Vision	Yes	Yes
Real-time X data	Yes (server-side search tools)	Yes (server-side search tools)
API pricing (input)	$1.25 / 1M tokens	$1.25 / 1M tokens
API pricing (output)	$2.50 / 1M tokens	$2.50 / 1M tokens
Chatbot Arena Elo	N/A (independent ranking not confirmed)	N/A (independent ranking not confirmed)

Pricing figures reflect the xAI models documentation at time of writing. Always check current pricing before locking in a contract.

What actually changed under the hood

1. Context window: both list 1M tokens

Both Grok 4.3 and Grok 4.20 list a 1 million token maximum prompt length in the xAI docs, so context length is not a differentiator between them. That already puts either model ahead of GPT-4o's 128K ceiling.

As with any long-context model, real-world recall past a few hundred thousand tokens depends on the workload. xAI has not published independent needle-in-a-haystack numbers for either model at the time of writing, so if you rely on retrieval from deep context, benchmark it on your own data before committing.

2. Reasoning is no longer a toggle

Grok 4.3 exposes a configurable reasoning setting so you can trade latency for depth on demand. Grok 4.20 ships as two distinct variants in the model list — a reasoning variant and a non-reasoning variant — so you pick the behavior at the request level rather than toggling a mode.

The tradeoff: if your traffic mixes simple lookups and hard planning tasks, you may end up routing between the two 4.20 variants, whereas 4.3 lets one endpoint handle both.

3. The tool-calling rewrite

Anyone who tried building agents on Grok 4.3 knows the pain. Tool calls worked, but multi-turn tool loops (where the model calls a tool, reads the result, then decides what to call next) were flaky. About one in five runs would either hallucinate a function that didn't exist or loop on the same call.

4.20 rebuilt this. xAI describes the model as tuned for agentic tool calling with reduced hallucinations, though the company has not published independently verified chained-call accuracy numbers. If your agent chains many tool calls, evaluate 4.20 against Claude Opus 4.7 and current OpenAI models on your own workload.

4. Vision upgrades (that are actually noticeable)

Both models accept image input alongside text, per the xAI docs. xAI has not published side-by-side vision benchmarks between 4.3 and 4.20 that we could independently verify, so if OCR quality or chart interpretation matters to your workload, run your own comparison before switching. For rough context on where the frontier sits, Gemini is worth benchmarking against.

If your workload involves parsing screenshots, invoices, or scientific figures, benchmark both models on representative samples before committing.

5. Real-time X (formerly Twitter) integration

The headline feature xAI keeps advertising. Both models can access live X posts via server-side search tools, per the xAI documentation. Concrete latency comparisons between 4.3 and 4.20 have not been published, so measure your own workload if this matters.

This is still the main reason to pick Grok over the alternatives. Nobody else has legal, licensed access to the full X firehose in real time.

Pricing

xAI lists both Grok 4.3 and Grok 4.20 at the same base rate in the models documentation.

Tier	Grok 4.3	Grok 4.20
Input tokens	$1.25 per million	$1.25 per million
Output tokens	$2.50 per million	$2.50 per million
Vision	Same as text	Same as text

Base pricing is identical, so cost is not a reason to prefer one over the other. Long-context requests over the 200K threshold are priced separately at a higher rate — verify the current xAI pricing for your specific traffic mix before committing.

And yes, both models require a paid subscription. There's no free tier for the API, unlike DeepSeek or the free tier on some Google models.

Benchmark comparison

Things get honest — xAI hasn't published a full independent benchmark suite for Grok 4.20 at the time of writing, so we're working with the vendor's own materials and early community testing. Take internal benchmarks with skepticism, always.

Grok 4.3 and Grok 4.20 will each need independent LMSYS Chatbot Arena rankings before anyone can honestly compare them to Claude and GPT frontier models. Until then, treat any single-source benchmark score as marketing rather than data.

Two engineers discussing system architecture at a whiteboard with hand-drawn diagrams

On coding, xAI has not published verified SWE-bench Verified numbers for Grok 4.20 that we could confirm against the official SWE-bench leaderboard. Community estimates circulate, but until an independent submission lands, any specific coding benchmark score for 4.20 is best treated as unverified.

My honest read: Grok 4.20 looks like a solid refinement of 4.3 rather than a frontier leap. If you're already in the xAI ecosystem, or you need the X data integration, it's worth trying. If you're comparison shopping for the hardest coding or reasoning work, run your own evals against Claude and GPT before committing.

When to choose each model

Choose Grok 4.3 if:

You run high-volume, short-turn chat where first-token latency matters more than absolute quality
Your workload is heavily input-biased (long context in, short answer out)
You've built extensive prompt infrastructure around 4.3 and migration risk is real
You don't need multi-turn tool calling

Choose Grok 4.20 if:

You're building an agent that chains tool calls and want the reasoning/non-reasoning split
You do screenshot-to-code, invoice parsing, or any vision-heavy workflow
You want real-time X integration
You're starting fresh on xAI (no reason to build new on 4.3)

What xAI still hasn't fixed

A fair comparison has to name what's still broken. Grok as a family still has weaker instruction-following on complex system prompts than Claude does. The API SDK is thinner than what OpenAI ships. Documentation, while improved, still lags behind Anthropic's. And the rate limits on lower-tier accounts feel stingy compared to what you get on OpenRouter with the same spend.

None of this is a dealbreaker. But if you're picking a first model for a serious production build, Claude or GPT still get the safer nod. Grok is the interesting option when you need something they can't do, and X data access is basically the only thing in that category right now.

For a broader look at how Grok stacks up against Anthropic's reasoning model, see our Grok 4.3 vs Claude Fable 5 comparison. Comparing across ecosystems? Our GPT vs Claude Opus 4.6 showdown covers the two other frontier options you should be evaluating alongside Grok.

Final verdict

Grok 4.20 is xAI's newer entry, but at the base tier it shares the 1M context window and the $1.25/$2.50 pricing of Grok 4.3. The differences are in reasoning behavior and tool-calling tuning, not a raw capability jump. If you're on 4.3 today and it works, there is no urgent reason to migrate.

But if you're using Grok as a fast, cheap chat backend and it's working fine, don't fix what isn't broken. 4.3 is still supported and still fast.

And if you're comparison shopping across the whole model space, Grok isn't the top-tier frontier choice for coding or reasoning. It's a solid second-tier model with one genuinely unique feature (X data). For most teams, that's enough to keep it in the rotation. Not enough to make it the default.

Sources