Benchmarks
AI model benchmark results and analysis — 16 articles

AI Benchmarks Are Broken — This Book Explains Why
A new book by Moritz Hardt argues that benchmark rankings — not scores — are what actually matter. We tested his thesis...
March 25, 2026 · 9 min

A $500 GPU Just Beat Claude Sonnet at Coding Tasks
ATLAS, a source-available AI system built by a Virginia Tech student, scores 74.6% on LiveCodeBench using a single $500...
March 25, 2026 · 8 min

CRYSTAL Benchmark Exposes How AI Models Fake Reasoning
A new benchmark tested 20 multimodal AI models and found 19 of them cherry-pick reasoning steps while skipping actual...
March 22, 2026 · 8 min

NousCoder-14B vs Claude Code: Open-Source Coding Model Benchmark Showdown
Nous Research's NousCoder-14B benchmark score hits 67.87% on LiveCodeBench v6 — beating every open-source rival at its...
March 17, 2026 · 8 min