Benchmarks

AI model benchmark results and analysis — 16 articles

AI Benchmarks Are Broken — This Book Explains Why

A new book by Moritz Hardt argues that benchmark rankings — not scores — are what actually matter. We tested his thesis...

ATLAS, a source-available AI system built by a Virginia Tech student, scores 74.6% on LiveCodeBench using a single $500...

A new benchmark tested 20 multimodal AI models and found 19 of them cherry-pick reasoning steps while skipping actual...

Nous Research's NousCoder-14B benchmark score hits 67.87% on LiveCodeBench v6 — beating every open-source rival at its...
