Shadman Ahmed
Software Architect
Software architect and AI tools enthusiast. I test, benchmark, and review AI models and developer tools so you don't have to.
84 Articles
20,784 Total Views
149K Words Written
All Articles (84 total)
Krasis vs llama.cpp: Is 10x Faster LLM Inference Real?
Krasis LLM Runtime claims dramatically faster inference than llama.cpp for large MoE models on a single NVIDIA GPU. We break down the real numbers, the retracted benchmarks, and when each tool wins.
A $500 GPU Just Beat Claude Sonnet at Coding Tasks
ATLAS, a source-available AI system built by a Virginia Tech student, scores 74.6% on LiveCodeBench on a single $500 consumer GPU at roughly $0.004 per task, outperforming Claude Sonnet's 71.4%.
Google Opens Lyria 3 API: AI Music for 4 Cents a Track
Google Lyria 3 is now available to developers through the Gemini API at $0.04 per 30-second clip. Here's what you get, what's missing, and how it stacks up against Suno and Udio.
ChatGPT Becomes a Shopping Mall: 7 Retailers Already In
OpenAI just turned ChatGPT into a visual shopping assistant with product comparisons, image search, and feeds from Target, Sephora, Best Buy, and more — all powered by the Agentic Commerce Protocol.
Clarity-OMR vs Audiveris: 5 OMR Accuracy Tests
A deep-dive comparison of Clarity-OMR's machine learning approach against Audiveris's traditional computer vision for optical music recognition — with real benchmark data on 10 classical piano pieces.
5 Ways OpenAI Protects Sora 2 Users — And 3 Gaps
OpenAI details its five-layer safety system for Sora 2, including C2PA metadata, CSAM detection, and teen protections. But real-world testing reveals stubborn blind spots that watermarks and classifiers can't fix.
Grammarly AI Cloned 100+ Writers — A $5M Lawsuit and an Apology
Superhuman's CEO sat for a Decoder interview with The Verge's editor — one of the writers Grammarly's AI cloned without permission. It got tense.
ROCm 7 vs Vulkan on Mi50: 4-Model Benchmark Results
New benchmarks pit ROCm 7 nightly against Vulkan on an AMD MI50 32GB running llama.cpp. Vulkan wins short-context dense inference, but ROCm dominates everything else, with a stability catch.
CRYSTAL Benchmark Exposes How AI Models Fake Reasoning
A new benchmark tested 20 multimodal AI models and found 19 of them cherry-pick reasoning steps while skipping actual thinking. The gap between accuracy and reasoning quality is alarming.
OpenAI Buys Astral: 5 Things Python Devs Must Know
OpenAI is acquiring Astral, the company behind uv and Ruff, to supercharge Codex. Here's what it means for the Python ecosystem, open source, and the AI coding wars.
Anthropic Doesn't Trust the Pentagon, and Neither Should You
Anthropic won't let the Pentagon use Claude without strict guardrails — and that tells us everything about how to deploy AI responsibly. This tutorial gives you a practical governance framework, complete with code examples, to implement the same trust hierarchy in your own projects.
Project Genie Prompts: 4 Tips to Build Better Worlds
Google DeepMind's Project Genie lets you generate interactive worlds from text. Here are 4 proven tips for writing prompts that produce stunning, explorable environments.