Shadman Ahmed

Software Architect

Software architect and AI tools enthusiast. I test, benchmark, and review AI models and developer tools so you don't have to.

Articles: 150

Total Views: 76,086

Words Written: 266K

All Articles (150 total)

Mistral Medium 3.5 vs 3: 7 Real Upgrades That Matter

A no-fluff breakdown of what actually changed between Mistral Medium 3.5 and Medium 3, from reasoning gains to pricing shifts, and which one you should pick.

July 29, 2026 | 8 min | 33 | comparisons

GPT-5.6 Luna vs GPT-5 mini: 7 Upgrades That Matter

OpenAI's new small tier lands with a 1M token context and improved tool calling. Is the upgrade from GPT-5 mini worth it? A data-driven breakdown of what actually changed.

July 27, 2026 | 10 min | 62 | comparisons

AI Data Center Grid Resilience: A 7-Step Fix Guide

One fallen power line in Virginia knocked 3.1 GW of AI load off the grid in seconds. This tutorial walks through how operators can actually fix it.

July 26, 2026 | 9 min | 58 | tutorials

GPT-5.6 Sol vs Claude Fable 5: The 2026 Coding Verdict

Claude Fable 5 posts a self-reported 95.5% SWE-bench score while GPT-5.6 Sol pushes reasoning further. So which model actually ships better code in 2026? A data-driven breakdown.

July 22, 2026 | 9 min | 68 | comparisons

LLM Agents Flop at Coordination: Inside the ALEM Benchmark

A new open-ended coordination benchmark tests 13 LLMs across communication, trading, crafting, and combat. Most agents average just 6% normalised return.

July 19, 2026 | 8 min | 105 | benchmarks

Apple SpeechAnalyzer vs Whisper: Benchmark Verdict

Apple's new SpeechAnalyzer API landed in iOS 26 with big claims. Benchmark data from Inscribe puts it head-to-head with Whisper and the old SFSpeechRecognizer. The results are surprising.

July 18, 2026 | 7 min | 71 | benchmarks

Train a Kick Drum AI Model on 6GB VRAM: Full Linux Guide

A dusty GTX 1660 and a weekend are all you need. This tutorial walks through training a working kick drum diffusion model on 6GB of VRAM, from dataset prep to first generated sample.

July 17, 2026 | 8 min | 78 | tutorials

Qwen 3.7 Plus vs 3.6 Plus: 7 Real Upgrades in 2026

A no-fluff breakdown of what actually changed between Qwen 3.7 Plus and Qwen 3.6 Plus, from reasoning gains to pricing shifts and coding wins.

July 14, 2026 | 8 min | 90 | comparisons

Stop Google Training AI on You: 7 Settings to Fix Now

Google quietly expanded which of your data can train Gemini. Walk through the 7 exact toggles that pull your account back out of the training pool.

July 12, 2026 | 7 min | 96 | tutorials

DeepSeek V4 Pro vs V3: 7 Upgrades That Matter

DeepSeek V4 Pro replaces V3 with 1M-token context, a 1.6T-parameter MoE, and native reasoning modes. Here's which upgrades matter — and where V3 still wins on cost.

July 11, 2026 | 9 min | 91 | comparisons

Talos-XII: Hand-Written Rust Autograd Hits 10k Sims/Sec

A solo-built Rust autograd stack with custom SIMD dispatch models gacha probabilities at 10k+ sims per second. Here's what the benchmarks reveal about tiny-model performance without PyTorch.

July 10, 2026 | 7 min | 87 | benchmarks

7 Open-Source Claude Desktop Alternatives Worth Trying

Rowboat, LibreChat, Open WebUI, Jan, and more: seven serious open-source Claude Desktop alternatives ranked for 2026, with honest takes on each.

July 8, 2026 | 8 min | 69 | listicles
