Do we have a new best AI model, or are we seeing the downfall of benchmarks in general as a way of capturing machine intelligence?
Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!