aatventure
Claude Opus 4.8 just arrived, and on paper, Anthropic should be celebrating. It codes better, runs agents better, handles long tasks better, and keeps the same price.
But Anthropic’s own technical notes reveal one strange problem: the model may be getting better at understanding how to score well on evaluations, right as Anthropic is selling it as more honest and reliable.