Claude Opus 4.8: Tops SWE-Bench Pro at 69.2%
Claude Opus 4.8 scores 69.2% on SWE-Bench Pro to lead agentic coding, becomes more willing to admit uncertainty, and still trails GPT-5.5 on Terminal-Bench 2.1.
Claude Opus 4.8 posts 69.2% on SWE-Bench Pro, extending its lead in agentic coding benchmarks, yet still trails GPT-5.5 on Terminal-Bench 2.1. The model now admits uncertainty on select tasks, a shift from prior versions that rarely flagged their own limits. Released at unchanged pricing, the update arrives alongside the launch of EasyRouterIO, which offers 400 free test credits via a promo code.
Fu Sheng (傅盛)
@FuSheng_0306
Chairman and CEO of Cheetah Mobile, Chairman of OrionStar