Scale AI

1 releases covered on ThursdAI · scale.com ↗

September 2025

Scale AI
Benchmarks & EvalsOpen weights

SWE-bench Pro

Scale AI debuts SWE-bench Pro, a harder contamination-resistant eval

Scale AI released SWE-bench Pro, a tougher, contamination-resistant successor to SWE-bench for evaluating coding agents on realistic software engineering tasks. It ships with a public dataset on Hugging Face plus separate public and commercial leaderboards, and frontier models score far lower than on the original SWE-bench.