SWE-bench Pro
Scale AI debuts SWE-bench Pro, a harder contamination-resistant eval
Scale AI released SWE-bench Pro, a harder, contamination-resistant successor to SWE-bench for evaluating coding agents on realistic software engineering tasks. It ships with a public dataset on Hugging Face and separate public and commercial leaderboards. Frontier models score far lower on it than on the original SWE-bench.