Scale AI Releases: Timeline of Every Launch

Scale AI Sep 25, 2025

Benchmarks & Evals · Open weights

SWE-bench Pro

Scale AI debuts SWE-bench Pro, a harder, contamination-resistant eval

Scale AI released SWE-bench Pro, a tougher, contamination-resistant successor to SWE-bench for evaluating coding agents on realistic software engineering tasks. It ships with a public dataset on Hugging Face plus separate public and commercial leaderboards, and frontier models score far lower than on the original SWE-bench.



#benchmarks #coding #agents

