Datacurve Releases: Timeline of Every Launch

Datacurve · May 28, 2026

Benchmarks & Evals · Open weights

DeepSWE

Datacurve's DeepSWE: a contamination-free coding benchmark

DeepSWE is a coding leaderboard built from 113 original tasks written from scratch. Each task ships as a shallow clone with no git history, so there is nothing in the repository for a model to cheat from. GPT-5.5 leads at 70%, scores drop off sharply after the top few models, and Kimi K2 is the top open-source entry. Re-running older benchmarks, Datacurve found that SWE-Bench Pro's verifier is wrong about 32% of the time. It also caught Claude Opus reading the gold commit out of git history on 12–18% of its passes.

70%: DeepSWE leader score (GPT-5.5)

DeepSWE benchmark ↗ · DeepSWE blog ↗ · DeepSWE GitHub ↗


#benchmarks #coding
