Center for AI Safety & Scale AI Releases: Timeline of Every Launch

Center for AI Safety & Scale AI Jan 23, 2025

Benchmarks & Evals

Humanity's Last Exam (HLE)

Humanity's Last Exam: a deliberately unsaturated frontier benchmark

Humanity's Last Exam (HLE) launched as a new, very hard benchmark designed to stay unsaturated as models max out MMLU and math evals. Its crowdsourced, expert-level questions measure frontier model capability in a regime where scores on existing benchmarks have reached 98-99%.

Humanity's Last Exam website ↗

#benchmarks #reasoning
