Laude Institute / Stanford Releases: Timeline of Every Launch

Laude Institute / Stanford · Nov 13, 2025

Benchmarks & Evals · Open weights

Terminal-Bench 2.0

Terminal-Bench 2.0 and Harbor launch as new bar for coding agents

Terminal-Bench 2.0 launched alongside the Harbor framework. It contains 89 hard, realistic terminal-based tasks, built with the help of around 1,000 Discord contributors. The Warp agent tops the leaderboard at 50%, with Codex CLI close behind. The panel argued that because even the best agent solves only about half the tasks, the benchmark remains unsaturated and is therefore far more meaningful than near-saturated benchmarks such as MMLU.

Top score on Terminal-Bench 2.0: 50%

Links: Announcement on X · Harbor framework · Running Terminal-Bench docs · Terminal-Bench leaderboard

#benchmarks #agents #coding
