Arena (formerly LMArena)

3 releases covered on ThursdAI · lmarena.ai

June 2026

Arena (LMArena)
Benchmarks & Evals

Agent Arena

Arena launches Agent Arena for real-world agent workflow evals

Arena (formerly LMArena) launched Agent Arena during the episode, moving beyond single-turn chatbot preference battles to evaluate models on real agent workflows with web search, files, terminals, user corrections, and objective recovery signals. Peter Gostev joined live to explain why longer-running, harder tasks need a different benchmark.

April 2026

Datasets · Open weights

Arena historical leaderboard & prompt datasets

Arena releases 3 years of leaderboard data and prompts on Hugging Face

Arena (formerly LMArena) released three years of historical leaderboard data plus the actual user prompts as datasets on Hugging Face. Peter Gostev, who previously scraped the site by hand into Google Sheets for his charts, now builds his Compute Wars and model-trend analyses straight from the data.

November 2025