Benchmarks & Evals
Coding Agent Index
Artificial Analysis Coding Agent Index benchmarks model + harness combos
Artificial Analysis launched the Coding Agent Index, a benchmark that evaluates model and harness combinations rather than models alone. Opus 4.7 in Cursor CLI leads with a score of 61, GLM-5.1 is the top open-weight entry at 53, and cost varies by 30x across combinations of similar capability.