Benchmarks & Evals
Claude Code tracker
MarginLab tracker shows degradation in Opus 4.6 on Claude Code
MarginLab's public Claude Code tracker surfaced measurable degradation in Opus 4.6 performance; the finding was discussed in the evals and benchmarks roundup. Because the tracker evaluates Claude Code continuously over time, silent model regressions that would otherwise go unnoticed become visible.
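A tracker like this has to tell a real regression apart from ordinary run-to-run noise. The sketch below uses one common approach: a one-sided two-proportion z-test that compares a recent window's pass rate against a baseline window's pass rate. This is an illustrative assumption, not MarginLab's documented method. The `Window` type, `one_sided_p_value`, `is_degraded`, and the 0.05 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical pass/fail counts for one evaluation window (total must be > 0)."""
    passed: int
    total: int

    @property
    def rate(self) -> float:
        return self.passed / self.total


def one_sided_p_value(baseline: Window, recent: Window) -> float:
    """One-sided two-proportion z-test for H1: recent pass rate < baseline pass rate."""
    # Pooled pass rate under the null hypothesis that nothing changed.
    pooled = (baseline.passed + recent.passed) / (baseline.total + recent.total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline.total + 1 / recent.total))
    if se == 0:
        # All runs passed (or all failed) in both windows: no evidence of a drop.
        return 1.0
    z = (recent.rate - baseline.rate) / se
    # Standard normal CDF at z is the lower-tail probability.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def is_degraded(baseline: Window, recent: Window, alpha: float = 0.05) -> bool:
    """Flag a regression when the drop is unlikely to be noise at level alpha (hypothetical rule)."""
    return one_sided_p_value(baseline, recent) < alpha


if __name__ == "__main__":
    base = Window(passed=160, total=200)
    print(is_degraded(base, Window(passed=130, total=200)))  # large drop
    print(is_degraded(base, Window(passed=158, total=200)))  # within noise
```

With a baseline of 160/200 passes, a recent window of 130/200 yields a p-value far below 0.05 and is flagged, while 158/200 is not. A real tracker would also need to account for repeated testing across many windows, which this sketch does not handle.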