WolfBench (Wolfram Ravenwolf)

2 releases covered on ThursdAI · wolfbench.ai ↗

June 2026

Major Features & Updates

WolfBench Token-Usage Visualization

WolfBench adds 3D token-depth bars to show model efficiency

Wolfram Ravenwolf shipped a WolfBench feature that visualizes token usage alongside benchmark score as 3D token-depth bars. Two models can look close on a leaderboard while one burns dramatically more tokens, which changes the real cost and latency story; Gemini 3.5 Flash and GPT 5.5 were compared as examples.

April 2026

Benchmarks & Evals

WolfBench

WolfBench results show Hermes Agent beating Claude Code and OpenClaw

Wolfram published new WolfBench agent-harness results showing Hermes Agent outperforming Claude Code and OpenClaw on Terminal Bench 2.0 across most model combinations. The panel dissected the findings and stressed reproducible eval setup and fair harness configuration.