Weights & Biases

22 releases covered on ThursdAI · wandb.ai

May 2026

Weights & Biases
Dev Tools

W&B MCP Server

Weights & Biases launches MCP server with 20 tools for agents

W&B officially launched its MCP server with 20 schema-first tools so coding agents can read experiments, monitor training, and run autonomous research loops. Agents can query metadata before pulling full 300-metric runs, keeping their context windows from blowing up.

April 2026

Weights & Biases
Major Features & Updates

W&B LEET Workspace Mode

W&B LEET TUI ships workspace mode with multi-run compare and GPU metrics

Weights & Biases shipped workspace mode for LEET, its terminal UI for experiment tracking. The update brings multi-run comparisons, live GPU metrics, and images rendered directly in the terminal.

Weights & Biases
Major Features & Updates

W&B Automations

W&B Automations launch: event triggers from training runs

Weights & Biases shipped Automations, event-triggered actions that pipe signals from your training runs into Slack notifications, GitHub Actions, and deployments. They pair nicely with the new W&B iOS app. In the same Buzz segment: GLM-5.1 and Gemma 4 both went live on W&B Inference.

March 2026

Weights & Biases
Benchmarks & Evals

Wolf Bench

Wolfram previews Wolf Bench, a multi-metric agent eval from W&B

Wolfram Ravenwolf gave an early preview of Wolf Bench, a Terminal Bench-based evaluation framework from Weights & Biases that reports four metrics (average, best run, ceiling, and consistent floor) instead of a single score. It treats harness differences (Terminal Bench vs Claude Code vs OpenClaw) as a first-class factor and publishes benchmark cost and transparency details.
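The source names the four metrics but does not give their formulas, so the following is only a sketch under assumed definitions: "average" is the mean per-run pass rate, "best run" is the highest single-run pass rate, "ceiling" is the share of tasks solved in at least one run, and "consistent floor" is the share solved in every run. The function name `summarize_runs` and the input layout are hypothetical, not part of Wolf Bench.

```python
from statistics import mean


def summarize_runs(results):
    """Summarize repeated benchmark runs with four metrics.

    `results` is a list of runs; each run is a list of booleans,
    one per task (True = task solved). All definitions below are
    assumptions made for illustration, not Wolf Bench's own formulas.
    """
    if not results or not results[0]:
        raise ValueError("need at least one run with at least one task")
    n_tasks = len(results[0])
    if any(len(run) != n_tasks for run in results):
        raise ValueError("every run must cover the same tasks")

    # Pass rate of each individual run.
    run_scores = [sum(run) / n_tasks for run in results]

    # Transpose so each column holds one task's outcomes across runs.
    per_task = list(zip(*results))

    return {
        "average": mean(run_scores),
        "best_run": max(run_scores),
        # Solved in at least one run.
        "ceiling": sum(any(col) for col in per_task) / n_tasks,
        # Solved in every run.
        "consistent_floor": sum(all(col) for col in per_task) / n_tasks,
    }


# Three runs over four tasks (made-up outcomes).
example_runs = [
    [True, True, False, False],
    [True, False, True, False],
    [True, True, True, False],
]
summary = summarize_runs(example_runs)
```

Under these assumed definitions the ceiling can exceed every single-run score and the floor can sit well below them, which is the spread a single headline number hides.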

February 2026

Weights & Biases
Major Features & Updates

W&B Inference: MiniMax M2.5 & Kimi K2.5

W&B Inference adds MiniMax M2.5 and Kimi K2.5

Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax M2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.

Weights & Biases
Major Features & Updates

Kimi K2.5 on W&B Inference

W&B adds Kimi K2.5 to its inference service

Weights & Biases launched Kimi K2.5 on its inference service, making Moonshot AI's model available to W&B users. In Wolfram's Terminal Bench deep dive for W&B, Kimi K2.5 achieved a 67.4% ceiling score across multiple runs, among the strongest open-model results he measured.

January 2026

December 2025

Weights & Biases
Products & Apps

LLM Evaluation Jobs

W&B launches LLM Evaluation Jobs for OpenAI-compatible APIs

Weights & Biases launched LLM Evaluation Jobs, letting teams run evaluations against any OpenAI-compatible API during training cycles instead of only at the end. The show framed it as a practical workflow upgrade for getting earlier model quality signals without blindly burning compute.

November 2025

Weights & Biases
Products & Apps

Serverless LoRA Inference

W&B launches Serverless LoRA Inference on CoreWeave

Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.

Weights & Biases
Dev Tools · Open weights

W&B LEET

W&B ships LEET, an open-source terminal UI for monitoring ML runs

Weights & Biases released LEET (Lightweight Experiment Exploration Tool), an open-source terminal-native dashboard for tracking ML runs, demoed live by Dima Duev of the SDK team. It works fully offline for air-gapped HPC clusters and brings real-time metrics, system stats, and zoomable interactive charts to the terminal.

September 2025

Weights & Biases
Major Features & Updates

Weave in W&B Workspaces

W&B brings Weave traces into Models workspaces for RL runs

Weights & Biases shipped Weave inside W&B Models workspaces, so reinforcement learning runs can now be logged and inspected with Weave trace tooling alongside training metrics. The show framed it as giving RL training 'x-ray vision' into what the model is actually doing.

April 2025

Weights & Biases
Major Features & Updates

W&B Weave Playground

W&B Weave Playground adds GPT-4.1 family and o3/o4-mini support

The Weights & Biases Weave Playground shipped full support for the new GPT-4.1 family and the o3/o4-mini models, letting developers evaluate and compare the week's new models for their own applications.

Weights & Biases
Also Released

observable.tools & MCP RFC-269

W&B launches observable.tools initiative and MCP observability RFC

Weights & Biases launched the observable.tools initiative and published an RFC (RFC-269) proposing observability standards for the Model Context Protocol, inviting community comment. W&B also announced it is a launch partner for Google's A2A protocol.

Weights & Biases
Also Released · Open weights

Observable Tools

W&B launches observable.tools initiative to add observability to MCP

Alex and Weights & Biases launched the Observable Tools initiative to bring observability to the Model Context Protocol (MCP) ecosystem, where external tool calls are currently opaque for debugging and security purposes. A concrete proposal using OpenTelemetry was posted to the MCP specification GitHub discussions for community feedback.

March 2025

Weights & Biases
Dev Tools · Open weights

Weave MCP Server

W&B ships official Weave MCP server: talk to your evals

Weights & Biases shipped an official MCP server for Weave, its LLM observability and evaluation tool, letting agents and MCP clients query and analyze your evals directly. Morgan McGuire of the W&B Applied AI team demoed it on the show, with wandb Models integration coming soon so agents can monitor loss curves for you.

February 2025

Weights & Biases
Papers & Research

Agents Whitepaper & Course

Weights & Biases releases an AI agents whitepaper and announces agents course

Weights & Biases released a whitepaper on evaluating AI agent applications and announced an upcoming agents course built in collaboration with OpenAI's Ilan Bigio, with signups at wandb.me/agents. The push targets agent evaluation and observability tooling for the community.

January 2025