Weights & Biases

24 releases covered on ThursdAI · wandb.ai ↗

June 2026

APIs & Platforms

Kimi K2.7 Code on CoreWeave Inference

Kimi K2.7 Code goes live on W&B/CoreWeave Inference

Kimi K2.7 Code became available on W&B/CoreWeave Inference, with the episode notes calling out Blackwell NVFP4 serving, speculative decoding, and 289 tokens per second near the top of Artificial Analysis speed and price-performance charts.

289 tok/s reported throughput
Weights & Biases
Dev ToolsOpen weights

HiveMind

Weights & Biases launches HiveMind for coding-agent observability

Weights & Biases launched HiveMind, a dashboard for tracking AI coding-agent sessions, spend, transcripts, ROI, and reusable organizational learning. Chris Van Pelt and Adrian Swanberg joined the show to explain why teams need observability for their growing fleet of coding agents.

May 2026

Weights & Biases
Dev Tools

W&B MCP Server

Weights & Biases launches MCP server with 20 tools for agents

W&B officially launched its MCP server with 20 schema-first tools so coding agents can read experiments, monitor training, and run autonomous research loops. Agents can query metadata before pulling full 300-metric runs, keeping their context windows from blowing up.

April 2026

Weights & Biases
Major Features & Updates

W&B LEET Workspace Mode

W&B LEET TUI ships workspace mode with multi-run compare and GPU metrics

Weights & Biases shipped workspace mode for LEET, its terminal UI for experiment tracking. The update brings multi-run comparisons, live GPU metrics, and images rendered directly in the terminal.

Weights & Biases
Major Features & Updates

W&B Automations

W&B Automations launch: event triggers from training runs

Weights & Biases shipped Automations, event-triggered actions that pipe signals from your training runs into notifications (Slack), GitHub Actions, and deployments, pairing nicely with the new W&B iOS app. In the same Buzz segment: GLM-5.1 and Gemma 4 both went live on W&B Inference.

March 2026

Weights & Biases
Benchmarks & Evals

Wolf Bench

Wolfram previews Wolf Bench, a multi-metric agent eval from W&B

Wolfram Ravenwolf gave an early preview of Wolf Bench, a Terminal Bench-based evaluation framework from Weights & Biases that reports four metrics (average, best run, ceiling, and consistent floor) instead of a single score. It treats harness differences (Terminal Bench vs Claude Code vs OpenClaw) as a first-class factor and publishes benchmark cost and transparency details.

February 2026

Weights & Biases
Major Features & Updates

W&B Inference: MiniMax 2.5 & Kimi K2.5

W&B Inference adds MiniMax 2.5 and Kimi K2.5

Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax 2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.

Weights & Biases
Major Features & Updates

Kimi K2.5 on W&B Inference

W&B adds Kimi K2.5 to its inference service

Weights & Biases launched Kimi K2.5 on its inference service, making Moonshot AI's model available to W&B users. In Wolfram's Terminal Bench deep dive for W&B, Kimi K2.5 achieved a 67.4% ceiling score across multiple runs, among the strongest open-model results he measured.

January 2026

December 2025

Weights & Biases
Products & Apps

LLM Evaluation Jobs

W&B launches LLM Evaluation Jobs for OpenAI-compatible APIs

Weights & Biases launched LLM Evaluation Jobs, letting teams run evaluations against any OpenAI-compatible API during training cycles instead of only at the end. The show framed it as a practical workflow upgrade for getting earlier model quality signals without blindly burning compute.

November 2025

Weights & Biases
Products & Apps

Serverless LoRA Inference

W&B launches Serverless LoRA Inference on CoreWeave

Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.

Weights & Biases
Dev ToolsOpen weights

W&B LEET

W&B ships LEET, an open-source terminal UI for monitoring ML runs

Weights & Biases released LEET (Lightweight Experiment Exploration Tool), an open-source terminal-native dashboard for tracking ML runs, demoed live by Dima Duev of the SDK team. It works fully offline for air-gapped HPC clusters and brings real-time metrics, system stats, and zoomable interactive charts to the terminal.

September 2025

Weights & Biases
Major Features & Updates

Weave in W&B Workspaces

W&B brings Weave traces into Models workspaces for RL runs

Weights & Biases shipped Weave inside W&B Models workspaces, so reinforcement learning runs can now be logged and inspected with Weave trace tooling alongside training metrics. The show frames it as giving RL training 'x-ray vision' into what the model is actually doing.

April 2025

Weights & Biases
Major Features & Updates

W&B Weave Playground

W&B Weave Playground adds GPT-4.1 family and o3/o4-mini support

The Weights & Biases Weave Playground shipped full support for the new GPT-4.1 family and the o3/o4-mini models, letting developers evaluate and compare the week's new models for their own applications.

Weights & Biases
Also Released

observable.tools & MCP RFC-269

W&B launches observable.tools initiative and MCP observability RFC

Weights & Biases launched the observable.tools initiative and published an RFC (RFC-269) proposing observability standards for the Model Context Protocol, inviting community comment. W&B also announced it is a launch partner for Google's A2A protocol.

Weights & Biases
Also ReleasedOpen weights

Observable Tools

W&B launches Observable.tools initiative to add observability to MCP

Alex and Weights & Biases launched the Observable Tools initiative to bring observability to the Model Context Protocol (MCP) ecosystem, since external tool calls currently lose visibility for debugging and security. A concrete proposal using OpenTelemetry was posted to the MCP specification GitHub discussions for community feedback.

March 2025

Weights & Biases
Dev ToolsOpen weights

Weave MCP Server

W&B ships official Weave MCP server - talk to your evals

Weights & Biases shipped an official MCP server for Weave, its LLM observability and evaluation tool, letting agents and MCP clients query and analyze your evals directly. Morgan McQuire of the W&B Applied AI team demoed it on the show, with wandb Models integration coming soon so agents can monitor loss curves for you.

February 2025

Weights & Biases
Papers & Research

Agents Whitepaper & Course

Weights & Biases releases an AI agents whitepaper and announces agents course

Weights & Biases released a whitepaper on evaluating AI agent applications and announced an upcoming agents course built in collaboration with OpenAI's Ilan Biggio, with signups at wandb.me/agents. The push targets agent evaluation and observability tooling for the community.

January 2025