Weights & Biases

25 releases covered on ThursdAI · wandb.ai ↗

June 2026

Weights & Biases Jun 29, 2026

Products & Apps

Aria

W&B Aria auto-research agent goes GA

Aria went generally available on Monday — an auto-research agent living in the W&B UI ('Just Ask Aria') that reads your traces and debugs your loss curves. In Zubin Aysola's AI Engineer talk, Aria read its own production traces and updated its own prompts.

Weights & Biases ↗

🎙️ Hear our coverage →

#agents #research #infrastructure

Weights & Biases / CoreWeave Jun 18, 2026

APIs & Platforms

Kimi K2.7 Code on CoreWeave Inference

Kimi K2.7 Code goes live on W&B/CoreWeave Inference

Kimi K2.7 Code became available on W&B/CoreWeave Inference, with the episode notes calling out Blackwell NVFP4 serving, speculative decoding, and 289 tokens per second near the top of Artificial Analysis speed and price-performance charts.

289 tok/s reported throughput

CoreWeave announcement ↗Try Kimi K2.7 Code on W&B/CoreWeave Inference ↗

🎙️ Hear our coverage →

#api #infrastructure #coding

Weights & Biases Jun 18, 2026

Dev ToolsOpen weights

HiveMind

Weights & Biases launches HiveMind for coding-agent observability

Weights & Biases launched HiveMind, a dashboard for tracking AI coding-agent sessions, spend, transcripts, ROI, and reusable organizational learning. Chris Van Pelt and Adrian Swanberg joined the show to explain why teams need observability for their growing fleet of coding agents.

W&B announcement on X ↗HiveMind ↗HiveMind on GitHub ↗

🎙️ Hear our coverage →

#coding #agents #infrastructure

May 2026

Weights & Biases May 28, 2026

Dev Tools

W&B MCP Server

Weights & Biases launches MCP server with 20 tools for agents

W&B officially launched its MCP server with 20 schema-first tools so coding agents can read experiments, monitor training, and run autonomous research loops. Agents can query metadata before pulling full 300-metric runs, keeping their context windows from blowing up.

W&B MCP Server ↗W&B MCP Server — blog ↗W&B announcement ↗

🎙️ Hear our coverage →

#agents #coding #infrastructure

April 2026

Weights & Biases Apr 23, 2026

Major Features & Updates

W&B LEET Workspace Mode

W&B LEET TUI ships workspace mode with multi-run compare and GPU metrics

Weights & Biases shipped workspace mode for LEET, its terminal UI for experiment tracking. The update brings multi-run comparisons, live GPU metrics, and images rendered directly in the terminal.

W&B LEET TUI workspace mode ↗

🎙️ Hear our coverage →

Weights & Biases Apr 16, 2026

Major Features & Updates

Gemma 4 on W&B Inference

Gemma 4 goes live on W&B Inference with LoRA inference support

Weights & Biases put Gemma 4 live on W&B Inference, running on CoreWeave infrastructure with LoRA inference support. Replying to the W&B announcement post on X with the code 'Gem Drop' gets $20 in free inference credits.

W&B Inference ↗W&B announcement post (X) ↗

🎙️ Hear our coverage →

#infrastructure #open-source

Weights & Biases Apr 9, 2026

Major Features & Updates

W&B Automations

W&B Automations launch: event triggers from training runs

Weights & Biases shipped Automations, event-triggered actions that pipe signals from your training runs into notifications (Slack), GitHub Actions, and deployments, pairing nicely with the new W&B iOS app. In the same Buzz segment: GLM-5.1 and Gemma 4 both went live on W&B Inference.

W&B Inference ↗wandb.com ↗

🎙️ Hear our coverage →

#infrastructure #coding

March 2026

Weights & Biases Mar 19, 2026

Products & Apps

W&B iOS App

Weights & Biases launches native iOS app for monitoring training runs

W&B shipped its most-requested feature ever: a native iOS app for monitoring AI training runs with live metrics and push notifications for crash alerts. Practitioners can now keep an eye on long-running training jobs from their phone instead of staying glued to a dashboard.

W&B on X ↗App Store ↗W&B site ↗

🎙️ Hear our coverage →

#coding #infrastructure

Weights & Biases Mar 13, 2026

Dev ToolsOpen weights

W&B Agent Skills

Weights & Biases launches Agent Skills

Weights & Biases officially launched Agent Skills, installable via `npx skills add wandb/skills`. The launch coincided with Nemotron 3 Super becoming available on W&B Inference at $0.20/1M input tokens, one of the best price-performance options for a 120B model.

W&B Agent Skills on X ↗W&B Skills on GitHub ↗

🎙️ Hear our coverage →

#agents #coding

Weights & Biases Mar 5, 2026

Benchmarks & Evals

Wolf Bench

Wolfram previews Wolf Bench, a multi-metric agent eval from W&B

Wolfram Ravenwolf gave an early preview of Wolf Bench, a Terminal Bench-based evaluation framework from Weights & Biases that reports four metrics (average, best run, ceiling, and consistent floor) instead of a single score. It treats harness differences (Terminal Bench vs Claude Code vs OpenClaw) as a first-class factor and publishes benchmark cost and transparency details.

🎙️ Hear our coverage →

#benchmarks #agents

February 2026

Weights & Biases Feb 26, 2026

Major Features & Updates

W&B Inference: MiniMax 2.5 & Kimi K2.5

W&B Inference adds MiniMax 2.5 and Kimi K2.5

Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax 2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.

MiniMax M2.5 on W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #api #open-source

Weights & Biases Feb 19, 2026

Major Features & Updates

Kimi K2.5 on W&B Inference

W&B adds Kimi K2.5 to its inference service

Weights & Biases launched Kimi K2.5 on its inference service, making Moonshot AI's model available to W&B users. In Wolfram's Terminal Bench deep dive for W&B, Kimi K2.5 achieved a 67.4% ceiling score across multiple runs, among the strongest open-model results he measured.

W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #open-source

Weights & Biases Feb 12, 2026

Major Features & Updates

W&B Inference (GLM-5 & Kimi K2.5)

W&B Inference adds day-zero GLM-5 and Kimi K2.5 support

Weights & Biases launched day-zero GLM-5 support on its CoreWeave-powered W&B Inference service, alongside Kimi K2.5, with MiniMax 2.5 coming soon. Alex announced $50 in free credits for listeners to test the new open-weights models.

W&B announcement on X ↗W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #open-source

January 2026

Weights & Biases Jan 8, 2026

Dev ToolsOpen weights

Catnip

Catnip by W&B: open source iOS app to run Claude Code anywhere

Chris Van Pelt of Weights & Biases released Catnip, an open source iOS app that lets you run Claude Code from anywhere via GitHub Codespaces. It is available on the App Store with source on GitHub.

Catnip by W&B on App Store ↗Catnip on GitHub ↗

🎙️ Hear our coverage →

#coding #agents #consumer-ai

December 2025

Weights & Biases Dec 4, 2025

Products & Apps

LLM Evaluation Jobs

W&B launches LLM Evaluation Jobs for OpenAI-compatible APIs

Weights & Biases launched LLM Evaluation Jobs, letting teams run evaluations against any OpenAI-compatible API during training cycles instead of only at the end. The show framed it as a practical workflow upgrade for getting earlier model quality signals without blindly burning compute.

W&B LLM Evaluation Jobs ↗W&B announcement on X ↗

🎙️ Hear our coverage →

#benchmarks #coding #infrastructure

November 2025

Weights & Biases Nov 27, 2025

Products & Apps

Serverless LoRA Inference

W&B launches Serverless LoRA Inference on CoreWeave

Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.

W&B Serverless LoRA Report ↗W&B LoRA Notebook ↗W&B Announcement on X ↗

🎙️ Hear our coverage →

#infrastructure #training #coding

Weights & Biases Nov 13, 2025

Dev ToolsOpen weights

W&B LEET

W&B ships LEET, an open-source terminal UI for monitoring ML runs

Weights & Biases released LEET (Lightweight Experiment Exploration Tool), an open-source terminal-native dashboard for tracking ML runs, demoed live by Dima Duev of the SDK team. It works fully offline for air-gapped HPC clusters and brings real-time metrics, system stats, and zoomable interactive charts to the terminal.

W&B announcement on X ↗W&B LEET blog post ↗W&B LEET (wandb beta leet) ↗

🎙️ Hear our coverage →

#coding #infrastructure

September 2025

Weights & Biases Sep 18, 2025

Major Features & Updates

Weave in W&B Workspaces

W&B brings Weave traces into Models workspaces for RL runs

Weights & Biases shipped Weave inside W&B Models workspaces, so reinforcement learning runs can now be logged and inspected with Weave trace tooling alongside training metrics. The show frames it as giving RL training 'x-ray vision' into what the model is actually doing.

X ↗W&B Docs ↗

🎙️ Hear our coverage →

#infrastructure #coding #training

April 2025

Weights & Biases Apr 17, 2025

Major Features & Updates

W&B Weave Playground

W&B Weave Playground adds GPT-4.1 family and o3/o4-mini support

The Weights & Biases Weave Playground shipped full support for the new GPT-4.1 family and the o3/o4-mini models, letting developers evaluate and compare the week's new models for their own applications.

X ↗W&B Weave ↗

🎙️ Hear our coverage →

#benchmarks #coding

Weights & Biases Apr 10, 2025

Also Released

observable.tools & MCP RFC-269

W&B launches observable.tools initiative and MCP observability RFC

Weights & Biases launched the observable.tools initiative and published an RFC (RFC-269) proposing observability standards for the Model Context Protocol, inviting community comment. W&B also announced it is a launch partner for Google's A2A protocol.

observable.tools ↗MCP RFC ↗W&B + Google A2A partnership blog ↗

🎙️ Hear our coverage →

#agents #coding

Weights & Biases Apr 3, 2025

Also ReleasedOpen weights

Observable Tools

W&B launches Observable.tools initiative to add observability to MCP

Alex and Weights & Biases launched the Observable Tools initiative to bring observability to the Model Context Protocol (MCP) ecosystem, since external tool calls currently lose visibility for debugging and security. A concrete proposal using OpenTelemetry was posted to the MCP specification GitHub discussions for community feedback.

Observable.tools ↗OpenTelemetry proposal on MCP spec GitHub ↗Viral MCP clients tweet ↗

🎙️ Hear our coverage →

#agents #coding

March 2025

Weights & Biases Mar 27, 2025

Dev ToolsOpen weights

Weave MCP Server

W&B ships official Weave MCP server - talk to your evals

Weights & Biases shipped an official MCP server for Weave, its LLM observability and evaluation tool, letting agents and MCP clients query and analyze your evals directly. Morgan McQuire of the W&B Applied AI team demoed it on the show, with wandb Models integration coming soon so agents can monitor loss curves for you.

X announcement ↗GitHub repo ↗Example W&B report ↗

🎙️ Hear our coverage →

#agents #benchmarks #coding

Weights & Biases Mar 6, 2025

Acquisitions

CoreWeave acquisition of Weights & Biases

Weights & Biases is acquired by CoreWeave

CoreWeave announced it is acquiring Weights & Biases, the AI developer platform and ThursdAI's home company. The deal pairs W&B's experiment tracking, Weave, and models tooling with CoreWeave's AI cloud infrastructure.

W&B Announcement ↗

🎙️ Hear our coverage →

#industry #infrastructure

February 2025

Weights & Biases Feb 20, 2025

Papers & Research

Agents Whitepaper & Course

Weights & Biases releases an AI agents whitepaper and announces agents course

Weights & Biases released a whitepaper on evaluating AI agent applications and announced an upcoming agents course built in collaboration with OpenAI's Ilan Biggio, with signups at wandb.me/agents. The push targets agent evaluation and observability tooling for the community.

Whitepaper ↗Agents course signup ↗

🎙️ Hear our coverage →

#agents #benchmarks #coding

January 2025

Weights & Biases Jan 23, 2025

Also Released

W&B SWE-bench Verified SOTA agent

W&B programming agent breaks SOTA on SWE-bench Verified

Weights & Biases announced a state-of-the-art AI programming agent built with OpenAI's o1 that broke the SOTA score on SWE-bench Verified. The work was developed and tracked with W&B Weave, the team's LLM observability toolkit.

W&B SOTA programming agent report ↗W&B Weave ↗

🎙️ Hear our coverage →

#coding #agents #benchmarks