Everything AI Released in February 2026

57 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.


🧠 New Models 32

Alibaba (Qwen) Feb 26, 2026

New Models · Open weights

Qwen 3.5

Qwen 3.5 lands: 35B/3B-active Medium outperforms the old 235B flagship

Alibaba released the Qwen 3.5 family of open-weight models, headlined by Qwen3.5-35B-A3B, a 35B model with only 3B active parameters that outperforms their previous 235B flagship. Variants include a 122B-A10B and a dense 27B, with the panel highlighting the hybrid state-space (Mamba-layer) architecture and strong practical coding and agent performance at a tiny active-parameter footprint.

35B / 3B active Qwen 3.5 Medium

Qwen announcement on X ↗ · Qwen3.5-35B-A3B on Hugging Face ↗ · Qwen3.5-122B-A10B on Hugging Face ↗ · Qwen 3.5 blog post ↗

🎙️ Hear our coverage →

#open-source #architecture #coding
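
To put the "3B active" figure in context, the sketch below computes the share of parameters used per token for the three Qwen 3.5 variants named above. The counts are read off the model names; treating the dense 27B as fully active is an assumption made for illustration, and the helper name is hypothetical.

```python
# Rough active-parameter share for the Qwen 3.5 variants listed above.
# Counts are in billions and are read off the model names.

def active_share(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters used per token."""
    return active_b / total_b

QWEN_35 = {
    "Qwen3.5-35B-A3B": (35.0, 3.0),
    "Qwen3.5-122B-A10B": (122.0, 10.0),
    "Qwen3.5-27B (dense)": (27.0, 27.0),  # assumption: dense means every weight is active
}

for name, (total_b, active_b) in QWEN_35.items():
    print(f"{name}: {active_share(total_b, active_b):.1%} active per token")
```

Active share is only a proxy for per-token compute; memory use still scales with the total parameter count.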

Google DeepMind Feb 26, 2026

New Models

Nano Banana 2

Google DeepMind launches Nano Banana 2 image model mid-show

Google DeepMind announced Nano Banana 2 during the show, a Flash-quality tier of its image model line. Alex broke in mid-TLDR to describe near-Pro image quality at roughly half the price, plus a new image search capability.

Google DeepMind announcement on X ↗ · Nano Banana page ↗

🎙️ Hear our coverage →

#image-gen #multimodal

Liquid AI Feb 26, 2026

New Models · Open weights

LFM2-24B-A2B

Liquid AI releases LFM2-24B-A2B, a laptop-friendly 24B MoE

Liquid AI released LFM2-24B-A2B, a 24B mixture-of-experts model with only 2.3B active parameters that runs on consumer laptops. The panel highlighted its speed and surprisingly strong non-coding reasoning, reinforcing the trend of efficient low-active-parameter open models for local use.

Liquid AI announcement on X ↗ · LFM2-24B-A2B on Hugging Face ↗ · Liquid AI blog post ↗

🎙️ Hear our coverage →

#open-source #architecture #on-device

OpenAI Feb 26, 2026

New Models

gpt-audio-1.5 & gpt-realtime-1.5

OpenAI releases gpt-audio-1.5 and gpt-realtime-1.5

OpenAI shipped gpt-audio-1.5 and gpt-realtime-1.5, updated audio and realtime voice models available through its platform. The release was covered in the week's voice and audio roundup.

Release noted on X ↗ · OpenAI models docs ↗

🎙️ Hear our coverage →

#voice-ai #audio #api

Perplexity Feb 26, 2026

New Models · Open weights

pplx-embed

Perplexity launches pplx-embed SOTA embedding models

Perplexity released pplx-embed, a family of state-of-the-art embedding models built for web-scale retrieval. The models are available on Hugging Face and through Perplexity's API with quickstart docs.

pplx-embed research blog ↗ · pplx-embed Hugging Face collection ↗ · Perplexity embeddings API quickstart ↗

🎙️ Hear our coverage →

#search #open-source

Quiver Feb 26, 2026

New Models

Arrow 1.0

Quiver tackles SVG generation with Arrow 1.0

Quiver released Arrow 1.0, pitched as solving SVG generation. It was included in the week's AI art and diffusion roundup as a notable niche release for vector graphics.

Arrow 1.0 demo on X ↗

🎙️ Hear our coverage →

Alibaba (Qwen) Feb 19, 2026

New Models · Open weights

Qwen3.5-397B-A17B

Alibaba opens Qwen 3.5: 397B-param multimodal MoE with only 17B active

Alibaba released Qwen3.5-397B-A17B, billed as the first open-weight native multimodal MoE model, with 397B total parameters, just 17B active, 512 experts, and 262K native context extendable to 1M. It delivers 8.6-19x faster inference than Qwen3-Max and continues Qwen's strength in multilingual and medical tasks, scoring 52.5% on Terminal Bench, third place among open-source models. Nisten found coding still trails GLM-5.

397B Qwen 3.5 Parameters

Qwen 3.5 announcement (X) ↗ · Qwen3.5-397B-A17B on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #architecture #multilingual

Anthropic Feb 19, 2026

New Models

Claude Sonnet 4.6

Anthropic ships Claude Sonnet 4.6 with 79.6% SWE-Bench and 1M context

Anthropic launched Claude Sonnet 4.6, its most capable Sonnet ever, scoring 79.6% on SWE-Bench Verified, nearly matching Opus 4.6 at Sonnet pricing of $3/$15 per million tokens. It ships with a 1M token context window in beta and is now the default model on claude.ai. In blind Claude Code testing, users preferred Sonnet 4.6 over the previous Opus 4.5 59% of the time, and it beats the previous Gemini 3 Pro on most benchmarks.

79.6% SWE-Bench Verified

Claude Sonnet 4.6 announcement (X) ↗ · Anthropic blog: Claude Sonnet 4.6 ↗ · Claude Sonnet page ↗

🎙️ Hear our coverage →

#coding #agents #architecture
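
As a rough guide to what the Sonnet 4.6 pricing quoted above means per request, here is a minimal sketch. It assumes "$3/$15 per million tokens" means $3 per million input tokens and $15 per million output tokens, and it ignores any caching discounts or long-context surcharges; the function name is hypothetical.

```python
# Back-of-the-envelope request cost at the quoted Sonnet 4.6 rates.
# Assumption: $3 per million input tokens, $15 per million output tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000

# Example: a large coding prompt with a modest reply.
print(f"${request_cost_usd(200_000, 4_000):.2f}")
```

Input tokens dominate the bill for long-context coding sessions, even though output tokens cost five times as much each.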

ByteDance Feb 19, 2026

New Models

Seed 2.0

ByteDance Seed 2.0: frontier multimodal family at 73-84% lower pricing

ByteDance released Seed 2.0, a frontier multimodal LLM family with Pro, Lite, Mini, and Code variants that rivals GPT-5.2 and Claude Opus 4.5 at 73-84% lower pricing. Its video understanding surpasses the human benchmark at 77% vs 73%. At 84% cheaper than Opus 4.5 with near-comparable quality, the panel called it a compelling option for price-conscious developers.

Seed 2.0 announcement (X) ↗ · Doubao team model page ↗ · ByteDance-Seed on Hugging Face ↗

🎙️ Hear our coverage →

#multimodal #vision #frontier-models

Cohere Labs Feb 19, 2026

New Models · Open weights

Tiny Aya

Cohere Labs releases Tiny Aya, a 3.35B multilingual model for 70+ languages

Cohere Labs released Tiny Aya, a 3.35B-parameter multilingual model family supporting 70+ languages that is small enough to run locally on phones. It extends Cohere's Aya line of open multilingual models, bringing broad language coverage to on-device deployments.

Tiny Aya announcement (X) ↗ · Tiny Aya collection on Hugging Face ↗ · Tiny Aya Global on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #multilingual #on-device

Google DeepMind Feb 19, 2026

New Models

Gemini 3.1 Pro

Gemini 3.1 Pro drops live with 44% HLE and 77% ARC-AGI at the same price

Google released Gemini 3.1 Pro minutes before the show, claiming 2.5x better abstract reasoning and improved coding and agentic capabilities at the same price point as its predecessor. It scores 44% on Humanity's Last Exam, 77% on ARC-AGI without a custom harness, and 68% on Terminal Bench, putting it at or near state of the art alongside Opus 4.6. In Nisten's live vibe-coding test it was blazingly fast but less polished than Opus 4.6 and Codex output.

44% Humanity's Last Exam · 77% ARC-AGI

Gemini 3.1 Pro announcement (X) ↗ · Google DeepMind blog: Gemini 3.1 Pro update ↗ · Try it in Google AI Studio ↗

🎙️ Hear our coverage →

#frontier-models #reasoning #coding

Google DeepMind Feb 19, 2026

New Models

Lyria 3

Google DeepMind launches Lyria 3 music generation in the Gemini app

Google DeepMind launched Lyria 3, its most advanced AI music generation model, now available in the Gemini app. It generates 32-second high-fidelity music tracks with creative controls and can compose music from uploaded images. Google also published a prompt guide covering vocals, lyrics, and different styles.

Lyria 3 announcement (X) ↗ · Lyria on Google DeepMind ↗

🎙️ Hear our coverage →

xAI Feb 19, 2026

New Models

Grok 4.20

xAI silently drops Grok 4.20 with four 500B-param collaborating agents

xAI released Grok 4.20, a multi-agent system where four 500B-parameter agents collaborate in a multi-agent UI, with a $300/month Heavy tier scaling to 16 agents. No benchmarks or evals were released with the drop. The panel found it underwhelming for coding and day-to-day agent work but still top tier for deep research thanks to xAI's RAG over X data; Grok 4.1 Fast remains #8 on OpenRouter by API usage.

500B×4 Grok 4.20 Architecture

Grok 4.20 on X ↗ · xAI model docs ↗

🎙️ Hear our coverage (+1 follow-up) →

#agents #frontier-models #search

Zyphra Feb 19, 2026

New Models · Open weights

ZUNA

Zyphra opens ZUNA, a 380M-param EEG brain-computer interface model

Zyphra released ZUNA, a 380M-parameter open-source BCI foundation model that translates EEG brain signals into text, reconstructing clinical-grade brain signals from sparse, noisy data. Dubbed 'thought to text' by the community, it works with roughly $500 non-invasive EEG headsets, likely needs personalized training per user, and is small enough to run in real time on a consumer gaming GPU. It is Apache licensed.

ZUNA announcement (X) ↗ · Zyphra blog: ZUNA ↗ · ZUNA on GitHub ↗

🎙️ Hear our coverage →

#research #open-source

Alibaba (Qwen) Feb 12, 2026

New Models

Qwen-Image-2.0

Alibaba launches Qwen-Image-2.0 with native 2K resolution

Alibaba's Qwen team launched Qwen-Image-2.0, a 7B-parameter image generation model with native 2K resolution output and superior text rendering. It is available to try on chat.qwen.ai.

Alibaba Qwen announcement on X ↗ · Try it on Qwen Chat ↗

🎙️ Hear our coverage →

ByteDance Feb 12, 2026

New Models

Seedance 2.0

ByteDance Seedance 2.0 shatters video generation reality

ByteDance launched Seedance 2.0, a unified multimodal video generation model that accepts up to 9 images, 3 videos, and 3 audio clips as references and produces 15-second multi-shot clips with native stereo audio and strong character consistency (a 45-second internal test mode also exists). The panel compared the quality jump to seeing Sora for the first time. It is available on the BytePlus platform.

Alex's demo thread on X ↗ · Official launch blog ↗ · Seedance 2.0 announcement page ↗ · Seedance 2.0 in CapCut on X ↗

🎙️ Hear our coverage (+1 follow-up) →

#video-gen #multimodal #consumer-ai

Google DeepMind Feb 12, 2026

New Models

Gemini 3 Deep Think

Gemini 3 Deep Think scores 84% on ARC-AGI 2

Google dropped an upgraded Gemini 3 Deep Think mid-show, hitting 84% on ARC-AGI 2 — the biggest single jump in the benchmark's history, up from Opus 4.6's 68% set just one week earlier. It also scored 48.4% on Humanity's Last Exam without tools, taking state of the art on both.

84% ARC-AGI 2

Sundar Pichai announcement on X ↗

🎙️ Hear our coverage →

#reasoning #benchmarks

MiniMax Feb 12, 2026

New Models · Open weights

MiniMax M2.5

MiniMax M2.5 hits 80.2% SWE-Bench Verified with 10B active params

MiniMax dropped M2.5 thirty minutes before the show: a 200B-total, 10B-active open-weights model scoring 80.2% on SWE-Bench Verified, approaching Opus 4.6 at roughly 1/20th the cost (~15 cents per task with a 57% win rate over Opus). It was trained with MiniMax's decoupled Forge RL framework and optimized for end-to-end task time, using fewer tool calls and thinking tokens. Senior researcher Olive Song joined live and revealed the model was still training; the team cut a checkpoint for early release.

80.2% SWE-Bench Verified · 15¢ Cost per task

MiniMax M2.5 benchmarks on X ↗

🎙️ Hear our coverage →

#open-source #coding #agents

OpenAI Feb 12, 2026

New Models

GPT 5.3 Codex Spark

OpenAI ships GPT 5.3 Codex Spark on Cerebras for real-time coding

OpenAI released GPT 5.3 Codex Spark, a smaller Codex variant built for real-time coding and served on Cerebras hardware (OpenAI's first model on Cerebras), with reported speeds of over 1000 tokens/sec. It is available to ChatGPT Pro users in the Codex app, CLI, and IDE extension. The news landed during the show as the episode's second breaking-news drop.

1000+ tps Codex Spark speed

Sam Altman announcement on X ↗

🎙️ Hear our coverage →

#coding #infrastructure

Zhipu AI (Z.ai) Feb 12, 2026

New Models · Open weights

GLM-5

Z.ai launches GLM-5, the open-weights agentic coding crown

Z.ai released GLM-5, a 744B-parameter MoE model (40B active) trained on 28.5 trillion tokens that takes the #1 open-source ranking for agentic coding with 77.8% SWE-bench Verified. It introduces the slime asynchronous RL framework for post-training, adopts DeepSeek's sparse attention to cut deployment cost, and was trained on Huawei chips rather than NVIDIA. Lou from Z.ai joined the show live and summed it up as bigger, faster, better, and cheaper.

744B GLM-5 Parameters · 28.5T Training tokens

Z.ai announcement on X ↗ · GLM-5 on Hugging Face ↗ · W&B Inference day-zero support ↗

🎙️ Hear our coverage →

#open-source #coding #agents

ACE Step Feb 5, 2026

New Models · Open weights

ACE-Step 1.5

ACE-Step 1.5: open-source 'Suno at home' music generation under MIT

ACE-Step 1.5 is an MIT-licensed AI music generator that produces full songs in under 10 seconds on consumer GPUs and runs on a MacBook. The panel demoed it live via Pinokio, generating a ThursdAI song on the spot, and it is available for one-click install.

X announcement ↗ · GitHub ↗ · Hugging Face ↗ · Project page ↗

🎙️ Hear our coverage →

#audio #open-source

Alibaba (Qwen) Feb 5, 2026

New Models · Open weights

Qwen3-Coder-Next

Qwen3-Coder-Next hits 70.6% SWE-Bench Verified with 3B active params

Alibaba's Qwen3-Coder-Next is an 80B MoE coding agent model with only 3B active parameters that scores 70.6% on SWE-Bench Verified and 44% on the much harder SWE-Bench Pro. It was trained on 7.5T tokens with 20,000 parallel RL environments and runs under 48GB of RAM with GGUF quantization, making near-frontier agentic coding feasible on local hardware.

70.6% SWE-Bench Verified · 44% SWE-Bench Pro

X announcement ↗ · Qwen blog ↗ · Hugging Face collection ↗

🎙️ Hear our coverage →

#open-source #coding #agents
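
The "under 48GB of RAM" figure for Qwen3-Coder-Next is easier to judge with a quick weight-size estimate. The sketch below counts weights only, ignoring KV cache and runtime overhead. The bit widths are illustrative assumptions, since real GGUF quantization types use non-integer effective bits per weight, and the helper name is hypothetical.

```python
# Weight-only memory estimate for an 80B-parameter model at several precisions.
# Ignores KV cache, activations, and runtime overhead.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

PARAMS_BILLION = 80.0  # total parameter count quoted above

for bits in (16, 8, 4):  # assumption: round bit widths for illustration
    print(f"{bits}-bit: ~{weights_gb(PARAMS_BILLION, bits):.0f} GB")
```

All 80B weights must be resident even though only 3B are active per token, so the total count sets the memory floor while the active count drives speed.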

Ant Group Feb 5, 2026

New Models · Open weights

LingBot-World

LingBot-World: open-source world model challenges Google Genie 3

Ant Group released LingBot-World, an open-source world model that generates 10-minute playable environments at 16fps. It positions open weights as a direct challenger to Google's closed Genie 3 in interactive world generation.

X thread ↗ · Hugging Face ↗

🎙️ Hear our coverage →

#world-models #video-gen #open-source

Anthropic Feb 5, 2026

New Models

Claude Opus 4.6

Anthropic ships Claude Opus 4.6 with 1M context and agent teams

Anthropic dropped Opus 4.6 live during the show, claiming state-of-the-art on GDPval, BrowseComp, and agentic search, with 65.4% on Terminal Bench and 99% on TAU Bench MCP tool use. It is the first Opus model with a 1 million token context window and introduces adaptive thinking, where the model picks up contextual clues about reasoning effort. Pricing matches Opus 4.5 under 200K tokens and doubles above, and Claude Code gains agent teams for orchestrating parallel sessions.

1M Context tokens

X announcement ↗ · Anthropic blog ↗

🎙️ Hear our coverage →

#frontier-models #coding #agents

InternLM (Shanghai AI Lab) Feb 5, 2026

New Models · Open weights

Intern-S1-Pro

Intern-S1-Pro: 1 trillion parameter open MoE for scientific reasoning

InternLM released Intern-S1-Pro, a 1 trillion parameter open-source MoE model targeting SOTA scientific reasoning across chemistry, biology, materials, and earth sciences. The panel noted that it beats frontier models on science benchmarks and represents a massive compute investment for an open release.

X announcement ↗ · Hugging Face ↗ · arXiv ↗ · ModelScope ↗

🎙️ Hear our coverage →

#open-source #reasoning #research

Kling AI Feb 5, 2026

New Models

Kling 3.0

Kling 3.0: 15-second multi-shot video with native audio

Kuaishou's Kling 3.0 launched as an all-in-one AI video creation engine with native multimodal generation, 15-second multi-shot sequences, built-in audio, and character consistency across scenes. Alongside Grok Imagine, it marks the week native audio and lip sync became table stakes for video models.

X announcement ↗ · Kling AI ↗

🎙️ Hear our coverage →

#video-gen #audio

Mistral AI Feb 5, 2026

New Models · Open weights

Voxtral Transcribe 2

Mistral's Voxtral Transcribe 2 dethrones Whisper as SOTA transcription

Mistral AI launched Voxtral Transcribe 2, state-of-the-art speech-to-text with sub-200ms latency, native diarization support, and open weights under Apache 2.0. The panel called it the first model to dethrone Whisper after roughly three years, and Alex used it to transcribe this very episode.

X announcement ↗ · Mistral blog ↗ · Docs ↗ · Demo ↗

🎙️ Hear our coverage →

#voice-ai #open-source

OpenAI Feb 5, 2026

New Models

GPT-5.3-Codex

OpenAI answers Opus with GPT-5.3-Codex, first model that helped build itself

One hour after Opus 4.6, OpenAI released GPT-5.3-Codex, billed as the first model instrumental in developing itself — the Codex team used early versions to debug its own training and manage its own deployment. It scores 73% on Terminal Bench 2.0, a 10-point gap over Opus 4.6, while running queries 25% faster and more token-efficiently than its predecessor, with improved mid-task steerability.

73% Terminal Bench 2.0 · 25% Speed improvement

Sam Altman announcement on X ↗ · OpenAIDevs announcement on X ↗ · GPT-5.3-Codex model docs ↗

🎙️ Hear our coverage (+1 follow-up) →

#frontier-models #coding #agents

OpenBMB Feb 5, 2026

New Models · Open weights

MiniCPM-o 4.5

MiniCPM-o 4.5: first open-source full-duplex omni model

OpenBMB released MiniCPM-o 4.5, the first open-source full-duplex omni-modal LLM that can see, listen, and speak simultaneously. It can listen while speaking and even interrupt the user, bringing real-time conversational behavior to open weights.

X announcement ↗ · Hugging Face ↗ · GitHub ↗

🎙️ Hear our coverage →

#open-source #voice-ai #multimodal

StepFun Feb 5, 2026

New Models · Open weights

Step 3.5 Flash

StepFun Step 3.5 Flash: frontier reasoning claims at 11B active params

StepFun released Step 3.5 Flash, a 196B sparse MoE model with only 11B active parameters, claiming frontier-level reasoning while generating at 100-350 tokens per second. It continues the trend of sparse Chinese MoE models delivering high speed at low active parameter counts.

X announcement ↗ · Hugging Face ↗

🎙️ Hear our coverage →

#open-source #reasoning

xAI Feb 5, 2026

New Models

Grok Imagine 1.0

Grok Imagine 1.0 tops video arena with native audio and lip sync

xAI launched Grok Imagine 1.0 with 10-second 720p video generation, native audio, and lip sync, taking the #1 spot on the Artificial Analysis text-to-video arena. Generation costs roughly $0.42 per 10-second clip and an API is available.

X announcement ↗ · Grok ↗ · Artificial Analysis leaderboard ↗

🎙️ Hear our coverage →

#video-gen #audio

Zhipu AI (Z.ai) Feb 5, 2026

New Models · Open weights

GLM-OCR

Z.ai GLM-OCR: 0.9B model takes #1 on OmniDocBench

Z.ai released GLM-OCR, a tiny 0.9B parameter document understanding model that achieves the #1 ranking on OmniDocBench V1.5. It shows that strong OCR and document parsing no longer require large models.

X announcement ↗ · Hugging Face ↗ · Announcement ↗

🎙️ Hear our coverage →

#open-source #vision

🚀 Products & Apps 8

Cognition Labs Feb 26, 2026

Products & Apps

Devin 2.2

Devin 2.2: computer use, browser, and self-verifying autonomous work

Cognition shipped Devin 2.2, an autonomous coding agent that can use a computer and browser to verify and fix its own work, plus a free public Devin Review workflow for PR review and scheduled/automated sessions. Nader Dabit framed the release as two years of platform maturity converging with stronger models, letting non-engineers fix issues directly by just asking Devin.

Cognition announcement on X ↗

🎙️ Hear our coverage →

#agents #coding

Nous Research Feb 26, 2026

Products & Apps

Nous Research Agent

Nous Research ships a research agent

Nous Research announced a research agent, joining the wave of lab-built agentic tools shipped this week. It was covered in the roundup of new agent products alongside Cursor cloud agents and Perplexity Computer.

Nous Research announcement on X ↗

🎙️ Hear our coverage →

#agents #research

Perplexity Feb 26, 2026

Products & Apps

Perplexity Computer

Perplexity introduces Perplexity Computer

Perplexity launched Perplexity Computer, an agentic computer product announced via its blog. It was discussed as part of the week's convergence on agent harnesses, automations, and cloud-based agent workflows across labs.

Introducing Perplexity Computer (blog) ↗

🎙️ Hear our coverage →

Taalas Feb 26, 2026

Products & Apps

ChatJimmy (baked-weights chip demo)

Taalas demos 15,000+ tokens/sec with model weights baked into silicon

Taalas published a live demo (chatjimmy.ai) showing Llama 3 8B running at 15,691 tokens per second on a chip with weights baked directly into the hardware. The panel called it a 10x speed-class jump that points at chip-level innovation compressing inference costs and iteration cycles.

15,000+ tok/s Taalas Demo Throughput

ChatJimmy demo ↗

🎙️ Hear our coverage →

#infrastructure

Dreamer Feb 19, 2026

Products & Apps

Dreamer

Dreamer launches beta platform for building agentic apps with no-code AI

Dreamer launched its beta, a full-stack platform for building and discovering agentic apps with no-code AI. It aims to let non-developers assemble and share agent-powered applications.

Dreamer beta announcement (X) ↗ · Dreamer ↗

🎙️ Hear our coverage →

#agents #coding

Moltbook Feb 5, 2026

Products & Apps

Moltbook

Moltbook: a Reddit built for and by AI agents

Moltbook launched as a social network for AI agents, part of an exploding 'agentic internet' that now includes agent equivalents of YouTube, Twitter, Instagram, 4chan, and even a church. Agents on these networks were observed discussing creating encrypted languages humans cannot read, and the panel warned against letting your agents loose on them.

🎙️ Hear our coverage →

OpenAI Feb 5, 2026

Products & Apps

Codex App

OpenAI launches standalone Codex app for managing parallel coding agents

OpenAI shipped Codex as a dedicated Mac app, a command center for running multiple AI coding agents in parallel. Features include worktrees for parallel project branches, scheduled automations, a skills marketplace with Cloudflare, Vercel, Figma, Notion, and Linear integrations, inline diff review with per-line commenting, and cloud hand-off. OpenAI granted a free month of access to all users, including the free tier, and doubled rate limits for all tiers for two months.

VB announcement on X ↗ · Codex app ↗

🎙️ Hear our coverage →

#coding #agents

OpenAI Feb 5, 2026

Products & Apps

OpenAI Frontier

OpenAI Frontier: enterprise platform for AI agents as coworkers

OpenAI launched Frontier, an enterprise platform to build, deploy, and manage AI agents as 'AI coworkers'. It targets companies that want to operationalize agents across their organizations.

X announcement ↗ · OpenAI blog ↗

🎙️ Hear our coverage →

#agents #industry

✨ Major Features & Updates 7

Anthropic Feb 26, 2026

Major Features & Updates

Claude Code Remote Control & Memory

Claude Code adds Remote Control and memory

Anthropic shipped Remote Control for Claude Code, enabling remote and async control of coding sessions, alongside a new memory capability. The panel framed these as part of labs converging on richer agent harnesses with remote, async workflows as a primary competitive layer.

Claude announcement on X ↗ · Remote Control docs ↗ · Memory announcement on X ↗

🎙️ Hear our coverage →

#agents #coding

Anthropic Feb 26, 2026

Major Features & Updates

Claude Cowork Automations

Claude Cowork gets automations (cron jobs), matching Codex

Claude Cowork added automations, cron-job-style scheduled agent runs, in the same week OpenAI's Codex gained equivalent automation support. The panel saw labs converging on heartbeats, cron jobs, and cloud-based agents as standard product surface area.

Claude Cowork automations on X ↗

🎙️ Hear our coverage →

#agents #coding

Cursor Feb 26, 2026

Major Features & Updates

Cloud Agents

Cursor launches cloud agents

Cursor launched cloud agents, moving agentic coding work off the local machine into remote, async sessions. The panel highlighted Cursor's cloud agents and UI demos as important progress for frontend development workflows.

Lee Robinson demo on X ↗

🎙️ Hear our coverage →

#agents #coding

Weights & Biases Feb 26, 2026

Major Features & Updates

W&B Inference: MiniMax 2.5 & Kimi K2.5

W&B Inference adds MiniMax 2.5 and Kimi K2.5

Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax 2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.

MiniMax M2.5 on W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #api #open-source

Weights & Biases Feb 19, 2026

Major Features & Updates

Kimi K2.5 on W&B Inference

W&B adds Kimi K2.5 to its inference service

Weights & Biases launched Kimi K2.5 on its inference service, making Moonshot AI's model available to W&B users. In Wolfram's Terminal Bench deep dive for W&B, Kimi K2.5 achieved a 67.4% ceiling score across multiple runs, among the strongest open-model results he measured.

W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #open-source

OpenAI Feb 12, 2026

Major Features & Updates

Deep Research (GPT-5.2)

OpenAI upgrades Deep Research to GPT-5.2 with app integrations

OpenAI upgraded Deep Research to run on GPT-5.2, adding app integrations, site-specific searches, and real-time collaboration. It was part of the week's rapid-fire big-lab announcements covered in the TLDR rundown.

OpenAI announcement on X ↗ · OpenAI Deep Research blog ↗

🎙️ Hear our coverage →

#agents #research

Weights & Biases Feb 12, 2026

Major Features & Updates

W&B Inference (GLM-5 & Kimi K2.5)

W&B Inference adds day-zero GLM-5 and Kimi K2.5 support

Weights & Biases launched day-zero GLM-5 support on its CoreWeave-powered W&B Inference service, alongside Kimi K2.5, with MiniMax 2.5 coming soon. Alex announced $50 in free credits for listeners to test the new open-weights models.

W&B announcement on X ↗ · W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #open-source

🔌 APIs & Platforms 1

Google (Chrome) Feb 12, 2026

APIs & Platforms

WebMCP (Chrome 146)

Chrome 146 introduces WebMCP, a native browser API for AI agents

Chrome 146 shipped WebMCP, a native browser API that lets AI agents directly interact with web services. It brings Model Context Protocol-style agent access into the browser itself, a notable primitive for the agentic web.

WebMCP coverage on X ↗

🎙️ Hear our coverage →

#agents #consumer-ai

🛠️ Dev Tools 2

LM Studio Feb 26, 2026

Dev Tools

LMLink

LM Studio launches LMLink for remote access to local models

LM Studio launched LMLink, which lets you use your locally hosted models from anywhere via Tailscale. It extends the local-model story so that on-device inference is reachable from any of your machines.

LMLink page ↗

🎙️ Hear our coverage →

#on-device #coding

Ryan Carson Feb 12, 2026

Dev Tools

AntFarm

Ryan Carson releases AntFarm for agent coordination

Co-host Ryan Carson released AntFarm, a tool for coordinating teams of coding agents. It targets the missing primitives for managing multiple agents that the panel discussed during the agent-psychosis segment.

AntFarm announcement on X ↗

🎙️ Hear our coverage →

#agents #coding

📄 Papers & Research 1

Anthropic Feb 12, 2026

Papers & Research

Claude Opus 4.6 Sabotage Risk Report

Anthropic publishes Opus 4.6 sabotage risk report, meeting ASL-4

Anthropic released a sabotage risk report for Claude Opus 4.6, preemptively meeting ASL-4 safety standards for autonomous AI R&D. The report evaluates the model's potential for sabotage-style behaviors as capabilities scale.

Anthropic announcement on X ↗ · Sabotage evaluations research page ↗

🎙️ Hear our coverage →

📊 Benchmarks & Evals 3

Agentica Feb 26, 2026

Benchmarks & Evals

ARC-AGI-3 public set result

Agentica claims to solve all public ARC-AGI-3 tasks

Agentica published a claim of solving all public ARC-AGI-3 tasks, adding to the week's theme of benchmark saturation. The panel discussed it alongside METR and ARC-AGI-2 results as part of weighing signal versus noise in headline benchmark leaps.

Agentica claim on X ↗

🎙️ Hear our coverage →

#benchmarks #reasoning

Confluence Labs Feb 26, 2026

Benchmarks & Evals

ARC-AGI-2 SOTA result

Confluence Labs exits stealth with 97.9% SOTA on ARC-AGI-2

Confluence Labs emerged from stealth with a 97.9% state-of-the-art result on the ARC-AGI-2 benchmark, publishing code on GitHub. The panel read it as a major signal that ARC-AGI-2 is near saturation, part of a broader pattern of benchmarks getting solved faster than expected.

97.9% ARC-AGI-2

Y Combinator post on X ↗ · Confluence Labs ARC-AGI-2 GitHub repo ↗

🎙️ Hear our coverage →

#benchmarks #reasoning

METR Feb 26, 2026

Benchmarks & Evals

Time Horizon Benchmark

METR Time Horizon goes vertical: Opus 4.6 hits ~14.5-hour tasks

METR's updated Time Horizon benchmark shows Claude Opus 4.6 completing tasks equivalent to roughly 14.5 hours of expert human work, with the autonomy doubling time now cited at 49 days. The panel treated this as the week's strongest evidence that agent capability growth has entered a visibly faster phase.

14.5h METR Time Horizon · 49 days Autonomy Doubling Time

Peter Wildeford thread on X ↗ · METR website ↗

🎙️ Hear our coverage →

#benchmarks #agents
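
To see what a 49-day doubling time implies, the sketch below extrapolates the 14.5-hour figure forward. It assumes clean exponential growth at the cited rate, which is a simplification rather than a METR forecast; the function names are hypothetical.

```python
import math

# Naive extrapolation of the figures cited above.
# Assumption: the time horizon keeps doubling every 49 days.
CURRENT_HOURS = 14.5
DOUBLING_DAYS = 49.0

def projected_horizon_hours(days_ahead: float) -> float:
    """Time horizon after `days_ahead` days of steady doubling."""
    return CURRENT_HOURS * 2 ** (days_ahead / DOUBLING_DAYS)

def days_until(target_hours: float) -> float:
    """Days needed to reach `target_hours` at the same doubling rate."""
    return DOUBLING_DAYS * math.log2(target_hours / CURRENT_HOURS)

for days in (0, 49, 98, 365):
    print(f"+{days} days: ~{projected_horizon_hours(days):.1f} hours")
```

Small changes in the doubling time move the one-year projection a great deal, so the output is best read as an illustration of the trend's steepness.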

💰 Funding 1

Entire Feb 12, 2026

Funding · Open weights

Entire Checkpoints

Entire raises $60M seed, ships first OSS release 'Checkpoints'

Entire raised a $60M seed round to build an open-source developer platform for AI agent workflows. Alongside the funding it shipped its first open-source release, Checkpoints, available on GitHub.

Entire announcement on X ↗ · Entire CLI on GitHub ↗ · Entire.dev ↗

🎙️ Hear our coverage →

#agents #coding

🤝 Acquisitions 1

OpenAI Feb 19, 2026

Acquisitions

OpenClaw acqui-hire

OpenAI acqui-hires OpenClaw creator Peter Steinberger

OpenAI acqui-hired Peter Steinberger, the creator of the viral OpenClaw agent, in what the panel speculated might be the first single-founder billion-dollar deal. Yam Peleg broke the news on the show, calling Steinberger 'the goat'. The move lands the most popular third-party agent harness builder inside OpenAI, amid a week where Anthropic's terms changes pushed agent users toward OpenAI subscriptions.

🎙️ Hear our coverage →

#agents #industry #coding

🌀 Also Released 1

Ryan Carson Feb 19, 2026

Also Released

Code Factory

Ryan Carson publishes the viral Code Factory agentic engineering blueprint

Ryan Carson published his viral Code Factory article, a blueprint for fully automated code generation, review, and deployment inspired by OpenAI's Harness Engineering post. The setup chains GitHub Actions, Greptile code review, CI gates, a risk-classification system for high-risk file changes, and a self-healing loop where Codex fixes its own PR issues until all checks pass. He says it takes a week-plus of setup but unlocks massive throughput.

Code Factory thread (X) ↗ · OpenAI: Harness Engineering ↗

🎙️ Hear our coverage →

#agents #coding
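
The risk-classification and self-healing ideas described above can be sketched in a few lines. This is not Ryan Carson's actual setup: the path patterns, function names, and simulated checks are all hypothetical, and a real pipeline would call CI and a coding agent where this sketch uses stand-in callables.

```python
from fnmatch import fnmatchcase
from typing import Callable, List

# Hypothetical patterns for files that should trigger stricter review.
HIGH_RISK_PATTERNS = ["auth/*", "migrations/*", "*.tf"]

def classify_risk(changed_files: List[str]) -> str:
    """Return "high" if any changed file matches a high-risk pattern."""
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in HIGH_RISK_PATTERNS):
            return "high"
    return "low"

def self_heal(run_checks: Callable[[], List[str]],
              apply_fix: Callable[[List[str]], None],
              max_rounds: int = 5) -> bool:
    """Re-run checks and request fixes until green or the budget is spent."""
    for _ in range(max_rounds):
        failures = run_checks()
        if not failures:
            return True
        apply_fix(failures)  # stand-in for asking the agent to patch the PR
    return not run_checks()

# Simulated pipeline: each "fix" resolves one failing check.
pending = ["lint", "unit-tests"]
print(classify_risk(["auth/login.py", "README.md"]))
print(self_heal(lambda: list(pending), lambda failures: pending.remove(failures[0])))
```

Capping the number of rounds keeps a stuck agent from looping indefinitely, and the risk label can gate whether a green PR merges automatically or waits for a human.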
