Everything AI Released in March 2026

59 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

🧠 New Models 28

Cohere
New Models · Open weights

Cohere Transcribe

Cohere Transcribe: open-source 2B ASR tops Open ASR Leaderboard at 5.42% WER

Cohere entered the ASR game with Transcribe, a 2-billion-parameter Apache 2.0 speech recognition model that immediately took the number-one spot on Hugging Face's Open ASR Leaderboard with a 5.42% word error rate versus Whisper Large v3's 7.44%. It wins 61% of human evaluations on average and 64% head-to-head against Whisper, making it a credible local-inference Whisper replacement for regulated industries.
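
For readers less familiar with the metric, here is a minimal sketch of how word error rate is typically computed: word-level edit distance divided by the reference length. The function name and sample sentences are illustrative assumptions, and the leaderboard's own scoring applies text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the error rate of `baseline`."""
    return (baseline - improved) / baseline
```

With the two figures quoted above, `relative_reduction(7.44, 5.42)` works out to roughly 0.27, that is, about 27% fewer word errors than Whisper Large v3 on that leaderboard.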

2B Cohere Transcribe ASR size · 5.42% Word error rate on Open ASR Leaderboard
Google DeepMind
New Models

Gemini 3.1 Flash Live

Google drops Gemini 3.1 Flash Live: Gemini can see, hear, and talk to you

Google released Gemini 3.1 Flash Live, a realtime multimodal model that handles voice and vision interaction in a single model path instead of stitched pipelines. The panel framed it as a major upgrade for end-to-end voice and vision agents, with AI Studio and API availability as the immediate way to experiment.

Google DeepMind
New Models

Lyria 3 Pro

Google Lyria 3 Pro generates full 3-minute music tracks with structural control

Google DeepMind released Lyria 3 Pro, its most advanced music model, generating full 3-minute tracks with structural control over intros, verses, choruses, and bridges, and even composing music from images. The crew generated a drum-and-bass ThursdAI opener live with spot-on instruction following; output is SynthID watermarked and royalty-free, available to Gemini subscribers and via Producer AI.

Luma AI
New Models

Uni-1

Luma Labs Uni-1 thinks and generates pixels simultaneously, #1 preference Elo

Luma Labs released Uni-1, an LLM-based image model that thinks and generates pixels simultaneously and claims the number-one human preference Elo. Unlike traditional diffusion workflows, you converse with it and iterate together toward results, and it can also generate infographics. The release is a surprising pivot from Luma's video focus.

MiniMax
New Models · Open weights

MiniMax 2.7

MiniMax 2.7 open-source weights discussed as small-model momentum continues

The panel covered MiniMax 2.7 and its open-weights release in the context of small, efficient models becoming genuinely practical for local and specialized agent workflows. The segment focused on capability momentum and how open-weights expectations keep shaping adoption sentiment.

Mistral AI
New Models · Open weights

Voxtral TTS

Mistral drops Voxtral TTS, a 3B open-weight text-to-speech model

Mistral released Voxtral TTS, its first text-to-speech model, as breaking news during the live show: 3 billion parameters, open weights, with emotion controls for neutral, happy, and frustrated voices. Mistral claims it beats ElevenLabs Flash v2.5 in human preference tests with a 58% win rate on flagship voices and 68% on zero-shot voice cloning, though Alex's live test found it decent rather than stunning.

3B Mistral Voxtral TTS size
Reka AI
New Models · Open weights

Reka Edge

Reka AI ships Edge, a 7B multimodal VLM for sub-second on-device inference

Reka AI launched Reka Edge, a 7B-parameter multimodal vision-language model built for sub-second latency on edge devices. Weights are on Hugging Face and the model is available through OpenRouter, with the panel highlighting it as a notable efficient multimodal release for real-world deployment.

Cursor
New Models

Composer 2

Cursor Composer 2 beats Opus 4.6 on TerminalBench at a tenth of the price

Cursor launched Composer 2, its first proprietary model that genuinely competes with frontier labs. It scores 61% on TerminalBench (beating Opus 4.6) at $0.50/M input tokens, cheaper than GPT-5.4 Mini and 10x cheaper than Opus, running at 300+ tokens/sec. A fast variant costs 3x more for the same intelligence, kicking off a new 'fast mode' pricing trend where you pay a premium for speed rather than capability.

H Company
New Models · Open weights

Holotron-12B

H Company's Holotron-12B: hybrid SSM computer-use model at 8.9k tok/s

H Company released Holotron-12B, an open-source hybrid SSM model built for computer-use agents. It claims 8,900 tokens/sec generation speed and jumps the WebVoyager benchmark from 35.1% to 80.5%, continuing the trend of hybrid SSM architectures for long-context agent workloads.

8,900 tok/s H Company Holotron-12B generation speed
MiniMax
New Models

MiniMax M2.7

MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro

MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, built by one engineer over four days with zero lines of human code. It scores 56.22% on SWE-Bench Pro, about one point behind Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. It is not yet open weights, though rumors suggest a release is coming.

56% MiniMax M2.7 SWE-Bench Pro
Mistral AI
New Models · Open weights

Mistral Small 4

Mistral Small 4: 119B MoE with 6B active unifies vision, coding, reasoning

Mistral returned to open source with Small 4, a 119B-parameter MoE with 128 experts and only 6B active per token, released under Apache 2.0. It unifies the previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one model and can fit on a single H100 when compressed. Early WolfBench results are sobering at ~17% on OpenClaw agent tasks, roughly on par with similarly sized Nemotron.
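
To make "128 experts, 6B active per token" concrete, here is a minimal sketch of generic top-k expert routing, the usual mechanism by which an MoE layer runs only a few experts per token. It is not Mistral's implementation: the function name, the router scores, and the choice of `k` are illustrative assumptions, since the number of experts selected per token is not stated above.

```python
import math


def route_top_k(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router scores (ties keep the lower index first)
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Subtract the peak score before exponentiating for numerical stability
    peak = max(router_logits[i] for i in top)
    exp_scores = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exp_scores.values())
    return {i: score / total for i, score in exp_scores.items()}


# Share of the 119B total parameters that is active for any one token
ACTIVE_FRACTION = 6 / 119  # about 0.05
```

Only the experts returned by `route_top_k` are evaluated for a given token, which is why per-token compute tracks the 6B active parameters (about 5% of the total) while the memory footprint still tracks the full 119B.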

119B Mistral Small 4 total params
OpenAI
New Models

GPT-5.4 Mini & Nano

OpenAI ships GPT-5.4 Mini and Nano for coding, computer use, and subagents

OpenAI released GPT-5.4 Mini ($0.75/M input) and Nano, smaller variants optimized for coding and computer use at a fraction of flagship cost. Mini hits 72% on OSWorld-Verified, matching the human baseline and nearly reaching full 5.4's 75%, while beating Sonnet 4.5 on most benchmarks. Both are designed as cheap parallel subagent workers under a GPT-5.4 orchestrator in Codex, and Mini is 2x faster than the previous GPT-5 Mini.

Xiaomi
New Models

MiMo

Xiaomi MiMo revealed as the 1T-param stealth model topping OpenRouter

Xiaomi revealed MiMo, a 1-trillion-parameter family with omni-modal and language-only variants, unmasked as the stealth model that had been sitting at #1 on OpenRouter. The reveal surprised the panel, marking Xiaomi's entry into the frontier-model conversation.

Fish Audio
New Models · Open weights

Fish Audio S2

Fish Audio S2 open TTS hits sub-150ms latency

Fish Audio S2 is a fully open-source TTS model with inline emotion control via free-text bracket tags like gasp, laughter, and long pause. Alex demoed it live with an OpenClaw skill that let his 5-year-old talk to a voice clone of 'Rocky' from Project Hail Mary; Wolfram called it 'ElevenLabs V3 for free.'

<150ms Fish Audio S2 TTS latency
Mixbread
New Models

embed-large-v3

Mixbread embed-large-v3 beats Gemini Embedding 2

mixbread.ai dropped embed-large-v3, an embedding model that beats Gemini Embedding 2 on nearly every benchmark, including a jaw-dropping 98% vs 6.9% on structured-data tasks. Benjamin Clavie announced it live during the show.

98% Mixbread embed-large-v3 structured data benchmark score (vs 6.9% for Gemini)
NVIDIA
New Models · Open weights

Nemotron 3 Super 120B

NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet

NVIDIA launched Nemotron 3 Super, a 120B Hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B 5-year open-source commitment. It is available on W&B Inference at $0.20/M input and $0.80/M output.
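
As a rough guide to what per-million-token pricing means in practice, the sketch below turns the two quoted rates into a per-request cost. The function name and the token counts in the usage note are hypothetical; only the $0.20 and $0.80 rates come from the text above.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request given prices quoted in USD per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


# Rates quoted above for Nemotron 3 Super on W&B Inference
NEMOTRON_INPUT_PRICE = 0.20
NEMOTRON_OUTPUT_PRICE = 0.80
```

Under these assumptions, a hypothetical agent step with 50,000 input tokens and 2,000 output tokens costs a little over one cent: `request_cost_usd(50_000, 2_000, NEMOTRON_INPUT_PRICE, NEMOTRON_OUTPUT_PRICE)` is about 0.0116.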

120B Nemotron 3 Super total parameters · 12B Nemotron 3 Super active parameters (MoE) · 1M Nemotron 3 Super context window (tokens)
Alibaba (Qwen)
New Models · Open weights

Qwen3.5 Small Series

Alibaba releases Qwen3.5 small models (2B, 4B, 9B) for local use

Alibaba released the Qwen3.5 small model series with 2B, 4B, and 9B variants, which the panel found highly usable on consumer hardware. The release landed alongside leadership turbulence as Junyang Lin and Binyuan Hui departed Qwen, though the panel expects Alibaba's open-source momentum to continue.

Google DeepMind
New Models

Gemini 3.1 Flash-Lite

Google launches Gemini 3.1 Flash-Lite with 1M context at 360 tok/s

Google launched Gemini 3.1 Flash-Lite, a fast and cheap model with 1M token context aimed at the instant/fast tier, running around 360 tokens per second. The panel flagged a material pricing jump versus the prior Flash-Lite generation but saw it as well suited for judge, guardrail, and orchestration workloads in agent systems.

360 tokens/sec Gemini 3.1 Flash-Lite speed
OpenAI
New Models

GPT-5.3 Instant

OpenAI rolls out GPT-5.3 Instant as the free-tier fast model

OpenAI rolled out GPT-5.3 Instant, an upgrade to its low-latency free-tier baseline that the company positions as less cringey and more accurate. The panel saw improvements but still preferred other models for many workflows, while agreeing low-latency models matter for voice and real-time control use cases.

OpenAI
New Models

GPT-5.4

OpenAI drops GPT-5.4 Thinking and GPT-5.4 Pro live during the show

OpenAI released GPT-5.4 mid-show in Thinking and Pro variants, a frontier general model that folds Codex-level coding into a unified reasoning model. It ships with a 1M token context window, a /fast mode, and mid-reasoning steering, posting 83.3% on ARC-AGI 2 (Pro) and roughly 75% on OSWorld computer use. The panel tested it live in Codex and called it a major general-model jump, while noting input pricing rose about 50% versus 5.2.

83.3% ARC-AGI 2 (GPT-5.4 Pro) · 75% OS World / computer-use score · 1M Context window
StepFun
New Models · Open weights

Step 3.5 Flash Base

StepFun open-sources Step 3.5 Flash Base with its training stack

StepFun released Step 3.5 Flash Base and Midtrain checkpoints, an unusually open release that includes training artifacts and the SteptronOSS training stack alongside the weights. The panel praised the Apache-2 orientation and called the continuation-pretraining flexibility a major practical unlock for builders.

🚀 Products & Apps 5

Modular
Products & Apps

Modular 26.2

Modular 26.2 runs FLUX.2 in under a second, 99% cheaper than Nano Banana

Modular shipped its 26.2 release with state-of-the-art image generation, running FLUX.2 in under one second (sub-300ms claims) at 99% lower cost than Nano Banana, plus upgraded AI coding with Mojo. Alex noted the surprise of an inference platform releasing model-level optimization and hoped the approach spreads to all image generation.

Phota Labs
Products & Apps

Phota Studio + API

Phota Labs launches Phota Studio + API with identity-preserving personalization

Phota Labs launched Phota Studio and an API around a photography-focused image model with identity-preserving personalization: upload a batch of your photos, it trains a personal model, and the generated images actually resemble you. Alex flagged the personalization as a real capability jump over the crowd of photo startups, for professional shots, photo fixes, and adding people to photos.

Manus (Meta)
Products & Apps

Manus My Computer

Manus launches 'My Computer' desktop app for macOS and Windows

Manus, now Meta-owned, launched 'My Computer', a desktop app that brings its AI agent from the cloud onto your local machine for macOS and Windows. The agent can now operate directly on local files and applications rather than running only in a hosted sandbox.

NVIDIA
Products & Apps

NemoClaw

NVIDIA announces NemoClaw, enterprise-hardened OpenClaw, at GTC

At GTC, Jensen Huang spent 15 minutes on OpenClaw, calling it the most important open source release since Linux and declaring 'every company needs an OpenClaw strategy.' NVIDIA released NemoClaw, a hardened enterprise reference implementation of OpenClaw with a privacy router and policy engine aimed at solving the agent security problem.

✨ Major Features & Updates 7

Anthropic
Major Features & Updates

Claude computer use (Cowork + Claude Code)

Claude can now control your Mac: computer use lands in Cowork and Claude Code

Anthropic shipped computer use as a research preview in Claude Cowork and Claude Code, letting Claude directly control local Mac workflows. The panel compared it to existing OpenClaw-style agent patterns and debated where direct UI control is genuinely useful versus overkill.

Anthropic
Major Features & Updates

Claude Opus 4.6 (1M context)

Anthropic makes Opus 4.6 1M context the default in Claude Code, same price

Anthropic made 1M token context the default for Opus 4.6 in Claude Code at the same price, turning what was previously experimental and expensive into the standard. MRCR benchmark performance holds at 93% at 256K and 76% at 1M. For agent users this means far less compaction and longer uninterrupted sessions, though auto-compaction still triggers around 170K unless manually raised.

1M Opus 4.6 context default
OpenAI
Major Features & Updates

Codex Subagents

OpenAI ships subagents for Codex with custom TOML configs

OpenAI added subagents to Codex, enabling parallel specialized agents configured via custom TOML files. Paired with the cheap GPT-5.4 Mini and Nano models, this enables the orchestrator-plus-workers pattern where a flagship model spawns inexpensive parallel subagents for tasks like visual testing.
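
The orchestrator-plus-workers pattern can be sketched in a few lines. This is a generic illustration only, not Codex's API or its TOML schema: `run_subagent` is a hypothetical stand-in for a call to a cheap worker model, and the task strings are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    # Hypothetical stand-in: a real worker would call a small, cheap model here.
    return f"done: {task}"


def orchestrate(tasks: list[str], max_workers: int = 4) -> list[str]:
    """Fan tasks out to parallel workers and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the input tasks
        return list(pool.map(run_subagent, tasks))
```

The design point is that the expensive model only plans and merges, while the parallel calls, which dominate token volume, go to the cheaper tier.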

🔌 APIs & Platforms 1

🛠️ Dev Tools 7

Unsloth AI
Dev Tools · Open weights

Unsloth Studio

Unsloth Studio: web UI for local fine-tuning with 2x speed, 70% less VRAM

Unsloth launched Studio, an open-source web UI for local LLM training and inference claiming 2x speed and 70% less VRAM, supporting 500+ models across text, vision, audio, and embeddings. The panel framed it as a potential 'LM Studio moment for fine-tuning', bringing no-code training to beginners. It was confirmed working on Google Colab Pro, training models overnight for about $20/month.

Andrej Karpathy
Dev Tools · Open weights

AutoResearch

Karpathy open-sources AutoResearch for autonomous ML experiments

Andrej Karpathy open-sourced AutoResearch, a framework that runs AI-driven ML experiments autonomously. Over two days it ran 700 experiments on nanochat GPT-2, stacked 20 improvements, and achieved an 11% training speedup. Tobi Lütke adapted it overnight for Shopify's Liquid templating engine for a 51% render-time improvement, and the repo hit 26K GitHub stars quickly.

700 AutoResearch experiments run in 2 days (Karpathy) · 11% GPT-2 training speedup from stacked AutoResearch improvements · 51% Shopify Liquid render time improvement using AutoResearch
Paperclip
Dev Tools · Open weights

Paperclip.ing

Paperclip.ing: open-source agent orchestration for zero-human companies

Anonymous builder DOTTA presented Paperclip.ing, an open-source agent orchestration framework for 'zero human companies' where an AI CEO recursively hires more agents. It hit 20K GitHub stars in its first week, with a heartbeat system driving agent autonomy and a Memento-style memory architecture keeping agents coherent across tasks.

20K Paperclip GitHub stars in first week
OpenAI
Dev Tools · Open weights

Symphony

OpenAI releases Symphony on GitHub

Ryan Carson experimented with OpenAI's Symphony framework, letting agents work through PRs overnight. One agent not only created a PR but found a bug and filed its own detailed Jira ticket with no human intervention, a small but telling sign of where agentic development is heading.

📄 Papers & Research 3

Google Research
Papers & Research

TurboQuant

Google TurboQuant claims 6x KV-cache compression and 8x faster inference

Google Research published TurboQuant, a KV-cache quantization technique claiming 6x compression and 8x inference speedup with near-zero accuracy loss. The panel framed it as a potential unlock for LLM inference economics, while calling stock-market panic over the result premature without broader production validation.
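
To see why KV-cache compression matters for inference economics, here is a back-of-the-envelope sketch of cache size for one sequence. The model dimensions are hypothetical, and the calculation assumes the 6x figure is measured against a 16-bit cache, which the text above does not specify. It does not implement TurboQuant itself.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: float) -> float:
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 accounts for storing both keys and values
    values = 2 * layers * kv_heads * head_dim * seq_len
    return values * bits_per_value / 8


# Hypothetical model shape: 32 layers, 8 KV heads of width 128, 128K-token context
baseline = kv_cache_bytes(32, 8, 128, 131_072, bits_per_value=16)
compressed = baseline / 6  # applying the claimed 6x compression

GIB = 2 ** 30
```

For this made-up shape, `baseline / GIB` is 16 GiB per sequence and `compressed / GIB` is about 2.7 GiB, so the same accelerator memory could hold roughly six times as many concurrent long-context sequences if the claim holds.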

6x TurboQuant KV-cache compression · 8x TurboQuant speedup claim
Papers & Research · Open weights

Mamba-3

Mamba-3 lands with three SSM innovations for inference-first linear models

Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.
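
For intuition on why the discretization rule matters, the sketch below compares the textbook forward-Euler and trapezoidal rules on a scalar linear state equation h' = a*h + b*x. This is the generic numerical method, not Mamba-3's exact recurrence, and all names and constants are illustrative assumptions.

```python
import math


def euler_step(h: float, x_prev: float, a: float, b: float, dt: float) -> float:
    """Forward Euler: uses the derivative at the start of the step only."""
    return h + dt * (a * h + b * x_prev)


def trapezoidal_step(h: float, x_prev: float, x_curr: float,
                     a: float, b: float, dt: float) -> float:
    """Trapezoidal rule: averages the derivative at both ends of the step."""
    denom = 1.0 - dt * a / 2.0
    return ((1.0 + dt * a / 2.0) * h + (dt * b / 2.0) * (x_prev + x_curr)) / denom


def simulate(steps: int = 10, dt: float = 0.1, a: float = -1.0, b: float = 1.0):
    """Drive both rules with a constant input x = 1 from h(0) = 0."""
    h_euler = h_trap = 0.0
    for _ in range(steps):
        h_euler = euler_step(h_euler, 1.0, a, b, dt)
        h_trap = trapezoidal_step(h_trap, 1.0, 1.0, a, b, dt)
    # Closed-form solution of h' = a*h + b with h(0) = 0 (requires a != 0)
    exact = (-b / a) * (1.0 - math.exp(a * steps * dt))
    return h_euler, h_trap, exact
```

With the default settings the trapezoidal state lands much closer to the exact solution than the Euler state, which illustrates the general appeal of a second-order rule for a recurrence that is unrolled over many steps.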

📊 Benchmarks & Evals 4

ARC Prize Foundation
Benchmarks & Evals

ARC-AGI-3

ARC-AGI-3 launches: humans score 100%, frontier models under 1%

ARC Prize launched ARC-AGI-3, an interactive agentic reasoning benchmark of turn-based puzzle games designed to test human-like generalization in novel abstract environments. Humans hit a 100% pass rate while top frontier models score under 1%, which the panel welcomed as a healthy reality check against AGI-is-here rhetoric and easy score inflation.

<1% ARC-AGI-3 frontier model scores · 100% Human completion on ARC-AGI-3
MarginLab
Benchmarks & Evals

Claude Code tracker

MarginLab tracker shows degradation in Opus 4.6 on Claude Code

MarginLab's public Claude Code tracker surfaced measurable degradation in Opus 4.6 performance, discussed in the evals and benchmarks roundup. The tracker continuously evaluates Claude Code behavior over time, making silent model regressions visible.

Weights & Biases
Benchmarks & Evals

Wolf Bench

Wolfram previews Wolf Bench, a multi-metric agent eval from W&B

Wolfram Ravenwolf gave an early preview of Wolf Bench, a Terminal Bench-based evaluation framework from Weights & Biases that reports four metrics (average, best run, ceiling, and consistent floor) instead of a single score. It treats harness differences (Terminal Bench vs Claude Code vs OpenClaw) as a first-class factor and publishes benchmark cost and transparency details.
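
One plausible way to compute four such metrics from repeated runs is sketched below. The definitions are assumptions made for illustration (ceiling as tasks solved in at least one run, floor as tasks solved in every run) and may not match Wolf Bench's actual methodology; the function name and data are hypothetical.

```python
def summarize_runs(runs: list[list[bool]]) -> dict[str, float]:
    """Summarise repeated benchmark runs; runs[r][t] is True if run r solved task t."""
    if not runs or not runs[0]:
        raise ValueError("need at least one run with at least one task")
    n_tasks = len(runs[0])
    if any(len(run) != n_tasks for run in runs):
        raise ValueError("every run must cover the same tasks")
    run_scores = [sum(run) / n_tasks for run in runs]
    per_task = list(zip(*runs))  # per_task[t] holds the outcomes of task t across runs
    return {
        "average": sum(run_scores) / len(run_scores),
        "best_run": max(run_scores),
        # Assumed definition: solved in at least one run
        "ceiling": sum(any(outcomes) for outcomes in per_task) / n_tasks,
        # Assumed definition: solved in every run
        "floor": sum(all(outcomes) for outcomes in per_task) / n_tasks,
    }


example = summarize_runs([
    [True, True, False, False],
    [True, False, True, False],
    [True, True, False, False],
])
```

In this toy case every run scores 50%, yet the ceiling is 75% and the floor is 25%, which is exactly the spread a single averaged score would hide.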

🤝 Acquisitions 1

OpenAI
Acquisitions

Astral (uv, Ruff, ty)

OpenAI acquires Astral, makers of uv and Ruff, to join the Codex team

OpenAI acquired Astral, the company behind the uv Python package manager, Ruff, and ty, with the team joining Codex specifically — OpenAI's third acquisition of the month. The panel drew the parallel to Anthropic buying Bun for TypeScript infrastructure: OpenAI now owns core Python tooling for the code its agents write. The tools remain open source and forkable.

🌀 Also Released 3

Hugging Face
Also Released

State of Open Source Spring 2026 Report

Hugging Face report: China passes US in LLM count, Qwen tops 1B downloads

Hugging Face published its Spring 2026 State of Open Source report showing China surpassing the US in number of LLMs for the first time, with Chinese models taking 41% of all downloads. Alibaba's Qwen family crossed 1 billion total downloads (about 1 million per day), overtaking Llama as the most downloaded model family, on a platform now hosting 11M users and 2M+ models.

NVIDIA
Also Released

GR LPX (Rubin NVL72 + Groq 3)

NVIDIA GTC: GR LPX pairs Rubin NVL72 servers with the new Groq 3 chip

NVIDIA's GTC hardware reveal integrates the new Groq 3 chip (gen 2 was never publicly seen) into Rubin NVL72 servers via the GR LPX system. Claims include 3x tokens-per-watt efficiency at baseline, up to 30x at higher throughput, and 1000+ tokens/sec on a 2T-parameter frontier model with 400K context — performance the current Blackwell generation can't reach at any price.

Eon Systems
Also Released

Fruit Fly Brain Connectome Simulation

Eon Systems uploads full fruit fly brain connectome into simulation

Eon Systems uploaded the complete fruit fly brain connectome — 140,000 neurons and 50M+ synapses — into a MuJoCo physics simulator, achieving 91% behavioral accuracy. Notably no ML or LLMs were used: it is pure connectome simulation. The advisory board includes George Church, Stephen Wolfram, and Anders Sandberg, marking a milestone for whole-brain emulation.

140,000 Neurons in the uploaded fruit fly brain connectome · 50M+ Synapses in the fruit fly brain connectome · 91% Behavioral accuracy of the simulated fruit fly brain