Everything AI Released in March 2026

59 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.


🧠 New Models 28

Aratako Mar 26, 2026

New Models · Open weights

Irodori-TTS-500M

Irodori-TTS-500M: open Japanese TTS with emoji emotion control

Irodori-TTS-500M is a 500M-parameter open-weights Japanese text-to-speech model released on Hugging Face, notable for controlling emotional delivery through emojis in the input text. It landed as part of the week's wave of voice and audio releases.

Announcement (X) ↗Irodori-TTS-500M on Hugging Face ↗

🎙️ Hear our coverage →

#voice-ai #open-source

Cohere Mar 26, 2026

New Models · Open weights

Cohere Transcribe

Cohere Transcribe: open-source 2B ASR tops Open ASR Leaderboard at 5.42% WER

Cohere entered the ASR game with Transcribe, a 2-billion-parameter Apache 2.0 speech recognition model that immediately took the number-one spot on Hugging Face's Open ASR Leaderboard with a 5.42% word error rate versus Whisper Large v3's 7.44%. It wins 61% of human evaluations on average and 64% head-to-head against Whisper, making it a credible local-inference Whisper replacement for regulated industries.
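For context on the metric: word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. A minimal Python sketch follows, with made-up example sentences and a hypothetical helper name; the leaderboard applies its own text normalization, which this does not reproduce.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only, not leaderboard data.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")

# Relative error reduction implied by the two leaderboard figures quoted above.
relative_reduction = (7.44 - 5.42) / 7.44
print(f"example WER: {example_wer:.3f}, relative reduction: {relative_reduction:.1%}")
```

Read this way, the gap between 7.44% and 5.42% is a relative reduction of roughly a quarter in word errors, which is a more intuitive framing than the two-point absolute difference.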

2B Cohere Transcribe ASR size · 5.42% Word error rate on Open ASR Leaderboard

Cohere announcement (X) ↗Cohere blog: Transcribe ↗Open ASR Leaderboard (Hugging Face) ↗

🎙️ Hear our coverage →

#voice-ai #open-source

Google DeepMind Mar 26, 2026

New Models

Gemini 3.1 Flash Live

Google drops Gemini 3.1 Flash Live: Gemini can see, hear, and talk to you

Google released Gemini 3.1 Flash Live, a realtime multimodal model that handles voice and vision interaction in a single model path instead of stitched pipelines. The panel framed it as a major upgrade for end-to-end voice and vision agents, with AI Studio and API availability as the immediate way to experiment.

Google DeepMind announcement (X) ↗

🎙️ Hear our coverage →

#voice-ai #agents

Google DeepMind Mar 26, 2026

New Models

Lyria 3 Pro

Google Lyria 3 Pro generates full 3-minute music tracks with structural control

Google DeepMind released Lyria 3 Pro, its most advanced music model, generating full 3-minute tracks with structural control over intros, verses, choruses, and bridges, and even composing music from images. The crew generated a drum-and-bass ThursdAI opener live with spot-on instruction following; output is SynthID watermarked and royalty-free, available to Gemini subscribers and via Producer AI.

Google announcement (X) ↗Lyria models page (Google DeepMind) ↗

🎙️ Hear our coverage →

Luma AI Mar 26, 2026

New Models

Uni-1

Luma Labs Uni-1 thinks and generates pixels simultaneously, #1 preference Elo

Luma Labs released Uni-1, an LLM-based image model that thinks and generates pixels simultaneously and claims the number-one human preference Elo. Unlike traditional diffusion workflows, you converse with it and iterate together toward results, and it can also generate infographics. The release is a surprising pivot from Luma's video focus.

Luma Labs announcement (X) ↗Uni-1 announcement page ↗Try Uni-1 in the Luma app ↗

🎙️ Hear our coverage →

#image-gen #multimodal

MiniMax Mar 26, 2026

New Models · Open weights

MiniMax 2.7

MiniMax 2.7 open-source weights discussed as small-model momentum continues

The panel covered MiniMax 2.7 and its open-weights release in the context of small, efficient models becoming genuinely practical for local and specialized agent workflows. The segment focused on capability momentum and how open-weights expectations keep shaping adoption sentiment.

🎙️ Hear our coverage →

#open-source #agents

Mistral AI Mar 26, 2026

New Models · Open weights

Voxtral TTS

Mistral drops Voxtral TTS, a 3B open-weight text-to-speech model

Mistral released Voxtral TTS, its first text-to-speech model, as breaking news during the live show: 3 billion parameters, open weights, with emotion controls for neutral, happy, and frustrated voices. Mistral claims it beats ElevenLabs Flash v2.5 in human preference tests with a 58% win rate on flagship voices and 68% on zero-shot voice cloning, though Alex's live test found it decent rather than stunning.

3B Mistral Voxtral TTS size

Mistral AI announcement (X) ↗Mistral blog: Voxtral TTS ↗

🎙️ Hear our coverage →

#voice-ai #open-source

Reka AI Mar 26, 2026

New Models · Open weights

Reka Edge

Reka AI ships Edge, a 7B multimodal VLM for sub-second on-device inference

Reka AI launched Reka Edge, a 7B-parameter multimodal vision-language model built for sub-second latency on edge devices. Weights are on Hugging Face and the model is available through OpenRouter, with the panel highlighting it as a notable efficient multimodal release for real-world deployment.

Reka AI announcement (X) ↗Reka Edge on Hugging Face ↗Reka Edge on OpenRouter ↗Reka AI blog ↗

🎙️ Hear our coverage →

#open-source #vision #on-device

Cursor Mar 19, 2026

New Models

Composer 2

Cursor Composer 2 beats Opus 4.6 on TerminalBench at a tenth of the price

Cursor launched Composer 2, its first proprietary model that genuinely competes with frontier labs. It scores 61% on TerminalBench (beating Opus 4.6) at $0.50/M input tokens, cheaper than GPT-5.4 Mini and 10x cheaper than Opus, running at 300+ tokens/sec. A fast variant costs 3x more for the same intelligence, kicking off a new 'fast mode' pricing trend where you pay a premium for speed rather than capability.

Cursor blog ↗X announcement ↗Cursor announcement (X) ↗Composer 2 tech report (PDF) ↗

🎙️ Hear our coverage (+1 follow-up) →

#coding #agents

H Company Mar 19, 2026

New Models · Open weights

Holotron-12B

H Company's Holotron-12B: hybrid SSM computer-use model at 8.9k tok/s

H Company released Holotron-12B, an open-source hybrid SSM model built for computer-use agents. It claims 8,900 tokens/sec generation speed and raises the WebVoyager score from 35.1% to 80.5%, continuing the trend of hybrid SSM architectures for long-context agent workloads.

8,900 tok/s H Company Holotron-12B generation speed

Hugging Face ↗H Company blog ↗H Company on X ↗BricksAI on X ↗

🎙️ Hear our coverage →

#open-source #agents

MiniMax Mar 19, 2026

New Models

MiniMax M2.7

MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro

MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, built by one engineer over four days with zero lines of human code. It scores 56.22% on SWE-Bench Pro, about one point behind Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. It is not yet open weights, though rumors suggest a release is coming.

56% MiniMax M2.7 on SWE-Bench Pro

MiniMax announcement ↗MiniMax on X ↗TestingCatalog on X ↗MiniMax M2.7 announcement (X) ↗

🎙️ Hear our coverage (+1 follow-up) →

#coding #agents #reasoning

Mistral AI Mar 19, 2026

New Models · Open weights

Mistral Small 4

Mistral Small 4: 119B MoE with 6B active unifies vision, coding, reasoning

Mistral returned to open source with Small 4, a 119B-parameter MoE with 128 experts and only 6B active per token, released under Apache 2.0. It unifies the previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one model and can fit on a single H100 when compressed. Early WolfBench results are sobering at ~17% on OpenClaw agent tasks, roughly on par with similarly sized Nemotron.
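The single-H100 claim can be sanity-checked with back-of-envelope arithmetic: weight memory is roughly parameter count times bits per weight. The sketch below assumes the 80 GB H100 variant and counts weights only (no KV cache, activations, or quantization overhead). The bit widths are illustrative, since the release notes above do not say which compression scheme is meant.

```python
H100_MEMORY_GB = 80  # assumption: 80 GB H100 variant


def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 119e9   # total parameters, from the entry above
ACTIVE_PARAMS = 6e9    # parameters active per token, from the entry above

for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb <= H100_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB -> {verdict} in {H100_MEMORY_GB} GB")

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active per token: {active_fraction:.1%} of total parameters")
```

Under these assumptions only the 4-bit case leaves headroom on one card, which is consistent with the "when compressed" qualifier. All experts still have to be resident in memory even though only about 5% of parameters are used per token.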

119B Mistral Small 4 total params

Mistral blog ↗Hugging Face ↗X announcement ↗

🎙️ Hear our coverage →

#open-source #architecture #multimodal

OpenAI Mar 19, 2026

New Models

GPT-5.4 Mini & Nano

OpenAI ships GPT-5.4 Mini and Nano for coding, computer use, and subagents

OpenAI released GPT-5.4 Mini ($0.75/M input) and Nano, smaller variants optimized for coding and computer use at a fraction of flagship cost. Mini hits 72% on OSWorld-Verified, matching the human baseline and nearly reaching the full GPT-5.4's 75%, while beating Sonnet 4.5 on most benchmarks. They are designed as cheap parallel subagent workers under a GPT-5.4 orchestrator in Codex, and Mini is 2x faster than the previous GPT-5 Mini.

X announcement ↗GPT-5.4 Mini docs ↗API pricing ↗

🎙️ Hear our coverage →

#coding #agents

Xiaomi Mar 19, 2026

New Models

MiMo

Xiaomi MiMo revealed as the 1T-param stealth model topping OpenRouter

Xiaomi revealed MiMo, a 1-trillion-parameter family with omni-modal and language-only variants, unmasked as the stealth model that had been sitting at #1 on OpenRouter. The reveal surprised the panel, marking Xiaomi's entry into the frontier-model conversation.

Luo Fuli on X ↗

🎙️ Hear our coverage →

#frontier-models #multimodal

Fish Audio Mar 13, 2026

New Models · Open weights

Fish Audio S2

Fish Audio S2 open TTS hits sub-150ms latency

Fish Audio S2 is a fully open-source TTS model with inline emotion control via free-text bracket tags like gasp, laughter, and long pause. Alex demoed it live with an OpenClaw skill that let his 5-year-old talk to a voice clone of 'Rocky' from Project Hail Mary; Wolfram called it 'ElevenLabs V3 for free.'

<150ms Fish Audio S2 TTS latency

Fish Audio S2 on X ↗Fish Speech 2 on HuggingFace ↗fish.audio ↗

🎙️ Hear our coverage (+1 follow-up) →

#voice-ai #open-source

Google Mar 13, 2026

New Models

Gemini Embedding 2

Google launches Gemini Embedding 2, a natively multimodal embedder

Google launched Gemini Embedding 2, a natively multimodal embedding model that supports text, image, video, and audio in a single unified embedding space. It is available through the Gemini Embeddings API.

Gemini Embedding 2 on X ↗Gemini Embeddings API docs ↗

🎙️ Hear our coverage →

#search #multimodal

Lightricks Mar 13, 2026

New Models · Open weights

LTX Video 2.3

Lightricks ships open-source LTX Video 2.3, runs on an RTX 3090

Lightricks released LTX Video 2.3, an open-source video generation model with improved motion, audio, and quality that runs on a single RTX 3090. It is available on GitHub and Hugging Face.

LTX-Video on GitHub ↗LTX-Video on HuggingFace ↗

🎙️ Hear our coverage →

#video-gen #open-source

MiroMind Mar 13, 2026

New Models · Open weights

MiroThinker-1.7

MiroThinker-1.7 open-source research agent hits SOTA

MiroMind released MiroThinker-1.7, an open-source deep-research agent model that reaches state of the art on deep research benchmarks. It was covered alongside NVIDIA's Nemotron launch in the open-source segment.

MiroThinker-1.7 on X ↗MiroThinker-1.7 on HuggingFace ↗

🎙️ Hear our coverage →

#agents #open-source #research

Mixedbread Mar 13, 2026

New Models

embed-large-v3

Mixedbread embed-large-v3 beats Gemini Embedding 2

Mixedbread dropped embed-large-v3, an embedding model that beats Gemini Embedding 2 on nearly every benchmark, including a jaw-dropping 98% vs 6.9% on structured-data tasks. Benjamin Clavie announced it live during the show.

98% Mixedbread embed-large-v3 structured-data benchmark score (vs 6.9% for Gemini Embedding 2)

Benjamin Clavie on X ↗

🎙️ Hear our coverage →

NVIDIA Mar 13, 2026

New Models · Open weights

Nemotron 3 Super 120B

NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet

NVIDIA launched Nemotron 3 Super, a 120B Hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B 5-year open-source commitment. It is available on W&B Inference at $0.20/M input and $0.80/M output.

120B Nemotron 3 Super total parameters · 12B Nemotron 3 Super active parameters (MoE) · 1M Nemotron 3 Super context window (tokens)

NVIDIA on X ↗Nemotron 3 Super blog post ↗Nemotron 3 Super on HuggingFace ↗W&B Inference (Nemotron) ↗

🎙️ Hear our coverage →

#open-source #architecture #reasoning

Templar Mar 13, 2026

New Models · Open weights

Covenant-72B

Covenant-72B: an open 72B LLM trained in a decentralized fashion

Covenant-72B is a decentralized 72B-parameter open LLM, released and shared via Hugging Face. It was highlighted in the open-source segment as an example of decentralized model training.

Covenant-72B on X ↗Covenant-72B on HuggingFace ↗

🎙️ Hear our coverage →

#open-source #training

Alibaba (Qwen) Mar 5, 2026

New Models · Open weights

Qwen3.5 Small Series

Alibaba releases Qwen3.5 small models (2B, 4B, 9B) for local use

Alibaba released the Qwen3.5 small model series with 2B, 4B, and 9B variants, which the panel found highly usable on consumer hardware. The release landed alongside leadership turbulence as Junyang Lin and Binyuan Hui departed Qwen, though the panel expects Alibaba's open-source momentum to continue.

Qwen3.5 small models announcement ↗Qwen3.5-9B on Hugging Face ↗Qwen3.5-4B on Hugging Face ↗Qwen3.5-2B on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #on-device

Cognition Mar 5, 2026

New Models

SWE-1.6

Cognition previews SWE-1.6, hitting 51% on SWE-Bench Pro

Cognition previewed SWE-1.6, the next iteration of its software-engineering model line, citing 51% on SWE-Bench Pro. It was covered in the TL;DR tools segment as part of the week's agentic coding model releases.

51% SWE-Bench Pro (SWE-1.6)

Cognition SWE-1.6 announcement ↗SWE-1.6 preview blog post ↗

🎙️ Hear our coverage →

#coding #agents

Google DeepMind Mar 5, 2026

New Models

Gemini 3.1 Flash-Lite

Google launches Gemini 3.1 Flash-Lite with 1M context at 360 tok/s

Google launched Gemini 3.1 Flash-Lite, a fast and cheap model with 1M token context aimed at the instant/fast tier, running around 360 tokens per second. The panel flagged a material pricing jump versus the prior Flash-Lite generation but saw it as well suited for judge, guardrail, and orchestration workloads in agent systems.

360 tokens/sec Gemini 3.1 Flash-Lite speed

Logan Kilpatrick announcement ↗Gemini Flash-Lite page ↗

🎙️ Hear our coverage →

#frontier-models #architecture #infrastructure

IEIT (Yuan AI Lab) Mar 5, 2026

New Models · Open weights

Yuan 3.0 Ultra

Yuan AI Lab releases Yuan 3.0 Ultra open-weights model

Yuan AI Lab (IEIT) released Yuan 3.0 Ultra, a new open-weights model published on Hugging Face under the IEITYuan org. It was covered in the open-source LLM roundup as part of a busy week for Chinese open model releases.

Yuan 3.0 Ultra announcement ↗Yuan Lab blog ↗IEITYuan on Hugging Face ↗

🎙️ Hear our coverage →

OpenAI Mar 5, 2026

New Models

GPT-5.3 Instant

OpenAI rolls out GPT-5.3 Instant as the free-tier fast model

OpenAI rolled out GPT-5.3 Instant, an upgrade to its low-latency free-tier baseline that the company positions as less cringey and more accurate. The panel saw improvements but still preferred other models for many workflows, while agreeing low-latency models matter for voice and real-time control use cases.

OpenAI GPT-5.3 Instant announcement ↗

🎙️ Hear our coverage →

#frontier-models #consumer-ai

OpenAI Mar 5, 2026

New Models

GPT-5.4

OpenAI drops GPT-5.4 Thinking and GPT-5.4 Pro live during the show

OpenAI released GPT-5.4 Thinking and GPT-5.4 Pro mid-show, two variants of a frontier general model that folds Codex-level coding into a unified reasoning model. It ships with a 1M-token context window, a /fast mode, and mid-reasoning steering, posting 83.3% on ARC-AGI-2 (Pro) and roughly 75% on OSWorld computer use. The panel tested it live in Codex and called it a major general-model jump, while noting input pricing rose about 50% versus 5.2.

83.3% ARC-AGI-2 (GPT-5.4 Pro) · 75% OSWorld / computer-use score · 1M Context window

OpenAI GPT-5.4 announcement ↗ARC Prize on GPT-5.4 ↗Alex Volkov's live reaction thread ↗Benchmark breakdown by @nasqret ↗

🎙️ Hear our coverage →

#frontier-models #reasoning #coding

StepFun Mar 5, 2026

New Models · Open weights

Step 3.5 Flash Base

StepFun open-sources Step 3.5 Flash Base with its training stack

StepFun released Step 3.5 Flash Base and Midtrain checkpoints, an unusually open release that includes training artifacts and the SteptronOSS training stack alongside the weights. The panel praised the Apache-2 orientation and called the continuation-pretraining flexibility a major practical unlock for builders.

StepFun announcement ↗Step-3.5-Flash-Base on Hugging Face ↗SteptronOSS training stack on GitHub ↗Step 3.5 Flash paper on arXiv ↗

🎙️ Hear our coverage →

#open-source #research

🚀 Products & Apps 5

Modular Mar 26, 2026

Products & Apps

Modular 26.2

Modular 26.2 runs FLUX.2 in under a second, 99% cheaper than Nano Banana

Modular shipped its 26.2 release with state-of-the-art image generation, running FLUX.2 in under one second (with some claims of under 300 ms) at 99% lower cost than Nano Banana, plus upgraded AI coding with Mojo. Alex noted the surprise of an inference platform releasing model-level optimization and hoped the approach spreads to all image generation.

Modular announcement (X) ↗Modular 26.2 blog post ↗Modular FLUX.2 speed demo (X) ↗

🎙️ Hear our coverage →

#image-gen #infrastructure #coding

Phota Labs Mar 26, 2026

Products & Apps

Phota Studio + API

Phota Labs launches Phota Studio + API with identity-preserving personalization

Phota Labs launched Phota Studio and an API around a photography-focused image model with identity-preserving personalization: upload a batch of your photos, it trains a personal model, and the generated images actually resemble you. Alex flagged the personalization as a real capability jump over the crowd of photo startups, for professional shots, photo fixes, and adding people to photos.

Phota Labs announcement (X) ↗Try Phota Studio ↗

🎙️ Hear our coverage →

#image-gen #consumer-ai

Manus (Meta) Mar 19, 2026

Products & Apps

Manus My Computer

Manus launches 'My Computer' desktop app for macOS and Windows

Manus, now Meta-owned, launched 'My Computer', a desktop app that brings its AI agent from the cloud onto your local machine for macOS and Windows. The agent can now operate directly on local files and applications rather than running only in a hosted sandbox.

Manus on X ↗Manus blog ↗

🎙️ Hear our coverage →

NVIDIA Mar 19, 2026

Products & Apps

NemoClaw

NVIDIA announces NemoClaw, enterprise-hardened OpenClaw, at GTC

At GTC, Jensen Huang spent 15 minutes on OpenClaw, calling it the most important open source release since Linux and declaring 'every company needs an OpenClaw strategy.' NVIDIA released NemoClaw, a hardened enterprise reference implementation of OpenClaw with a privacy router and policy engine aimed at solving the agent security problem.

NemoClaw site ↗NVIDIA NemoClaw page ↗TechCrunch coverage ↗Alex Volkov on X ↗

🎙️ Hear our coverage →

#agents #industry #safety

Weights & Biases Mar 19, 2026

Products & Apps

W&B iOS App

Weights & Biases launches native iOS app for monitoring training runs

W&B shipped its most-requested feature ever: a native iOS app for monitoring AI training runs with live metrics and push notifications for crash alerts. Practitioners can now keep an eye on long-running training jobs from their phone instead of staying glued to a dashboard.

W&B on X ↗App Store ↗W&B site ↗

🎙️ Hear our coverage →

#coding #infrastructure

✨ Major Features & Updates 7

Anthropic Mar 26, 2026

Major Features & Updates

Claude computer use (Cowork + Claude Code)

Claude can now control your Mac: computer use lands in Cowork and Claude Code

Anthropic shipped computer use as a research preview in Claude Cowork and Claude Code, letting Claude directly control local Mac workflows. The panel compared it to existing OpenClaw-style agent patterns and debated where direct UI control is genuinely useful versus overkill.

Claude announcement (X) ↗Claude Cowork product page ↗

🎙️ Hear our coverage →

Anthropic Mar 19, 2026

Major Features & Updates

Claude Opus 4.6 (1M context)

Anthropic makes Opus 4.6 1M context the default in Claude Code, same price

Anthropic made 1M token context the default for Opus 4.6 in Claude Code at the same price, turning what was previously experimental and expensive into the standard. MRCR benchmark performance holds at 93% at 256K and 76% at 1M. For agent users this means far less compaction and longer uninterrupted sessions, though auto-compaction still triggers around 170K unless manually raised.

1M Opus 4.6 context default

🎙️ Hear our coverage →

#architecture #agents #coding

Google Mar 19, 2026

Major Features & Updates

Google AI Studio (vibe coding overhaul)

Google AI Studio gets full-stack vibe coding with Antigravity and Firebase

Google AI Studio received a full-stack vibe coding overhaul featuring the Antigravity agent, Firebase integration, and multiplayer support. The update pushes AI Studio from a model playground toward a full app-building environment.

Logan Kilpatrick on X ↗Google blog ↗AI Studio ↗

🎙️ Hear our coverage →

#coding #agents

NVIDIA Mar 19, 2026

Major Features & Updates

DLSS 5

NVIDIA DLSS 5 adds a generative AI filter for photo-realistic lighting

Announced at GTC, NVIDIA's DLSS 5 introduces a new generative AI filter bringing photo-realistic lighting to RTX 50-series GPUs. It applies generative models to real-time game rendering, extending DLSS beyond upscaling and frame generation.

Digital Foundry coverage ↗

🎙️ Hear our coverage →

#image-gen #world-models #infrastructure

OpenAI Mar 19, 2026

Major Features & Updates

Codex Subagents

OpenAI ships subagents for Codex with custom TOML configs

OpenAI added subagents to Codex, enabling parallel specialized agents configured via custom TOML files. Paired with the cheap GPT-5.4 Mini and Nano models, this enables the orchestrator-plus-workers pattern where a flagship model spawns inexpensive parallel subagents for tasks like visual testing.
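The orchestrator-plus-workers pattern can be sketched independently of Codex. The Python below is a generic illustration, not Codex's implementation or its TOML schema: `run_subagent` is a hypothetical stand-in for a call to a small, cheap worker model, and the task names are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubagentResult:
    task: str
    summary: str


def run_subagent(task: str) -> SubagentResult:
    """Hypothetical worker: a real system would call a small, cheap model here."""
    return SubagentResult(task=task, summary=f"done: {task}")


def orchestrate(tasks: list[str], max_workers: int = 4) -> list[SubagentResult]:
    """Fan tasks out to parallel workers and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, tasks))


# Invented visual-testing tasks, for illustration only.
results = orchestrate(["check login page", "check checkout flow", "check dark mode"])
for result in results:
    print(result.summary)
```

The design point is that the expensive model only plans and merges, while the fan-out step is embarrassingly parallel and can run on whichever model is cheapest per token.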

Codex subagents docs ↗OpenAI Devs on X ↗Codex GitHub ↗

🎙️ Hear our coverage →

#agents #coding

Cursor Mar 13, 2026

Major Features & Updates

Cursor in JetBrains (ACP)

Cursor joins ACP registry and goes live in JetBrains IDEs

Cursor joined the Agent Client Protocol (ACP) registry and is now live inside JetBrains IDEs. The move is a cross-ecosystem win for ACP, the emerging open standard that lets any AI agent plug into any editor.

JetBrains: Cursor joins ACP registry ↗Cursor blog: JetBrains ACP ↗

🎙️ Hear our coverage →

#agents #coding

OpenAI Mar 5, 2026

Major Features & Updates

Codex app for Windows

OpenAI launches the Codex app on Windows

OpenAI brought its Codex desktop app to Windows, expanding the agentic coding tool beyond its initial platforms. Mentioned in the TL;DR tools and agentic engineering rundown.

Codex on Windows announcement ↗

🎙️ Hear our coverage →

🔌 APIs & Platforms 1

xAI Mar 19, 2026

APIs & Platforms

Grok Text-to-Speech API

xAI launches Grok TTS API with 5 voices and WebSocket streaming

xAI launched a Grok Text-to-Speech API with five voices, expressive controls, and WebSocket streaming, priced cheaper than ElevenLabs. It adds another option to a suddenly competitive voice AI market alongside open-source entrants like Fish Audio S2.

xAI on X ↗Grok voice API ↗Try text-to-speech ↗

🎙️ Hear our coverage →

🛠️ Dev Tools 7

Unsloth AI Mar 19, 2026

Dev Tools · Open weights

Unsloth Studio

Unsloth Studio: web UI for local fine-tuning with 2x speed, 70% less VRAM

Unsloth launched Studio, an open-source web UI for local LLM training and inference claiming 2x speed and 70% less VRAM, supporting 500+ models across text, vision, audio, and embeddings. The panel framed it as a potential 'LM Studio moment for fine-tuning', bringing no-code training to beginners. It was confirmed working on Google Colab Pro, where models can train overnight for about $20/month.

Unsloth Studio docs ↗X announcement ↗GitHub ↗Daniel Han announcement (X) ↗

🎙️ Hear our coverage (+1 follow-up) →

#training #open-source #coding

Andrej Karpathy Mar 13, 2026

Dev Tools · Open weights

AutoResearch

Karpathy open-sources AutoResearch for autonomous ML experiments

Andrej Karpathy open-sourced AutoResearch, a framework that runs AI-driven ML experiments autonomously. Over two days it ran 700 experiments on nanochat GPT-2, stacked 20 improvements, and achieved an 11% training speedup. Tobi Lütke adapted it overnight for Shopify's Liquid templating engine for a 51% render-time improvement, and the repo hit 26K GitHub stars quickly.

700 AutoResearch experiments run in 2 days (Karpathy) · 11% GPT-2 training speedup from stacked AutoResearch improvements · 51% Shopify Liquid render time improvement using AutoResearch

Karpathy on X ↗autoresearch on GitHub ↗nanochat on GitHub ↗

🎙️ Hear our coverage →

#agents #search #coding

Matt Van Horn Mar 13, 2026

Dev Tools · Open weights

/last30days

/last30days research skill searches X, Reddit, YouTube and TikTok

Matt Van Horn presented /last30days, a research skill that searches X, Reddit, YouTube, and TikTok for the last 30 days of content on any topic. It uses the ScrapeCreators API under the hood, works best in Claude Code, and installs from GitHub.

/last30days on GitHub ↗@slashlast30days on X ↗

🎙️ Hear our coverage →

#research #coding #agents

Paperclip Mar 13, 2026

Dev Tools · Open weights

Paperclip.ing

Paperclip.ing: open-source agent orchestration for zero-human companies

Anonymous builder DOTTA presented Paperclip.ing, an open-source agent orchestration framework for 'zero human companies' where an AI CEO recursively hires more agents. It hit 20K GitHub stars in its first week, with a heartbeat system driving agent autonomy and a Memento-style memory architecture keeping agents coherent across tasks.

20K Paperclip GitHub stars in first week

Paperclip on GitHub ↗Paperclip.ing website ↗

🎙️ Hear our coverage →

#agents #open-source

Weights & Biases Mar 13, 2026

Dev Tools · Open weights

W&B Agent Skills

Weights & Biases launches Agent Skills

Weights & Biases officially launched Agent Skills, installable via `npx skills add wandb/skills`. The launch coincided with Nemotron 3 Super becoming available on W&B Inference at $0.20/1M input tokens, one of the best price-performance options for a 120B model.

W&B Agent Skills on X ↗W&B Skills on GitHub ↗

🎙️ Hear our coverage →

#agents #coding

Google Mar 5, 2026

Dev Tools

Google Workspace CLI

Google releases a Google Workspace CLI

Google released a command-line interface for Google Workspace, making Workspace data and actions scriptable from the terminal for developers and agents. Covered briefly in the TL;DR tools segment.

Google Workspace CLI announcement ↗

🎙️ Hear our coverage →

OpenAI Mar 5, 2026

Dev Tools · Open weights

Symphony

OpenAI releases Symphony on GitHub

Ryan Carson experimented with OpenAI's Symphony framework, letting agents work through PRs overnight. One agent not only created a PR but found a bug and filed its own detailed Jira ticket with no human intervention, a small but telling sign of where agentic development is heading.

Symphony on GitHub ↗

🎙️ Hear our coverage (+1 follow-up) →

#coding #agents

📄 Papers & Research 3

Google Research Mar 26, 2026

Papers & Research

TurboQuant

Google TurboQuant claims 6x KV-cache compression and 8x faster inference

Google Research published TurboQuant, a KV-cache quantization technique claiming 6x compression and 8x inference speedup with near-zero accuracy loss. The panel framed it as a potential unlock for LLM inference economics, while calling stock-market panic over the result premature without broader production validation.
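To see why KV-cache compression matters for inference cost, recall the usual size estimate for a transformer's cache: two tensors (keys and values) per layer, each holding one vector per KV head per token. The model dimensions below are hypothetical and chosen only for illustration, and the 6x factor is the claim as reported above, applied naively rather than derived from the paper.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: float = 2.0) -> float:
    """Keys + values for every layer, head, and token (2 bytes = FP16/BF16)."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model shape, not taken from the TurboQuant paper.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128_000)
compressed = baseline / 6  # claimed 6x compression, applied naively

print(f"baseline:   {baseline / 2**30:.2f} GiB")
print(f"compressed: {compressed / 2**30:.2f} GiB")
```

Because the cache grows linearly with sequence length and batch size, any fixed compression factor translates directly into longer contexts or more concurrent requests per accelerator.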

6× TurboQuant KV-cache compression · 8× TurboQuant speedup claim

Google Research announcement (X) ↗Google Research blog: TurboQuant ↗TurboQuant paper (arXiv) ↗

🎙️ Hear our coverage →

#infrastructure

State Spaces (Albert Gu et al.) Mar 19, 2026

Papers & Research · Open weights

Mamba-3

Mamba-3 lands with three SSM innovations for inference-first linear models

Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.
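For readers who have not met the term, trapezoidal discretization replaces the one-sided (Euler-style) update of a continuous-time state equation with an average of the derivative at both ends of the step. The sketch below applies the textbook rule to a scalar system h'(t) = a*h(t) + b*x(t). It is not Mamba-3's exact parameterization, and all constants are illustrative.

```python
import math


def euler_step(h: float, x_now: float, a: float, b: float, dt: float) -> float:
    """Forward Euler: use the derivative at the start of the step only."""
    return h + dt * (a * h + b * x_now)


def trapezoidal_step(h: float, x_now: float, x_next: float,
                     a: float, b: float, dt: float) -> float:
    """Trapezoidal rule: average the derivative at both ends of the step."""
    numerator = (1 + dt * a / 2) * h + dt * b * (x_now + x_next) / 2
    return numerator / (1 - dt * a / 2)


# Pure decay with zero input: the exact solution is h(t) = exp(a * t).
a, b, dt, steps = -1.0, 1.0, 0.1, 10
h_euler = h_trap = 1.0
for _ in range(steps):
    h_euler = euler_step(h_euler, 0.0, a, b, dt)
    h_trap = trapezoidal_step(h_trap, 0.0, 0.0, a, b, dt)

exact = math.exp(a * dt * steps)
print(f"exact={exact:.5f} euler={h_euler:.5f} trapezoidal={h_trap:.5f}")
```

On this toy problem the trapezoidal update tracks the exact decay more closely than Euler at the same step size, which is the general motivation for second-order discretizations.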

Arxiv paper ↗GitHub ↗Albert Gu on X ↗

🎙️ Hear our coverage →

#research #architecture #open-source

Black Forest Labs Mar 5, 2026

Papers & Research

Self-Flow

Black Forest Labs introduces Self-Flow

Black Forest Labs published Self-Flow, new research from the FLUX makers in the AI art and diffusion space. It was included in the week's AI Art & Diffusion roundup.

BFL Self-Flow announcement ↗Self-Flow research page ↗

🎙️ Hear our coverage →

#image-gen #architecture #research

📊 Benchmarks & Evals 4

ARC Prize Foundation Mar 26, 2026

Benchmarks & Evals

ARC-AGI-3

ARC-AGI-3 launches: humans score 100%, frontier models under 1%

ARC Prize launched ARC-AGI-3, an interactive agentic reasoning benchmark of turn-based puzzle games designed to test human-like generalization in novel abstract environments. Humans hit a 100% pass rate while top frontier models score under 1%, which the panel welcomed as a healthy reality check against AGI-is-here rhetoric and easy score inflation.

<1% ARC-AGI-3 frontier model scores · 100% Human completion on ARC-AGI-3

ARC Prize announcement (X) ↗ARC Prize site ↗

🎙️ Hear our coverage →

#benchmarks #reasoning #agents

MarginLab Mar 5, 2026

Benchmarks & Evals

Claude Code tracker

MarginLab tracker shows degradation in Opus 4.6 on Claude Code

MarginLab's public Claude Code tracker surfaced measurable degradation in Opus 4.6 performance, discussed in the evals and benchmarks roundup. The tracker continuously evaluates Claude Code behavior over time, making silent model regressions visible.

MarginLab Claude Code tracker ↗

🎙️ Hear our coverage →

Peter Gostev Mar 5, 2026

Benchmarks & Evals

BullShit Bench

Peter Gostev publishes BullShit Bench

Peter Gostev published BullShit Bench, a new community evaluation flagged in the week's evals and benchmarks roundup. It measures how models handle nonsense or unfounded claims rather than raw capability.

BullShit Bench announcement ↗

🎙️ Hear our coverage →

Weights & Biases Mar 5, 2026

Benchmarks & Evals

Wolf Bench

Wolfram previews Wolf Bench, a multi-metric agent eval from W&B

Wolfram Ravenwolf gave an early preview of Wolf Bench, a Terminal Bench-based evaluation framework from Weights & Biases that reports four metrics (average, best run, ceiling, and consistent floor) instead of a single score. It treats harness differences (Terminal Bench vs Claude Code vs OpenClaw) as a first-class factor and publishes benchmark cost and transparency details.
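One plausible reading of the four metrics can be sketched in Python over a runs-by-tasks pass/fail grid. The definitions here are assumptions inferred from the metric names, not Wolf Bench's published formulas, and the data is made up.

```python
def summarize_runs(runs: list[list[bool]]) -> dict[str, float]:
    """runs[r][t] is True when run r solved task t; all runs cover the same tasks."""
    num_tasks = len(runs[0])
    per_run = [sum(run) / num_tasks for run in runs]
    per_task = list(zip(*runs))  # regroup results by task
    return {
        "average": sum(per_run) / len(per_run),                 # mean pass rate
        "best_run": max(per_run),                               # best single run
        "ceiling": sum(any(t) for t in per_task) / num_tasks,   # solved at least once
        "floor": sum(all(t) for t in per_task) / num_tasks,     # solved in every run
    }


# Three hypothetical runs over four tasks.
runs = [
    [True, True, False, False],
    [True, True, True, False],
    [True, True, False, False],
]
metrics = summarize_runs(runs)
print(metrics)
```

Under this reading, the gap between ceiling and floor shows how much of a model's score depends on run-to-run luck, which a single averaged number hides.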

🎙️ Hear our coverage →

#benchmarks #agents

🤝 Acquisitions 1

OpenAI Mar 19, 2026

Acquisitions

Astral (uv, Ruff, ty)

OpenAI acquires Astral, makers of uv and Ruff, to join the Codex team

OpenAI acquired Astral, the company behind the uv Python package manager, Ruff, and ty, with the team joining Codex specifically — OpenAI's third acquisition of the month. The panel drew the parallel to Anthropic buying Bun for TypeScript infrastructure: OpenAI now owns core Python tooling for the code its agents write. The tools remain open source and forkable.

Astral blog ↗Astral on X ↗Charlie Marsh on X ↗

🎙️ Hear our coverage →

#industry #coding

🌀 Also Released 3

Hugging Face Mar 19, 2026

Also Released

State of Open Source Spring 2026 Report

Hugging Face report: China passes US in LLM count, Qwen tops 1B downloads

Hugging Face published its Spring 2026 State of Open Source report showing China surpassing the US in number of LLMs for the first time, with Chinese models taking 41% of all downloads. Alibaba's Qwen family crossed 1 billion total downloads (about 1 million per day), overtaking Llama as the most downloaded model family, on a platform now hosting 11M users and 2M+ models.

Hugging Face blog ↗Irene Solaiman on X ↗AeonCorridor on X ↗

🎙️ Hear our coverage →

#open-source #industry

NVIDIA Mar 19, 2026

Also Released

GR LPX (Rubin NVL72 + Groq 3)

NVIDIA GTC: GR LPX pairs Rubin NVL72 servers with the new Groq 3 chip

NVIDIA's GTC hardware reveal integrates the new Groq 3 chip (gen 2 was never publicly seen) into Rubin NVL72 servers via the GR LPX system. Claims include 3x tokens-per-watt efficiency at baseline, up to 30x at higher throughput, and 1000+ tokens/sec on a 2T-parameter frontier model with 400K context — performance the current Blackwell generation can't reach at any price.

🎙️ Hear our coverage →

#infrastructure

Eon Systems Mar 13, 2026

Also Released

Fruit Fly Brain Connectome Simulation

Eon Systems uploads full fruit fly brain connectome into simulation

Eon Systems uploaded the complete fruit fly brain connectome — 140,000 neurons and 50M+ synapses — into a MuJoCo physics simulator, achieving 91% behavioral accuracy. Notably no ML or LLMs were used: it is pure connectome simulation. The advisory board includes George Church, Stephen Wolfram, and Anders Sandberg, marking a milestone for whole-brain emulation.

140,000 Neurons in the uploaded fruit fly brain connectome · 50M+ Synapses in the fruit fly brain connectome · 91% Behavioral accuracy of the simulated fruit fly brain

Eon Systems on X ↗eon.systems ↗FlyWire connectome data ↗

🎙️ Hear our coverage →
