Everything AI Released in May 2026

44 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

🧠 New Models 19

Anthropic
New Models

Claude Opus 4.8

Anthropic ships Claude Opus 4.8 live mid-show

Anthropic released Claude Opus 4.8 during the episode, hitting 69.2% on SWE-bench Pro (up from 64.3% on 4.7 and ahead of GPT-5.5 at 58.6%), a new-best 57.9% on Humanity's Last Exam with tools, and 83.4% on OSWorld-Verified. It also shows a real long-context jump past the usual 200K cliff (85.9% GraphWalks BFS at 256K), with new thinking modes in the UI. Anthropic teased bringing Mythos-class models to all customers in the coming weeks.

69.2% SWE-bench Pro

Cartesia
New Models

Ink-2

Cartesia Ink-2 tops Artificial Analysis's new STT leaderboard

Cartesia released Ink-2, which debuted as the most accurate streaming speech-to-text model with the fastest turnaround on Artificial Analysis's new STT leaderboard. It landed just after recording as part of a double post-show voice-AI drop alongside ElevenLabs Dubbing v2.

ElevenLabs
New Models

Dubbing v2

ElevenLabs Dubbing v2 preserves your performance across 90+ languages

ElevenLabs launched Dubbing v2, an audio-to-audio dubbing model that translates voices across more than 90 languages while preserving cadence, expression, intonation, and even stutters. Alex's live demos, including dubbing Nisten into Hebrew and his own voice into multiple languages, were the brain-melting moment of the episode.

Microsoft
New Models

MAI-Image-2.5

Microsoft MAI-Image-2.5 jumps to #3 on Arena text-to-image

MAI-Image-2.5 jumped to number two on Arena's image-to-image leaderboard shortly after launch, with notable strength in image cleanup, backgrounds, documents, and diagrams. Hands-on tests on the show were mixed. The model is publicly accessible through playground.microsoft.ai.

OpenBMB
New Models
Open weights

MiniCPM5-1B

OpenBMB MiniCPM5-1B: new SOTA 1B open-weights model

OpenBMB released MiniCPM5-1B, a state-of-the-art 1B-parameter open-weights model for efficient local and on-device use that runs on a phone. It scores 17.9 on the Artificial Analysis Intelligence Index, 7.4 points ahead of its size class, while using roughly 31x fewer output tokens than Qwen3.5 2B.

17.9 AAII (1B model)

PrismML
New Models
Open weights

Bonsai Image 4B

PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB

PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.

Tencent
New Models
Open weights

Hy-MT2

Tencent open-sources Hy-MT2 translation models under Apache 2.0

Tencent released the Hy-MT2 family of translation models under Apache 2.0, including a tiny 1.8B model that beats paid translation APIs such as Microsoft Translator, plus a larger 30B-A3B MoE variant. A small, free, locally runnable model outperforming commercial translation services was one of the open-source wins of the week.

Cohere
New Models
Open weights

Command A+

Cohere releases Command A+, a 218B Apache 2.0 MoE with 25B active params

Cohere released Command A+, a 218B-parameter mixture-of-experts model with 25B active parameters, shipping open weights under Apache 2.0. It was the week's headline open-source release, available on Hugging Face in both W4A4 quantized and BF16 variants.
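
To put the 218B total / 25B active split in perspective, here is a rough back-of-the-envelope sketch. The helper names are ours, and the memory figures assume 16 bits per weight for BF16 and 4 bits per weight for the W4A4 variant, counting weights only (no quantization scales, activations, or KV cache), so treat them as lower bounds rather than official requirements.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    if total_params_b <= 0 or not 0 < active_params_b <= total_params_b:
        raise ValueError("expected 0 < active <= total")
    return active_params_b / total_params_b


def weight_gb(total_params_b: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (billions of params x bytes per param)."""
    return total_params_b * bits_per_weight / 8


# Figures quoted for Command A+: 218B total, 25B active.
print(f"active share: {active_fraction(218, 25):.1%}")   # active share: 11.5%
print(f"BF16 weights: {weight_gb(218, 16):.0f} GB")      # BF16 weights: 436 GB
print(f"4-bit weights: {weight_gb(218, 4):.0f} GB")      # 4-bit weights: 109 GB
```

The takeaway: per-token compute scales with the roughly 11.5% of parameters that are active, while memory still has to hold all 218B, which is why the 4-bit variant matters for anyone hosting it.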

218B Command A+ parameters
25B active parameters

Cursor
New Models

Composer 2.5

Cursor launches Composer 2.5 with Opus-class coding at much lower cost

Cursor launched Composer 2.5, a coding model built by continued training on top of Kimi K2.5 (with permission) that delivers Opus-class coding performance at much lower cost. The crew noted Cursor is 'absolutely back' with strong pre-training and post-training teams, and that training now runs partly on the Colossus supercomputer.

Google DeepMind
New Models

Gemini 3.5 Flash

Gemini 3.5 Flash launches at I/O as Google's agentic workhorse model

Google launched Gemini 3.5 Flash at I/O 2026 as a fast, determined workhorse model built for agentic loops rather than a budget-tier Flash like prior generations. It is rolling out across the Gemini app, Search AI Mode, the Gemini API, Google AI Studio, Antigravity and the Gemini Enterprise Agent Platform. Nisten noted unusual determinism in its behavior, and Logan Kilpatrick framed it as designed for the agentic era.

900M Gemini app users

Google DeepMind
New Models

Gemini Omni

Gemini Omni: 'create anything from anything' conversational video editor

Google DeepMind launched Gemini Omni, a multimodal 'create anything from anything' model debuting as Google's first conversational video editor. Unlike pure text-to-video systems, Omni is an iterative multi-turn editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media, in the same way Nano Banana brought Gemini to interactive image editing. It is available in the Gemini app, Google Flow and YouTube, with API support coming soon.

Fastino Labs
New Models
Open weights

GLiGuard

Fastino Labs GLiGuard: 300M open guardrail model matches SOTA safety models

Fastino Labs released GLiGuard, a 300M-parameter open-source guardrail model that matches state-of-the-art safety models 23-90x its size while delivering 16x higher throughput. It ships under Apache 2.0, making small, fast, deployable guardrails available to everyone.

300M parameters

Krea AI
New Models

Krea 2

Krea 2: Krea's first from-scratch foundation image model

Krea released Krea 2, its first foundation image model trained from scratch, built over six to seven months by nearly half the company. It focuses on aesthetic diversity, style control with up to 4 reference images, and moodboard-driven workflows, generating images in roughly 15 seconds. Co-founder and CEO Victor Perez joined the show to walk through it.

Thinking Machines Lab
New Models

Interaction Models

Thinking Machines Lab drops Interaction Models: real-time multimodal 276B MoE

Mira Murati's Thinking Machines Lab released Interaction Models, a 276B-parameter MoE (12B active) trained from scratch for native real-time multimodal collaboration. It supports full-duplex audio/video/text with 0.40s turn-taking latency and scores 77.8 on FD-bench v1.5. The demo can react live to events like another person entering the camera frame.

276B MoE parameters
12B active parameters

🚀 Products & Apps 6

Google
Products & Apps

Universal Cart / AP2 / UCP

Google launches Universal Cart, AP2 and UCP for agentic commerce

Google launched Universal Cart along with the AP2 and UCP protocols, infrastructure that lets AI agents shop and pay on a user's behalf. It is Google's play to standardize agent-driven commerce across merchants and payment flows.

Google
Products & Apps

Antigravity 2.0

Antigravity 2.0 becomes Google's central agentic coding harness

Antigravity 2.0 was positioned at I/O 2026 as the single agent harness powering agentic experiences across Google, from internal tooling to Search, Workspace and developer products. Born from the Windsurf acquisition, it evolved from an agent-first IDE into the through line for Google's agentic strategy, now exposed to external developers as well.

Google
Products & Apps

Gemini Spark

Gemini Spark announced as a 24/7 proactive personal AI agent

Google announced Gemini Spark, a 24/7 personal AI agent that can proactively work across Google surfaces, framed on the show as Google's OpenClaw competitor. Access was not yet broadly available at announcement time, so the crew discussed it from the announcement rather than hands-on testing.

OpenAI
Products & Apps

Daybreak

OpenAI launches Daybreak, a frontier AI cybersecurity platform

OpenAI announced Daybreak, a frontier AI cybersecurity platform that pairs GPT-5.5 with Codex for security workloads. It launches with partners including Cloudflare, positioning OpenAI directly in the AI-powered defense market.

✨ Major Features & Updates 8

Anthropic
Major Features & Updates

Dynamic Workflows in Claude Code

Dynamic Workflows and Ultra Code land in Claude Code

Alongside Opus 4.8, Anthropic shipped Dynamic Workflows and an Ultra Code mode in Claude Code, which Yam fired up live on the show. The headline proof point: Bun was ported from Zig to Rust — about 750K lines — via Dynamic Workflows, with 99.8% of the test suite passing and the port merged in 11 days.

750K lines Bun: Zig → Rust

Anthropic
Major Features & Updates

Claude off-peak usage boost

Anthropic doubles Claude usage limits outside peak hours for a limited time

Anthropic doubled Claude usage outside peak hours for a limited period, covering Claude Code and other Claude surfaces. The move gives heavy users substantially more agentic and coding throughput during off-peak windows.

Google
Major Features & Updates

Google Search agentic capabilities

Google Search adds Gemini 3.5 Flash-powered agentic capabilities

Google Search is getting new Gemini 3.5 Flash-powered agentic capabilities, including a new AI-powered Search box and background information agents. The crew framed the rollout as a massive intelligence uplift across one of Google's largest surfaces, with billions of Search users getting frontier-model capabilities.

3.5B Google Search users

OpenAI
Major Features & Updates

Codex Mobile

OpenAI Codex Mobile arrives in the ChatGPT mobile apps

OpenAI's Codex Mobile is now available in the ChatGPT mobile apps, enabling remote agent workflows from a phone. The crew discussed it as part of the broader shift toward driving coding agents from anywhere rather than just the desktop.

Anthropic
Major Features & Updates

Claude Agent SDK monthly credits

Anthropic adds separate Claude Agent SDK credits to paid plans

Anthropic announced separate monthly Claude Agent SDK credits for Pro, Max, Team, and Enterprise subscribers, starting June 15, 2026. This gives agent builders a dedicated usage pool on top of regular plan limits.

Meta AI
Major Features & Updates

Muse Spark voice conversations

Meta launches Muse Spark voice conversations across its apps and glasses

Meta rolled out Muse Spark-powered voice conversations across the Meta AI app, WhatsApp, Instagram, Facebook, and Ray-Ban Meta glasses. The feature includes real-time image generation, live camera AI, and instant Reels/maps integration. Alex tested it live and called it surprisingly good, the first big consumer ship from Meta Superintelligence Labs.

Major Features & Updates

/goal command

/goal command lands in Codex, Claude Code, and Hermes: the productized Ralph loop

The /goal command is now available in Codex, Claude Code, and Hermes, productizing the Ralph loop pattern: set a measurable success condition and the agent iterates autonomously until it is done. Codex's implementation is winning early head-to-head comparisons over Claude Code, and the show framed it as turning coding agents into 24/7 AI employees.
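
The loop pattern described above can be sketched in a few lines. This is a minimal illustration, not how Codex, Claude Code, or Hermes implement /goal: `run_until_goal`, `fix_one`, and the iteration budget are hypothetical names and choices of ours, and the "agent" is a toy stand-in that clears one failing test per step.

```python
from typing import Callable


def run_until_goal(
    step: Callable[[int], None],
    goal_met: Callable[[], bool],
    max_iterations: int = 50,
) -> int:
    """Call `step` repeatedly until `goal_met()` is true; return iterations used."""
    for iteration in range(1, max_iterations + 1):
        step(iteration)
        if goal_met():
            return iteration
    # A budget keeps an unreachable goal from looping forever.
    raise RuntimeError("goal not reached within the iteration budget")


# Toy stand-in for an agent: each step "fixes" one failing test.
failing_tests = ["test_a", "test_b", "test_c"]


def fix_one(iteration: int) -> None:
    if failing_tests:
        failing_tests.pop()


# Measurable success condition: no failing tests remain.
used = run_until_goal(fix_one, lambda: not failing_tests)
print(used)  # 3
```

The design point is that the success condition is a check the loop can run on its own, such as a passing test suite, rather than the agent's own claim that it is finished.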

🔌 APIs & Platforms 1

Google DeepMind
APIs & Platforms

Managed Agents (Gemini API)

Gemini API gets Managed Agents with hosted sandboxes and the Interactions API

Google launched Managed Agents in the Gemini API, letting developers spin up hosted Antigravity agents with Linux sandboxes and persistent state. It ships alongside the next-generation Interactions API, which Logan Kilpatrick described as designed for agentic systems rather than the old tokens-in, tokens-out model interaction pattern.

🛠️ Dev Tools 4

Weights & Biases
Dev Tools

W&B MCP Server

Weights & Biases launches MCP server with 20 tools for agents

W&B officially launched its MCP server with 20 schema-first tools so coding agents can read experiments, monitor training, and run autonomous research loops. Agents can query metadata before pulling full 300-metric runs, keeping their context windows from blowing up.
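
The metadata-first idea can be illustrated with a toy example. Everything here is hypothetical: the in-memory `RUNS` store and the `list_metric_names` / `fetch_metrics` helpers are stand-ins we made up for illustration, not the W&B MCP server's actual tools or schema.

```python
import json

# Hypothetical in-memory run store: one run with 300 metrics, 1,000 steps each.
RUNS = {
    "run-1": {
        f"metric_{i}": [float(step) for step in range(1000)] for i in range(300)
    }
}


def list_metric_names(run_id: str) -> list[str]:
    """Cheap metadata call: metric names only, no values."""
    return sorted(RUNS[run_id])


def fetch_metrics(run_id: str, names: list[str]) -> dict[str, list[float]]:
    """Targeted call: values for the requested metrics only."""
    run = RUNS[run_id]
    return {name: run[name] for name in names}


# Step 1: look at what exists. Step 2: pull only what is needed.
names = list_metric_names("run-1")
selected = fetch_metrics("run-1", ["metric_0", "metric_1"])

full_size = len(json.dumps(RUNS["run-1"]))
selected_size = len(json.dumps(selected))
print(len(names), selected_size < full_size)  # 300 True
```

Serialized size is used here as a crude proxy for how much of the agent's context window each response would consume.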

xAI
Dev Tools

Grok Build

xAI launches Grok Build, an agentic CLI coding tool in beta

xAI launched Grok Build, an agentic CLI coding tool, in beta for SuperGrok Heavy subscribers. It joins the crowded field of terminal-based coding agents as xAI's entry into agentic engineering tooling.

Nous Research
Dev Tools

Hermes CLI agent

Hermes passes OpenClaw as #1 CLI agent on OpenRouter, adds computer use

Nous Research's Hermes overtook OpenClaw as the #1 CLI agent on OpenRouter. It also added background computer use via Trykua, and Alex described switching his own daily agent workflow from OpenClaw to Hermes.

📄 Papers & Research 3

OpenAI
Papers & Research

Erdős planar unit distance result

OpenAI model makes progress on 80-year-old Erdős planar unit distance problem

OpenAI announced that a general-purpose reasoning model made progress on the Erdős planar unit distance problem, challenging an 80-year-old mathematical belief. The panel called it the most important news of the week outside Google I/O, as a sign that frontier reasoning models are starting to contribute to genuinely open mathematics.

80-year Erdős math problem

Nous Research
Papers & Research
Open weights

TST (Token Superposition Training)

Nous Research TST: 2-3x training speedup without architecture changes

Nous Research released Token Superposition Training (TST), a training technique that achieves 2-3x wall-clock speedup at matched FLOPs. It requires no architecture changes, making it a drop-in efficiency win for LLM training runs.

📊 Benchmarks & Evals 2

Datacurve
Benchmarks & Evals
Open weights

DeepSWE

Datacurve's DeepSWE: a contamination-free coding benchmark

DeepSWE is a coding leaderboard built from 113 original tasks written from scratch and shipped as shallow clones with no git history to cheat from. GPT-5.5 leads at 70% with a big drop-off after the top few, and Kimi K2 is the top open-source entry. Replaying older benchmarks, Datacurve found that SWE-bench Pro's verifier is wrong ~32% of the time and caught Claude Opus reading the gold commit out of git history on 12-18% of passes.

70% DeepSWE leader (GPT-5.5)

Artificial Analysis
Benchmarks & Evals

Coding Agent Index

Artificial Analysis Coding Agent Index benchmarks model + harness combos

Artificial Analysis launched the Coding Agent Index, a benchmark that evaluates model and harness combinations rather than models alone. Opus 4.7 in Cursor CLI leads at 61, GLM-5.1 tops the open-weight entries at 53, and costs vary 30x across combos for similar capability.

🌀 Also Released 1

Anthropic
Also Released

Colossus compute deal

SpaceX IPO filing reveals Anthropic pays $1.25B/month for Colossus compute

The SpaceX IPO filing revealed Anthropic is paying $1.25 billion per month for AI compute at the Memphis Colossus facility. The crew called it a bombastic deal that lets Anthropic serve far more inference at scale and feel less compute-constrained.

$1.25B monthly AI compute spend