Everything AI Released in May 2026

44 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

← April 2026 All months June 2026 →

🧠 New Models 19

Anthropic May 28, 2026

New Models

Claude Opus 4.8

Anthropic ships Claude Opus 4.8 live mid-show

Anthropic released Claude Opus 4.8 during the episode, hitting 69.2% on SWE-bench Pro (up from 64.3% on 4.7 and ahead of GPT-5.5 at 58.6%), a new-best 57.9% on Humanity's Last Exam with tools, and 83.4% on OSWorld-Verified. It also shows a real long-context jump past the usual 200K cliff (85.9% GraphWalks BFS at 256K), with new thinking modes in the UI. Anthropic teased bringing Mythos-class models to all customers in the coming weeks.

69.2% SWE-bench Pro

Claude Opus 4.8 — blog ↗Claude Opus 4.8 — system card ↗

🎙️ Hear our coverage →

#frontier-models #coding #reasoning

Cartesia May 28, 2026

New Models

Ink-2

Cartesia Ink-2 tops Artificial Analysis's new STT leaderboard

Cartesia released Ink-2, which debuted as the most accurate streaming speech-to-text model with the fastest turnaround on Artificial Analysis's new STT leaderboard. It landed just after recording as part of a double post-show voice-AI drop alongside ElevenLabs Dubbing v2.

Cartesia Ink-2 ↗Cartesia announcement ↗Artificial Analysis STT leaderboard ↗

🎙️ Hear our coverage (+1 follow-up) →

ElevenLabs May 28, 2026

New Models

Dubbing v2

ElevenLabs Dubbing v2 preserves your performance across 90+ languages

ElevenLabs launched Dubbing v2, an audio-to-audio dubbing model that translates voices across more than 90 languages while preserving cadence, expression, intonation, and even stutters. Alex's live demos, including dubbing Nisten into Hebrew and his own voice into multiple languages, were the brain-melting moment of the episode.

ElevenLabs Dubbing v2 ↗ElevenLabs announcement ↗ElevenLabs Creative ↗ElevenLabs Productions ↗

🎙️ Hear our coverage (+1 follow-up) →

#voice-ai #multilingual

Microsoft May 28, 2026

New Models

MAI-Image-2.5

Microsoft MAI-Image-2.5 jumps to #3 on Arena text-to-image

MAI-Image-2.5 jumped to number two on Arena's image-to-image leaderboard shortly after launch, with notable strength in image cleanup, backgrounds, documents, and diagrams. Hands-on tests on the show gave mixed results. The model is publicly accessible through playground.microsoft.ai.

Microsoft MAI Image 2.5 — Arena ↗Microsoft AI announcement ↗MAI-Image-2.5 announcement image ↗X announcement ↗

🎙️ Hear our coverage (+1 follow-up) →

#image-gen #benchmarks

OpenBMB May 28, 2026

New Models · Open weights

MiniCPM5-1B

OpenBMB MiniCPM5-1B: new SOTA 1B open-weights model

OpenBMB released MiniCPM5-1B, a state-of-the-art 1B-parameter open-weights model for efficient local and on-device use that runs on a phone. It scores 17.9 on the Artificial Analysis Intelligence Index, 7.4 points ahead of its size class, while using roughly 31x fewer output tokens than Qwen3.5 2B.

17.9 AAII (1B model)

OpenBMB MiniCPM5-1B on Hugging Face ↗MiniCPM5-1B paper ↗Artificial Analysis on MiniCPM5-1B ↗OpenBMB announcement ↗

🎙️ Hear our coverage →

#open-source #on-device

OpenMOSS May 28, 2026

New Models · Open weights

MOSS-TTS-v1.5

MOSS-TTS-v1.5: open-source 8B TTS with 31 languages

OpenMOSS shipped MOSS-TTS-v1.5, an 8B open-source text-to-speech model supporting 31 languages with pause control, released under Apache 2.0. It is one of the larger fully open TTS models available.

MOSS-TTS-v1.5 on Hugging Face ↗MOSS-TTS GitHub ↗MOSS-TTS paper ↗MOSS announcement ↗

🎙️ Hear our coverage →

#voice-ai #open-source

PrismML May 28, 2026

New Models · Open weights

Bonsai Image 4B

PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB

PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.

PrismML Bonsai Image 4B — blog ↗PrismML Bonsai on Hugging Face ↗Bonsai Image demo ↗Bonsai Studio iOS app ↗

🎙️ Hear our coverage →

#image-gen #on-device #infrastructure

Pruna AI May 28, 2026

New Models

P-Image-Upscale

Pruna AI's P-Image-Upscale hits 128 megapixel outputs

Pruna AI released P-Image-Upscale, an image upscaling model that reaches 128 megapixel outputs with fast generation and predictable pricing. It is available through Pruna's API and on Replicate.

Pruna P-Image-Upscale on Replicate ↗P-Image-Upscale docs ↗Pruna announcement ↗

🎙️ Hear our coverage →

Tencent May 28, 2026

New Models · Open weights

Hy-MT2

Tencent open-sources Hy-MT2 translation models under Apache 2.0

Tencent released the Hy-MT2 family of translation models under Apache 2.0, including a tiny 1.8B model that beats paid translation APIs like Microsoft's Translator, plus a larger 30B-A3B MoE variant. A small, free, locally-runnable model outperforming commercial translation services was one of the open-source wins of the week.

Tencent Hy-MT2 1.8B ↗Tencent Hy-MT2 30B-A3B ↗Hy-MT2 paper ↗Tencent Hunyuan announcement ↗

🎙️ Hear our coverage →

#open-source #multilingual

Alibaba (Qwen) May 21, 2026

New Models

Qwen 3.7-Max

Alibaba releases Qwen 3.7-Max agentic frontier model with robotics demos

Alibaba released Qwen 3.7-Max, an agentic frontier model built for long autonomous runs and shown alongside robotics demos. It continues the Qwen Max line as Alibaba's closed frontier offering aimed at agentic workloads.

Qwen blog ↗Announcement on X ↗Robot demo ↗

🎙️ Hear our coverage →

#agents #robotics #frontier-models

Cohere May 21, 2026

New Models · Open weights

Command A+

Cohere releases Command A+, a 218B Apache 2.0 MoE with 25B active params

Cohere released Command A+, a 218B-parameter mixture-of-experts model with 25B active parameters, shipping open weights under Apache 2.0. It was the week's headline open-source release, available on Hugging Face in both W4A4 quantized and BF16 variants.

218B Command A+ parameters · 25B active parameters

Cohere blog ↗Nick Frosst ↗HF W4A4 ↗HF BF16 ↗

🎙️ Hear our coverage →

#open-source #architecture
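
The Command A+ figures above (218B total parameters, 25B active, W4A4 and BF16 variants) can be turned into rough numbers with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not a published spec: it counts weights only, assumes 16 bits per weight for BF16 and 4 bits per weight for the W4A4 build, and ignores quantization scales, activations and KV cache. The function names are hypothetical.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_params_b / total_params_b


def weight_memory_gb(total_params_b: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return total_params_b * bytes_per_weight


TOTAL_B, ACTIVE_B = 218, 25  # figures quoted for Command A+ above

print(f"active share:  {active_fraction(TOTAL_B, ACTIVE_B):.1%}")
print(f"BF16 weights:  ~{weight_memory_gb(TOTAL_B, 16):.0f} GB")
print(f"4-bit weights: ~{weight_memory_gb(TOTAL_B, 4):.0f} GB")
```

Under these assumptions roughly 11.5% of the parameters are active per token, and the weights alone take about 436 GB in BF16 versus about 109 GB at 4 bits per weight.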

Cursor May 21, 2026

New Models

Composer 2.5

Cursor launches Composer 2.5 with Opus-class coding at much lower cost

Cursor launched Composer 2.5, a coding model built by continued training on top of Kimi K2.5 (with permission) that delivers Opus-class coding performance at much lower cost. The crew noted Cursor is 'absolutely back' with strong pre-training and post-training teams, and that training now runs partly on the Colossus supercomputer.

Cursor blog ↗Cursor on X ↗

🎙️ Hear our coverage →

#coding #agents

Google DeepMind May 21, 2026

New Models

Gemini 3.5 Flash

Gemini 3.5 Flash launches at I/O as Google's agentic workhorse model

Google launched Gemini 3.5 Flash at I/O 2026 as a fast, determined workhorse model built for agentic loops, rather than a budget-tier model like prior Flash generations. It is rolling out across the Gemini app, Search AI Mode, the Gemini API, Google AI Studio, Antigravity and the Gemini Enterprise Agent Platform. Nisten noted unusual determinism in its behavior, and Logan Kilpatrick framed it as designed for the agentic era.

900M Gemini app users

Logan Kilpatrick announcement ↗Noam Shazeer ↗Jeff Dean ↗Koray Kavukcuoglu on rollout ↗

🎙️ Hear our coverage →

#agents #reasoning #frontier-models

Google DeepMind May 21, 2026

New Models

Gemini Omni

Gemini Omni: 'create anything from anything' conversational video editor

Google DeepMind launched Gemini Omni, a multimodal 'create anything from anything' model debuting as Google's first conversational video editor. Unlike pure text-to-video systems, Omni is an iterative multi-turn editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media, in the same way Nano Banana brought Gemini to interactive image editing. It is available in the Gemini app, Google Flow and YouTube, with API support coming soon.

DeepMind model page ↗Google DeepMind on X ↗Logan on availability ↗Gemini App ↗

🎙️ Hear our coverage (+1 follow-up) →

#video-gen #multimodal #image-gen

Fastino Labs May 14, 2026

New Models · Open weights

GLiGuard

Fastino Labs GLiGuard: 300M open guardrail model matches SOTA safety models

Fastino Labs released GLiGuard, a 300M-parameter open source guardrail model that matches state-of-the-art safety models 23-90x its size while delivering 16x higher throughput. It ships under Apache 2.0, making small, fast, deployable guardrails available to everyone.

300M parameters

X announcement ↗GitHub ↗

🎙️ Hear our coverage →

#open-source #safety

Krea AI May 14, 2026

New Models

Krea 2

Krea 2: Krea's first from-scratch foundation image model

Krea released Krea 2, its first foundation image model trained from scratch, built over six to seven months by nearly half the company. It focuses on aesthetic diversity, style control with up to 4 reference images, and moodboard-driven workflows, generating images in roughly 15 seconds. Co-founder and CEO Victor Perez joined the show to walk through it.

X announcement ↗Blog ↗

🎙️ Hear our coverage →

#image-gen #architecture

Meta AI May 14, 2026

New Models · Open weights

Sapiens2

Meta Sapiens2: family of 6 human-centric vision models (0.1B-5B)

Meta released Sapiens2, a family of six ViT models ranging from 0.1B to 5B parameters trained on 1 billion human images. The models set SOTA on human-centric vision tasks including pose estimation, segmentation, surface normals, and pointmaps, with weights on Hugging Face.

X announcement ↗Hugging Face collection ↗

🎙️ Hear our coverage →

#vision #open-source

Perceptron AI May 14, 2026

New Models

Perceptron Mk1

Perceptron Mk1: frontier video + embodied reasoning at 1/10th the price

Perceptron released Mk1, a frontier video and embodied reasoning model priced at roughly a tenth of comparable models. It scores 88.5 on VSI-Bench and 72.4 on RefSpatialBench (versus 9.0 for GPT-5m on the latter) and is live on OpenRouter.

X announcement ↗Site ↗

🎙️ Hear our coverage →

#video-gen #robotics #vision

Thinking Machines Lab May 14, 2026

New Models

Interaction Models

Thinking Machines Lab drops Interaction Models: real-time multimodal 276B MoE

Mira Murati's Thinking Machines Lab released Interaction Models, a 276B-parameter MoE (12B active) trained from scratch for native real-time multimodal collaboration. It supports full-duplex audio/video/text with 0.40s turn-taking latency and scores 77.8 on FD-bench v1.5. The demo can react live to events like another person entering the camera frame.

276B MoE parameters · 12B active parameters

X announcement ↗Blog ↗

🎙️ Hear our coverage →

#multimodal #voice-ai

🚀 Products & Apps 6

Google May 28, 2026

Products & Apps

Universal Cart / AP2 / UCP

Google launches Universal Cart, AP2 and UCP for agentic commerce

Google launched Universal Cart along with the AP2 and UCP protocols, infrastructure that lets AI agents shop and pay on a user's behalf. It is Google's play to standardize agent-driven commerce across merchants and payment flows.

Google Universal Cart / AP2 / UCP ↗

🎙️ Hear our coverage →

#agents #industry

Runway May 28, 2026

Products & Apps

Project Luxo

Runway launches Project Luxo for solo-creator short films

Runway launched Project Luxo, claiming AI-generated video has crossed the uncanny valley for solo-creator short films. The pitch is that a single creator can now produce watchable short-form films end to end with Runway's stack.

Runway Project Luxo — blog ↗Runway announcement ↗

🎙️ Hear our coverage →

#video-gen #image-gen

Google May 21, 2026

Products & Apps

Antigravity 2.0

Antigravity 2.0 becomes Google's central agentic coding harness

Antigravity 2.0 was positioned at I/O 2026 as the single agent harness powering agentic experiences across Google, from internal tooling to Search, Workspace and developer products. Born from the Windsurf acquisition, it evolved from an agent-first IDE into the through line for Google's agentic strategy, now exposed to external developers as well.

Sundar Pichai announcement ↗Google OS demo ↗

🎙️ Hear our coverage →

#coding #agents

Google May 21, 2026

Products & Apps

Gemini Spark

Gemini Spark announced as a 24/7 proactive personal AI agent

Google announced Gemini Spark, a 24/7 personal AI agent that can proactively work across Google surfaces, framed on the show as Google's OpenClaw competitor. Access was not yet broadly available at announcement time, so the crew discussed it from the announcement rather than hands-on testing.

News from Google ↗

🎙️ Hear our coverage →

#agents #consumer-ai

CoreWeave May 14, 2026

Products & Apps

CoreWeave Sandboxes

CoreWeave Sandboxes launch in preview via the W&B SDK

CoreWeave Sandboxes is now an official Harbor provider, letting teams run agentic workloads like Terminal-Bench safely at scale on CoreWeave infrastructure. It plugs CoreWeave's isolated execution environments directly into the Harbor eval/agent ecosystem.

Docs ↗CoreWeave blog ↗CoreWeave Sandboxes ↗

🎙️ Hear our coverage (+1 follow-up) →

#agents #infrastructure #benchmarks

OpenAI May 14, 2026

Products & Apps

Daybreak

OpenAI launches Daybreak, a frontier AI cybersecurity platform

OpenAI announced Daybreak, a frontier AI cybersecurity platform that pairs GPT-5.5 with Codex for security workloads. It launches with partners including Cloudflare, positioning OpenAI directly in the AI-powered defense market.

X announcement ↗

🎙️ Hear our coverage →

#safety #agents

✨ Major Features & Updates 8

Anthropic May 28, 2026

Major Features & Updates

Dynamic Workflows in Claude Code

Dynamic Workflows and Ultra Code land in Claude Code

Alongside Opus 4.8, Anthropic shipped Dynamic Workflows and an Ultra Code mode in Claude Code, which Yam fired up live on the show. The headline proof point: Bun was ported from Zig to Rust — about 750K lines — via Dynamic Workflows, with 99.8% of the test suite passing and the port merged in 11 days.

750K lines Bun: Zig → Rust

Dynamic Workflows in Claude Code ↗

🎙️ Hear our coverage →

#coding #agents

Google May 28, 2026

Major Features & Updates

AI Studio native Android apps

Google AI Studio builds free native Android apps; 250K in week one

Google AI Studio now lets anyone build native Android apps for free, with 250,000 apps created in the first week. The crew framed it as another step toward personalized, disposable software that anyone can vibe-code on demand.

Google AI Studio ↗Logan Kilpatrick announcement ↗

🎙️ Hear our coverage →

#coding #consumer-ai

Anthropic May 21, 2026

Major Features & Updates

Claude off-peak usage boost

Anthropic doubles Claude usage limits outside peak hours for a limited time

Anthropic doubled Claude usage outside peak hours for a limited period, covering Claude Code and other Claude surfaces. The move gives heavy users substantially more agentic and coding throughput during off-peak windows.

Claude on X ↗

🎙️ Hear our coverage →

Google May 21, 2026

Major Features & Updates

Google Search agentic capabilities

Google Search adds Gemini 3.5 Flash-powered agentic capabilities

Google Search is getting new Gemini 3.5 Flash-powered agentic capabilities, including a new AI-powered Search box and background information agents. The crew framed the rollout as a massive intelligence uplift across one of Google's largest surfaces, with billions of Search users getting frontier-model capabilities.

3.5B Google Search users

Sundar Pichai on Search agents ↗Alex's I/O thread ↗

🎙️ Hear our coverage →

#agents #search

OpenAI May 21, 2026

Major Features & Updates

Codex Mobile

OpenAI Codex Mobile arrives in the ChatGPT mobile apps

OpenAI's Codex Mobile is now available in the ChatGPT mobile apps, enabling remote agent workflows from a phone. The crew discussed it as part of the broader shift toward driving coding agents from anywhere rather than just the desktop.

OpenAI on X ↗

🎙️ Hear our coverage →

#coding #agents #consumer-ai

Anthropic May 14, 2026

Major Features & Updates

Claude Agent SDK monthly credits

Anthropic adds separate Claude Agent SDK credits to paid plans

Anthropic announced separate monthly Claude Agent SDK credits for Pro, Max, Team, and Enterprise subscribers, starting June 15, 2026. This gives agent builders a dedicated usage pool on top of regular plan limits.

🎙️ Hear our coverage →

#agents #coding

Meta AI May 14, 2026

Major Features & Updates

Muse Spark voice conversations

Meta launches Muse Spark voice conversations across its apps and glasses

Meta rolled out Muse Spark-powered voice conversations across the Meta AI app, WhatsApp, Instagram, Facebook, and Ray-Ban Meta glasses. The feature includes real-time image generation, live camera AI, and instant Reels/maps integration. Alex tested it live and called it surprisingly good; it is the first big consumer ship from Meta Superintelligence Labs.

X announcement ↗Announcement ↗

🎙️ Hear our coverage →

#voice-ai #consumer-ai #multimodal

OpenAI (Codex), Anthropic, Nous Research May 14, 2026

Major Features & Updates

/goal command

/goal command lands in Codex, Claude Code, and Hermes - the productized Ralph loop

The /goal command is now available in Codex, Claude Code, and Hermes, productizing the Ralph loop pattern: set a measurable success condition and the agent iterates autonomously until it is done. Codex's implementation is winning early head-to-head comparisons over Claude Code, and the show framed it as turning coding agents into 24/7 AI employees.

X thread ↗Codex docs: follow goals ↗

🎙️ Hear our coverage →

#agents #coding
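
The pattern behind the /goal entry above (set a measurable success condition, then let the agent iterate until it holds) can be sketched in a few lines of Python. This is a minimal illustration of the loop, not the implementation used by Codex, Claude Code, or Hermes. The names `run_goal_loop`, `agent_step` and `goal_met`, the iteration cap, and the toy "fix one failing test" agent are all hypothetical.

```python
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def run_goal_loop(
    state: State,
    agent_step: Callable[[State], State],
    goal_met: Callable[[State], bool],
    max_iterations: int = 50,
) -> Tuple[State, int, bool]:
    """Call agent_step until goal_met(state) is true or the cap is reached.

    Returns (final_state, iterations_used, succeeded).
    """
    for iteration in range(max_iterations):
        if goal_met(state):
            return state, iteration, True
        state = agent_step(state)
    # Cap reached: report whether the last step happened to satisfy the goal.
    return state, max_iterations, goal_met(state)


# Toy stand-in for a coding agent: each step "fixes" one failing test.
def fix_one_test(failing: int) -> int:
    return max(0, failing - 1)


final_state, steps, ok = run_goal_loop(
    state=7,
    agent_step=fix_one_test,
    goal_met=lambda failing: failing == 0,
)
print(final_state, steps, ok)  # prints: 0 7 True
```

The iteration cap is a design choice of this sketch: without it, a success condition that can never be satisfied would keep the loop running indefinitely.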

🔌 APIs & Platforms 1

Google DeepMind May 21, 2026

APIs & Platforms

Managed Agents (Gemini API)

Gemini API gets Managed Agents with hosted sandboxes and the Interactions API

Google launched Managed Agents in the Gemini API, letting developers spin up hosted Antigravity agents with Linux sandboxes and persistent state. It ships alongside the next-generation Interactions API, which Logan Kilpatrick described as designed for agentic systems rather than the old tokens-in, tokens-out model interaction pattern.

Gemini API agents docs ↗Google AI Developers on X ↗

🎙️ Hear our coverage →

#agents #api #coding

🛠️ Dev Tools 4

Cua May 28, 2026

Dev Tools · Open weights

Cua Driver for Windows

Cua Driver brings background computer-use agents to Windows

Cua launched Windows support for Cua Driver, enabling background computer-use agents that operate real desktop apps without taking over the user's screen. It extends Cua's open-source computer-use stack to the largest desktop OS.

Cua Driver Windows — blog ↗Cua GitHub ↗Cua announcement ↗

🎙️ Hear our coverage →

#agents #consumer-ai

Weights & Biases May 28, 2026

Dev Tools

W&B MCP Server

Weights & Biases launches MCP server with 20 tools for agents

W&B officially launched its MCP server with 20 schema-first tools so coding agents can read experiments, monitor training, and run autonomous research loops. Agents can query metadata before pulling full 300-metric runs, keeping their context windows from blowing up.

W&B MCP Server ↗W&B MCP Server — blog ↗W&B announcement ↗

🎙️ Hear our coverage →

#agents #coding #infrastructure
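
The metadata-first access pattern described in the W&B entry above can be illustrated with a small in-memory stand-in. This sketch does not use the real W&B MCP server or its tool names. `RUNS`, `list_run_metadata` and `fetch_metrics` are hypothetical and only show why querying small metadata first keeps an agent from loading every metric of every run.

```python
from typing import Dict, List

LEARNING_RATES = [1e-2, 1e-3, 1e-4]

# Hypothetical experiment store: each run has a small metadata record
# plus a large dictionary of 300 logged metrics.
RUNS: Dict[str, dict] = {
    f"run-{i}": {
        "metadata": {"name": f"run-{i}", "state": "finished", "lr": LEARNING_RATES[i]},
        "metrics": {f"metric_{j}": float(i * j) for j in range(300)},
    }
    for i in range(3)
}


def list_run_metadata() -> List[dict]:
    """Cheap call: return only the small metadata record for every run."""
    return [run["metadata"] for run in RUNS.values()]


def fetch_metrics(run_name: str, keys: List[str]) -> Dict[str, float]:
    """Targeted call: return only the requested metrics for one run."""
    metrics = RUNS[run_name]["metrics"]
    return {key: metrics[key] for key in keys}


# Step 1: inspect metadata and pick a run without touching any metrics.
candidates = [m for m in list_run_metadata() if m["lr"] < 5e-3]
chosen = candidates[0]["name"]

# Step 2: pull just the metrics the task needs instead of all 300.
selected = fetch_metrics(chosen, ["metric_1", "metric_2"])
print(chosen, selected)
```

Here the agent reads three small metadata records and two metric values, rather than 900 metric values across the three runs.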

xAI May 21, 2026

Dev Tools

Grok Build

xAI launches Grok Build, an agentic CLI coding tool in beta

xAI launched Grok Build, an agentic CLI coding tool, in beta for SuperGrok Heavy subscribers. It joins the crowded field of terminal-based coding agents as xAI's entry into agentic engineering tooling.

xAI CLI page ↗xAI on X ↗

🎙️ Hear our coverage →

#coding #agents

Nous Research May 14, 2026

Dev Tools

Hermes CLI agent

Hermes passes OpenClaw as #1 CLI agent on OpenRouter, adds computer use

Nous Research's Hermes overtook OpenClaw as the #1 CLI agent on OpenRouter. It also added background computer use via Cua (trycua), and Alex described switching his own daily agent workflow from OpenClaw to Hermes.

X announcement ↗

🎙️ Hear our coverage →

#agents #coding

📄 Papers & Research 3

Nous Research May 21, 2026

Papers & Research · Open weights

Lighthouse Attention

Nous Research publishes Lighthouse Attention for fast long-context pretraining

Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining that delivers major speedups. The release includes a blog post, an arXiv paper and an open-source GitHub implementation.

Blog ↗Nous Research on X ↗arXiv ↗GitHub ↗

🎙️ Hear our coverage →

#research #architecture #open-source

OpenAI May 21, 2026

Papers & Research

Erdős planar unit distance result

OpenAI model makes progress on 80-year-old Erdős planar unit distance problem

OpenAI announced that a general-purpose reasoning model made progress on the Erdős planar unit distance problem, challenging an 80-year-old mathematical belief. The panel called it the most important news of the week outside Google I/O, as a sign that frontier reasoning models are starting to contribute to genuinely open mathematics.

80-year-old Erdős math problem

OpenAI blog post ↗OpenAI on X ↗

🎙️ Hear our coverage →

#reasoning #research

Nous Research May 14, 2026

Papers & Research · Open weights

TST (Token Superposition Training)

Nous Research TST: 2-3x training speedup without architecture changes

Nous Research released Token Superposition Training (TST), a training technique that achieves 2-3x wall-clock speedup at matched FLOPs. It requires no architecture changes, making it a drop-in efficiency win for LLM training runs.

X announcement ↗

🎙️ Hear our coverage →

#research #training

📊 Benchmarks & Evals 2

Datacurve May 28, 2026

Benchmarks & Evals · Open weights

DeepSWE

Datacurve's DeepSWE: a contamination-free coding benchmark

DeepSWE is a coding leaderboard built from 113 original tasks written from scratch and shipped as shallow clones with no git history to cheat from. GPT-5.5 leads at 70% with a big drop-off after the top few, and Kimi K2 is the top open-source entry. Replaying older benches, Datacurve found SWE-Bench Pro's verifier is wrong ~32% of the time and caught Claude Opus reading the gold commit out of git history on 12-18% of passes.

70% DeepSWE leader (GPT-5.5)

DeepSWE benchmark ↗DeepSWE blog ↗DeepSWE GitHub ↗

🎙️ Hear our coverage →

#benchmarks #coding

Artificial Analysis May 14, 2026

Benchmarks & Evals

Coding Agent Index

Artificial Analysis Coding Agent Index benchmarks model + harness combos

Artificial Analysis launched the Coding Agent Index, a benchmark that evaluates model and harness combinations rather than models alone. Opus 4.7 in Cursor CLI leads at 61, GLM-5.1 tops the open-weight entries at 53, and costs vary 30x across combos for similar capability.

X announcement ↗

🎙️ Hear our coverage →

#benchmarks #coding #agents

🌀 Also Released 1

Anthropic May 21, 2026

Also Released

Colossus compute deal

SpaceX IPO filing reveals Anthropic pays $1.25B/month for Colossus compute

The SpaceX IPO filing revealed Anthropic is paying $1.25 billion per month for AI compute at the Memphis Colossus facility. The crew called it a bombastic deal that lets Anthropic serve far more inference at scale and feel less compute-constrained.

$1.25B monthly AI compute spend

Axios ↗Sawyer Merritt ↗

🎙️ Hear our coverage →

#infrastructure
