New Models
Claude Opus 4.8
Anthropic ships Claude Opus 4.8 live mid-show
Anthropic released Claude Opus 4.8 during the episode. It hits 69.2% on SWE-bench Pro (up from 64.3% for Opus 4.7 and ahead of GPT-5.5 at 58.6%), sets a new best of 57.9% on Humanity's Last Exam with tools, and scores 83.4% on OSWorld-Verified. It also shows a genuine long-context jump past the usual 200K-token cliff (85.9% on GraphWalks BFS at 256K) and adds new thinking modes in the UI. Anthropic also teased bringing Mythos-class models to all customers in the coming weeks.
69.2% SWE-bench Pro
New Models
Ink-2
Cartesia Ink-2 tops Artificial Analysis's new STT leaderboard
Cartesia released Ink-2, which debuted on Artificial Analysis's new STT leaderboard as the most accurate streaming speech-to-text model, with the fastest turnaround. It landed just after recording, as part of a double post-show voice-AI drop alongside ElevenLabs Dubbing v2.
New Models
Dubbing v2
ElevenLabs Dubbing v2 preserves your performance across 90+ languages
ElevenLabs launched Dubbing v2, an audio-to-audio dubbing model that translates voices across more than 90 languages while preserving cadence, expression, intonation, and even stutters. Alex's live demos, including dubbing Nisten into Hebrew and his own voice into multiple languages, were the brain-melting moment of the episode.
New Models
MAI-Image-2.5
Microsoft MAI-Image-2.5 jumps to #3 on Arena text-to-image
MAI-Image-2.5 jumped to number two on Arena's image-to-image leaderboard shortly after launch, with notable strength in image cleanup, backgrounds, documents, and diagrams. Hands-on tests on the show were mixed. The model is publicly accessible through playground.microsoft.ai.
New Models · Open weights
MiniCPM5-1B
OpenBMB MiniCPM5-1B: new SOTA 1B open-weights model
OpenBMB released MiniCPM5-1B, a state-of-the-art 1B-parameter open-weights model built for efficient local and on-device use; it runs on a phone. It scores 17.9 on the Artificial Analysis Intelligence Index, 7.4 points ahead of its size class, while using roughly 31x fewer output tokens than Qwen3.5 2B.
17.9 AAII (1B model)
New Models · Open weights
MOSS-TTS-v1.5
MOSS-TTS-v1.5: open-source 8B TTS with 31 languages
OpenMOSS shipped MOSS-TTS-v1.5, an 8B open-source text-to-speech model supporting 31 languages with pause control, released under Apache 2.0. It is one of the larger fully open TTS models available.
New Models · Open weights
Bonsai Image 4B
PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB
PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.
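The sub-1GB figure follows from simple bits-per-weight arithmetic. The sketch below assumes a nominal 4 billion parameters, all stored at the quantized precision; real checkpoints usually keep some layers (embeddings, norms) at higher precision and add per-group scale factors, so actual files run somewhat larger. The helper name and the five-trits-per-byte packing are illustrative assumptions, not PrismML's documented format.

```python
import math

# Hypothetical helper: raw weight storage in decimal gigabytes (1 GB = 1e9 bytes).
# Ignores scale factors, higher-precision layers, and activations.
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    return num_params * bits_per_weight / 8 / 1e9

PARAMS = 4e9  # nominal "4B"; the exact parameter count is an assumption

# 1-bit: each weight takes one of two values, so one bit per weight.
one_bit = weight_gb(PARAMS, 1.0)

# Ternary: values in {-1, 0, +1}. log2(3) ~ 1.585 bits is the theoretical floor;
# packing five trits per byte (3**5 = 243 <= 256) costs 8/5 = 1.6 bits per weight.
ternary_floor = weight_gb(PARAMS, math.log2(3))
ternary_packed = weight_gb(PARAMS, 8 / 5)

# 16-bit baseline for comparison.
bf16 = weight_gb(PARAMS, 16)

for name, gb in [("1-bit", one_bit),
                 ("ternary (log2 3)", ternary_floor),
                 ("ternary (5 trits/byte)", ternary_packed),
                 ("BF16", bf16)]:
    print(f"{name:>24}: {gb:.2f} GB")
```

Under these assumptions it prints 0.50 GB for 1-bit, 0.79 to 0.80 GB for ternary, and 8.00 GB for BF16: a roughly tenfold reduction that is what makes in-browser WebGPU inference plausible.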
New Models
P-Image-Upscale
Pruna AI's P-Image-Upscale hits 128 megapixel outputs
Pruna AI released P-Image-Upscale, an image upscaling model that reaches 128 megapixel outputs with fast generation and predictable pricing. It is available through Pruna's API and on Replicate.
New Models · Open weights
Hy-MT2
Tencent open-sources Hy-MT2 translation models under Apache 2.0
Tencent released the Hy-MT2 family of translation models under Apache 2.0, including a tiny 1.8B model that beats paid translation APIs like Microsoft Translator, plus a larger 30B-A3B MoE variant (30B total, 3B active parameters). A small, free, locally runnable model outperforming commercial translation services was one of the open-source wins of the week.
New Models
Qwen 3.7-Max
Alibaba releases Qwen 3.7-Max agentic frontier model with robotics demos
Alibaba released Qwen 3.7-Max, an agentic frontier model built for long autonomous runs and shown off with robotics demos. It continues the Qwen Max line as Alibaba's closed frontier offering aimed at agentic workloads.
New Models · Open weights
Command A+
Cohere releases Command A+, a 218B Apache 2.0 MoE with 25B active params
Cohere released Command A+, a 218B-parameter mixture-of-experts model with 25B active parameters, shipping open weights under Apache 2.0. It was the week's headline open-source release, available on Hugging Face in both W4A4 quantized and BF16 variants.
218B Command A+ parameters
25B active parameters
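The total-versus-active split matters because every expert must sit in memory even though only a fraction runs per token. The back-of-envelope sketch below uses the 218B and 25B figures from the text. The helper name is hypothetical; it counts raw weight bytes only (no KV cache, activations, or quantization scales), and the ~2 FLOPs per active parameter per token figure is a common rule of thumb for a forward pass, not a number published by Cohere.

```python
# Back-of-envelope sizing for Command A+ using the figures quoted above.
TOTAL_PARAMS = 218e9   # all experts; must be resident in memory
ACTIVE_PARAMS = 25e9   # parameters actually used for each token

def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Hypothetical helper: raw weight bytes in decimal GB (no scales, KV cache, or activations)."""
    return num_params * bits_per_weight / 8 / 1e9

bf16_gb = weight_gb(TOTAL_PARAMS, 16)  # BF16: 2 bytes per weight
w4_gb = weight_gb(TOTAL_PARAMS, 4)     # "W4" in W4A4 = 4-bit weights ("A4" = 4-bit activations)

# Sparsity: share of the model that runs for each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule of thumb: a forward pass costs about 2 FLOPs per active parameter per token.
gflops_per_token = 2 * ACTIVE_PARAMS / 1e9

print(f"BF16 weights:     {bf16_gb:.0f} GB")
print(f"W4 weights:       {w4_gb:.0f} GB")
print(f"Active per token: {active_fraction:.1%}")
print(f"Forward compute:  ~{gflops_per_token:.0f} GFLOPs/token")
```

That works out to about 436 GB of BF16 weights versus about 109 GB at 4 bits, with roughly 11.5% of the parameters (around 50 GFLOPs) touched per token. Memory footprint scales with the 218B total, while per-token compute scales with the 25B active.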
New Models
Composer 2.5
Cursor launches Composer 2.5 with Opus-class coding at much lower cost
Cursor launched Composer 2.5, a coding model built by continued training on top of Kimi K2.5 (with permission) that delivers Opus-class coding performance at much lower cost. The crew noted Cursor is 'absolutely back' with strong pre-training and post-training teams, and that training now runs partly on the Colossus supercomputer.
New Models
Gemini 3.5 Flash
Gemini 3.5 Flash launches at I/O as Google's agentic workhorse model
Google launched Gemini 3.5 Flash at I/O 2026 as a fast, determined workhorse model built for agentic loops rather than a budget-tier Flash like prior generations. It is rolling out across the Gemini app, Search AI Mode, the Gemini API, Google AI Studio, Antigravity and the Gemini Enterprise Agent Platform. Nisten noted unusual determinism in its behavior, and Logan Kilpatrick framed it as designed for the agentic era.
900M Gemini app users
New Models
Gemini Omni
Gemini Omni: 'create anything from anything' conversational video editor
Google DeepMind launched Gemini Omni, a multimodal 'create anything from anything' model debuting as Google's first conversational video editor. Unlike pure text-to-video systems, Omni is an iterative multi-turn editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media, in the same way Nano Banana brought Gemini to interactive image editing. It is available in the Gemini app, Google Flow and YouTube, with API support coming soon.
New Models · Open weights
GLiGuard
Fastino Labs GLiGuard: 300M open guardrail model matches SOTA safety models
Fastino Labs released GLiGuard, a 300M-parameter open-source guardrail model that matches state-of-the-art safety models 23-90x its size (roughly 7B to 27B parameters) while delivering 16x higher throughput. It ships under Apache 2.0, making small, fast, deployable guardrails available to everyone.
300M parameters
New Models
Krea 2
Krea 2: Krea's first from-scratch foundation image model
Krea released Krea 2, its first foundation image model trained from scratch, built over six to seven months by nearly half the company. It focuses on aesthetic diversity, style control with up to 4 reference images, and moodboard-driven workflows, generating images in roughly 15 seconds. Co-founder and CEO Victor Perez joined the show to walk through it.
New Models · Open weights
Sapiens2
Meta Sapiens2: family of 6 human-centric vision models (0.1B-5B)
Meta released Sapiens2, a family of six ViT models ranging from 0.1B to 5B parameters trained on 1 billion human images. The models set SOTA on human-centric vision tasks including pose estimation, segmentation, surface normals, and pointmaps, with weights on Hugging Face.
New Models
Perceptron Mk1
Perceptron Mk1: frontier video + embodied reasoning at 1/10th the price
Perceptron released Mk1, a frontier video and embodied reasoning model priced at roughly a tenth of comparable models. It scores 88.5 on VSI-Bench and 72.4 on RefSpatialBench (versus 9.0 for GPT-5m on the latter) and is live on OpenRouter.
New Models
Interaction Models
Thinking Machines Lab drops Interaction Models: real-time multimodal 276B MoE
Mira Murati's Thinking Machines Lab released Interaction Models, a 276B-parameter MoE (12B active) trained from scratch for native real-time multimodal collaboration. It supports full-duplex audio/video/text with 0.40s turn-taking latency and scores 77.8 on FD-bench v1.5. The demo can react live to events like another person entering the camera frame.
276B MoE parameters
12B active parameters