Cartesia Sonic 3: real-time TTS with emotion and laughter, plus $100M raise
Cartesia launched Sonic 3, a real-time text-to-speech model that adds expressive emotion and natural laughter, announced alongside a $100M funding round. Co-founder Arjun Desai joined the show to break down the voice stack and why state-space-model approaches enable this latency and expressiveness.
$100M funding round announced alongside the launch
Cognition SWE-1.5: 950 tok/s coding model hitting 40% on SWE-bench Pro
Cognition released SWE-1.5, a fast agentic coding model that serves around 950 tokens per second and scores about 40% on SWE-bench Pro. It ships inside Windsurf and reinforces the week's theme of speed-focused coding models from agent labs.
IBM Granite 4.0 Nano: ultra-efficient tiny models for edge deployment
IBM released Granite 4.0 Nano, a set of ultra-efficient tiny open models aimed at edge deployment. The release continues the trend of capable sub-billion-to-few-billion parameter models that can run locally on constrained hardware.
Ming-flash-omni Preview: sparse MoE omni-modal open model
Ant Group's InclusionAI team released Ming-flash-omni Preview, a sparse mixture-of-experts omni-modal model on Hugging Face. It handles multiple input and output modalities in a single open-weights model, adding to the wave of Chinese open omni-modal releases.
Hailuo 2.3: MiniMax's cinema-grade video generation model
MiniMax's Hailuo team released version 2.3 of its video generation model, pitching cinema-grade output quality. It landed in the same week as MiniMax M2 and Speech 2.6, underlining how broadly MiniMax is shipping across text, voice, and video.
MiniMax M2: open-source agentic model at 8% of Claude's price, 2x speed
MiniMax released M2, an open-source agentic model positioned at roughly 8% of Claude's price while running about twice as fast. Head of Engineering Skyler Miao joined the show for a deep dive, framing M2 as both a model story and a speed story, and the panel read it as part of a broader open-model pressure wave on frontier labs.
8% of Claude's price2x speed vs comparable frontier models
MiniMax Speech 2.6: ultra-human voice AI with sub-250ms latency
MiniMax released Speech 2.6, a voice model targeting ultra-human quality with end-to-end latency under 250ms, available through the MiniMax platform API. It slots into the episode's voice arms race alongside Cartesia's Sonic 3.
Kimi Linear: 48B open model with linear attention and 1M context
Moonshot AI released Kimi Linear, a 48B parameter (A3B active) instruct model that uses linear attention to reach a 1M token context window. It is an open-weights bet on efficient long-context architectures from the Kimi team.
OpenAI ships GPT-OSS-Safeguard, first open-weight safety reasoning models
OpenAI released GPT-OSS-Safeguard, its first open-weight safety reasoning models, built on the GPT-OSS family. The models let developers apply custom safety policies via reasoning rather than fixed classifiers, extending OpenAI's open-weights push into the trust-and-safety layer.
Qwen3-VL adds compact 2B and 32B multimodal models
Alibaba's Qwen team extended the Qwen3-VL family with newly updated 2B and 32B checkpoints. The 2B is a generic VLM (OCR-capable) that holds up against its 4B and 8B siblings from prior weeks, while the 32B reportedly outperforms GPT-5 mini and Claude 4 Sonnet on benchmarks.
The Allen Institute for AI updated its open OCR line with olmOCR 2 at 7B (released as an FP8 checkpoint), landing in the same week as DeepSeek-OCR, Qwen3-VL, and Liquid's LFM2-VL. Another sign that document understanding became this week's hottest open-model category.
DeepSeek-OCR turns text into compressed vision tokens for massive contexts
DeepSeek open-sourced DeepSeek-OCR, a 3B model (~570M active parameters) that is less an OCR model and more a context-compression breakthrough: it renders text as images, compresses it up to 10x while retaining 97% decoding accuracy (60% even at 20x), and reads it back with a tiny vision decoder. The approach suggests text tokenization is far from optimal and points at vastly cheaper long-context processing; alphaXiv reportedly OCR'd all of arXiv for $1000 versus $7500 with MistralOCR, and a single H100 can process up to 200K pages.
97% decoding accuracy at 10x compression~570M active parameters (3B total)200K pages scannable on a single H100
Krea open-sources a 14B real-time video generation model
Krea AI open-sourced a 14-billion-parameter real-time video model, with weights on Hugging Face. It joins the week's clear trend of generative video racing toward live, interactive experiences rather than offline rendering.
LTX-2: native 4K audio+video generation engine from Lightricks
Lightricks announced LTX-2 as breaking news on the show: a video generation engine producing native 4K video (no upscaling) with synchronized audio, positioned as a fast, efficient open alternative to closed models like Sora. It is billed as open-source with weights coming this fall.
Liquid AI ships LFM2-VL-3B tiny multilingual vision-language model
Liquid AI released LFM2-VL-3B, a tiny multilingual vision-language model, part of a wave of OCR-and-VLM releases this week. It targets efficient on-device and edge vision-language workloads at the 3B scale.
PokeeResearch-7B: open-source SOTA deep research agent model
Pokee AI released PokeeResearch-7B, an open-source 7B deep research agent model claiming state-of-the-art results for its size. Weights, code, a paper, and a hosted deep-research preview all shipped together.
Qwen3-VL adds compact 3B and 8B open vision-language models
Alibaba's Qwen team released smaller Qwen3-VL vision-language models in 3B and 8B sizes, bringing the flagship VL capabilities down to edge- and laptop-friendly scales. Weights are open on Hugging Face as part of the Qwen3-VL collection.
Claude Haiku 4.5: fast, cheap model rivals Sonnet 4 accuracy
Anthropic released Claude Haiku 4.5, its smallest and fastest current-generation model. The show highlighted that it approaches Sonnet 4 level accuracy at a fraction of the cost and latency, making it attractive for high-volume agentic and production workloads.
Baidu's MuseStreamer pushes video generations past 20 seconds
Baidu showed off MuseStreamer, a video generation model producing clips longer than 20 seconds. It adds another Chinese lab to the long-form video generation race alongside Veo and Sora.
Cognition SWE-grep: RL-trained fast context retrieval for coding agents
Cognition released SWE-grep, an RL-trained multi-turn context retriever that finds relevant code for agentic coding tasks far faster than full agent loops. It powers fast context retrieval in Cognition's products, and a public playground lets developers try it on real repos.
Google's C2S-Scale 27B validates a cancer hypothesis in living cells
Google released C2S-Scale 27B, a Gemma-based single-cell biology model that generated a novel cancer therapy hypothesis later validated in living cells. The show called this a bombshell example of AI contributing to real scientific discovery rather than just benchmarks.
Veo 3.1: Google's next-gen video model launches with cinematic audio
Google DeepMind shipped Veo 3.1, the next version of its video generation model with improved quality and cinematic audio. Senior PM Jessica Gallegos joined the show to discuss how the model and its product packaging (including Flow) are evolving video generation into a real user experience story.
KAIST releases KORMo, a bilingual Korean/English 10B open model
KAIST published KORMo, a 10B parameter fully open bilingual model for Korean and English, with weights on Hugging Face and an accompanying paper. It continues the trend of strong national-language open models coming out of Korean labs.
OpenPipe Qwen3 14B Instruct lands on W&B Inference
OpenPipe, now part of Weights & Biases / CoreWeave, released a Qwen3 14B instruct model available through W&B Inference. Co-founder Kyle Corbitt joined the show to talk RL, Serverless RL, and practical agent evaluation and deployment.
Sourceful's Riverflow 1 image-editing model took the top spot on the image-editing leaderboard. It is a notable result from a smaller lab in a category dominated by big-name image models.
World Labs RTFM renders 3D worlds in real time on a single H100
World Labs released RTFM (Real-Time Frame Model), a generative world model that renders explorable, persistent 3D worlds at interactive frame rates on a single H100 GPU. A live demo lets anyone walk through generated worlds in the browser.
1X opens orders for NEO, a $20k consumer home humanoid shipping in 2026
1X Technologies opened orders for NEO, billed as the first consumer humanoid robot for the home, priced at $20,000 with deliveries starting in 2026. The panel treated it as a signal that home humanoid timelines are no longer purely sci-fi, anchoring the episode's 2026 robot revolution theme.
Cursor 2.0 ships with Composer, its own 4x-faster coding model
Cursor released version 2.0 of its AI code editor alongside Composer, a new in-house coding model claimed to be about 4x faster. The launch came up as evidence that developer products are being rebuilt agent-first, with speed and orchestration as the new battleground.
Google Labs launches Pomelli, an AI marketing agent
Google Labs released Pomelli, an experimental AI marketing agent that generates on-brand campaigns and marketing assets for businesses. It was covered in the tools section as another sign of agents moving into specific professional workflows.
Odyssey V2: real-time interactive AI video you can steer as it generates
Odyssey ML launched V2 of its real-time interactive AI video experience, where the video stream is generated live and responds to user input. The panel grouped it with the week's evidence that video is becoming an interactive product surface rather than a render-and-wait demo.
Perplexity launches privacy-first Email Assistant for inbox management
Perplexity launched an Email Assistant that manages your inbox with a privacy-first pitch, drafting replies and triaging mail. It extends Perplexity's push from search into day-to-day agentic productivity surfaces.
Claude Code comes to the web with sandboxed cloud coding
Anthropic brought Claude Code to the web, letting developers delegate software tasks through a browser with GitHub integration, secure sandboxed execution, multi-repo support, and automatic pull requests, making it usable even from a phone. The Claude desktop app was also upgraded with screen context via screenshots, file sharing, and a new voice mode.
Browserbase launches Director 2.0 with 1Password delegated auth
Browserbase launched Director 2.0, a prompt-powered web automation platform that performs a task from natural language and hands back a repeatable, deployable script. Its standout innovation is delegated, per-site authentication via a 1Password integration: cloud agents request login approval on your local machine site-by-site instead of getting master-key access to all sessions, a much safer model than Atlas-style all-or-nothing access.
OpenAI launches ChatGPT Atlas, its agentic AI browser
OpenAI shipped Atlas, a Chromium-based browser deeply integrated with ChatGPT: natural-language history search, a 'Cursor' inline text-rewrite tool, browsing-pattern memories, and an Ask ChatGPT sidepane. Its agent mode runs with your logged-in sessions and cookies, enabling long multi-step tasks (Alex had it complete a 5-hour compliance training) but raising prompt-injection security concerns that OpenAI's CISO addressed publicly. macOS only at launch, for Pro, Plus, and Go tiers.
Apple announces M5 chip with double the AI performance
Apple unveiled the M5 chip, claiming roughly double the AI performance of the previous generation for Apple Silicon. For local-model enthusiasts on the show, it means more on-device headroom for running and fine-tuning models on Macs.
NVIDIA DGX Spark: a desktop personal supercomputer for local AI
NVIDIA started shipping DGX Spark, a desktop personal AI supercomputer aimed at prototyping and local inference. The show pointed to the LMSYS deep dive on its real-world performance, and Alex shared his own first impressions of the device.
Sora drops invite requirement and adds Character Cameos
OpenAI removed the invite requirement for the Sora app and shipped Character Cameos, letting users create reusable characters that can appear across generated videos. The update widens access to Sora as OpenAI pushes it as a consumer video product.
Google AI Studio launches 'Vibe Coding' build experience
Google's Gemini AI Studio launched a 'Vibe Coding' experience at ai.studio/build, letting users build apps from natural-language prompts with Gemini. It puts Google into the rapidly crowding prompt-to-app space alongside the week's other coding-agent moves.
Microsoft adds agentic powers and voice to Copilot Mode in Edge
Microsoft answered Atlas with agentic enhancements to Copilot Mode in Edge, including a voice mode that can see and discuss the current page, plus broader Copilot updates (and Clippy back as an easter egg via the Mico avatar). In Alex's hands-on testing the agentic features did not actually work, so real-world parity with Atlas and Comet is unproven.
Reve quietly surfaces an unannounced 1080p video mode with sound
Reve's unannounced video mode was spotted this week, generating 1080p video with sound. It was covered briefly in the show's vision and video roundup with no official announcement or links yet.
Amp launches a free tier powered by ads and surplus model capacity
Amp (from the Sourcegraph team) launched a free tier for its coding agent, funded by ads and surplus model capacity. CEO Quinn Slack joined the show to explain the economics and the product thinking behind ad-supported AI dev tooling.
Claude Skills: custom instructions for AI agents now live
Anthropic launched Claude Skills, folders of instructions and resources that Claude loads on demand to specialize agents for specific tasks. The panel treated it as a major piece of the emerging builder stack, with Simon Willison arguing Skills could be a bigger deal than MCP.
Microsoft makes every Windows 11 PC an AI PC with Copilot voice input
Microsoft announced that every Windows 11 machine becomes an 'AI PC,' adding 'Hey Copilot' voice input and deeper agentic Copilot integration at the OS level. The panel discussed it as a sign of AI assistants moving into the default computing experience.
OpenAI ships smarter ChatGPT memory management, no more 'memory full'
OpenAI updated ChatGPT's memory system so it automatically manages and prioritizes saved memories, eliminating the 'memory full' dead end. The change makes long-running personalized use of ChatGPT smoother without manual memory pruning.
Sora extends generations to 15s (25s Pro) and adds storyboards
OpenAI upgraded Sora with longer generations, up to 15 seconds for standard users and 25 seconds for Pro, plus a new storyboard feature for multi-shot control. The update keeps Sora competitive as video models race on length and controllability.
Decart ships real-time lip-sync API for live AI avatars
Decart AI released a real-time lip-sync API that modifies an avatar's video frames to match generated speech on the fly. Kwindla Kramer broke down the pipeline on the show: WebRTC audio capture, Whisper transcription, an LLM response, ElevenLabs voice generation, then Decart's model syncing the avatar's lips, all at sub-two-second latency, a key step toward interactive, believable AI characters.
Pokee AI launched Pokee, an agentic workflow builder for chaining AI actions into automated workflows. It was covered in the tools rundown as part of the expanding agent-first builder stack.
TorchForge: PyTorch-native library for scalable RL post-training
Meta's PyTorch team, in collaboration with Weights & Biases/CoreWeave and Stanford, introduced TorchForge, a PyTorch-native library for scalable reinforcement-learning post-training and agent development. Built for massive GPU runs (W&B/CoreWeave provided 520 H100s) and competing with Ray via tools like the Monarch scheduler.
DiT360: SOTA panoramic image generation with hybrid training
DiT360 is a diffusion-transformer approach to panoramic image generation that uses hybrid training across perspective and panoramic data to reach state-of-the-art quality. The project page and GitHub release make the work reproducible.
CoreWeave acquires Marimo, the reactive Python notebook company
CoreWeave, the parent company of Weights & Biases, acquired Marimo, makers of the open-source reactive Python notebook. Covered in the This Week's Buzz segment, the deal brings a popular developer notebook tool into CoreWeave's AI cloud stack.
OpenAI and Broadcom to deploy 10 gigawatts of custom AI accelerators
OpenAI announced a strategic collaboration with Broadcom to co-develop and deploy 10 gigawatts of custom AI accelerators. It is another massive compute commitment in OpenAI's infrastructure buildout, this time with chips designed in-house.