Tongyi's Z-Image Turbo brings sub-second open image generation
Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.
Anthropic launches Claude Opus 4.5, reclaiming the coding crown
Anthropic released Claude Opus 4.5, scoring 80.9% on SWE-bench Verified to top GPT-5.1 (77.9%) and Gemini 3 Pro (76.2%). It adds a new 'Effort' parameter for compute control, Tool Search to cut agent token overhead, and Programmatic Tool Calling where the model writes and executes code loops. Pricing dropped to $5/M input and $25/M output, roughly one-third the old Opus price.
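For a sense of scale, the pricing change can be worked through on a single request. This is a rough sketch: the 200K-input / 20K-output request is hypothetical, and the $15/$75 figure for the previous Opus price is an assumption chosen to match the "roughly one-third" claim above, not a number given in this digest.

```python
# Prices in USD per million tokens, as quoted in the item above.
OPUS_45_INPUT_PER_M = 5.00
OPUS_45_OUTPUT_PER_M = 25.00

# Assumption: the previous Opus list price was $15 / $75 per million tokens.
OLD_OPUS_INPUT_PER_M = 15.00
OLD_OPUS_OUTPUT_PER_M = 75.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Return the USD cost of one request at the given per-million rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Hypothetical agent request: 200K input tokens and 20K output tokens.
new_cost = request_cost(200_000, 20_000, OPUS_45_INPUT_PER_M, OPUS_45_OUTPUT_PER_M)
old_cost = request_cost(200_000, 20_000, OLD_OPUS_INPUT_PER_M, OLD_OPUS_OUTPUT_PER_M)

print(f"new: ${new_cost:.2f}, old: ${old_cost:.2f}, ratio: {new_cost / old_cost:.2f}")
```

Under these assumptions the same request costs $1.50 instead of $4.50.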
Black Forest Labs releases FLUX.2, a 32B multi-reference image model
Black Forest Labs released FLUX.2, a 32B-parameter image model with open weights (FLUX.2-dev) that supports multi-reference image editing. It lets users combine multiple reference images and prompt edits with variables, a step up in controllable image editing.
DeepSeek Math V2: 685B open-weights model with IMO gold-level math
DeepSeek released DeepSeek Math V2, a 685B-parameter Apache-2.0 model that reaches IMO gold-level math reasoning. It is the first open-weights model to reach this level, and it appeared quietly on Hugging Face during the week.
Microsoft ships Fara-7B, a 7B on-device computer use agent
Microsoft Research released Fara-7B, a best-in-class 7B-parameter vision-language model for computer use that runs on-device. It scores 73.5% on WebVoyager, beating OpenAI's computer-use preview while being small enough to run locally.
Prime Intellect releases INTELLECT-3, a 106B open MoE model
Prime Intellect released INTELLECT-3, a 106B-parameter mixture-of-experts model with 12B active parameters that scores 90% on AIME 2024/2025. The lab fully open-sourced the training stack alongside the weights, showing a small lab can train frontier-scale models.
106B Total parameters (12B active)
90% AIME 2024/2025
Tencent's 1B HunyuanOCR beats 72B models on OCRBench
Tencent released HunyuanOCR, a 1B-parameter OCR model that scores 860 on OCRBench, beating models as large as Qwen3-VL-72B. It is a striking example of task-specialized small models outperforming generalist giants.
Tencent releases HunyuanVideo 1.5, a lightweight open video model
Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.
OLMo 3: Allen AI's fully open 32B model with complete recipe
Allen AI released OLMo 3, a fully open 32B dense model where the dataset, training recipe, and hyperparameters are all public — not just the weights. LDJ contrasted it with open-weights-only releases from Qwen and DeepSeek, which have never published a fully open recipe.
32B Dense parameters, fully open dataset and recipe
Gemini 3 Pro launches with record ARC-AGI-2 scores
Gemini 3 Pro is Google's new frontier multimodal model, with a 1M-token context window and large reasoning gains: it scores 31.11% on ARC-AGI-2 (45.14% with Deep Think mode), roughly double the previous SOTA, plus 81% on MMLU-Pro and major coding improvements. Amp switched to it as their default model on launch day, the first time they have ever switched defaults. It is also rolling out across Gmail, Calendar, and AI Mode in Google Search.
Nano Banana Pro generates 4K images with perfect text
Google's upgraded image model dropped as breaking news mid-show, adding visible thinking traces, 4K resolution output, and SynthID watermarking with C2PA metadata. Alex demoed it live by one-shotting an 8MB AI-news infographic with flawless text and pixel-accurate logos across the entire image. It also powers generative UIs in Gemini, building interactive dashboards with real data on the fly.
4K First image model with flawless 4K output and perfect text
Meta SAM 3: open-vocabulary segmentation and tracking in video
Meta's Segment Anything Model 3 adds open-vocabulary segmentation with text and exemplar prompts, letting you click or type to segment and track any object across images and video. The panel demoed it live on golden retriever videos, and it ships openly as part of Meta's open-source push.
SAM 3D turns single photos into 3D objects and human bodies
Released alongside SAM 3, SAM 3D reconstructs 3D objects and full human bodies from a single image with surprisingly high quality. It extends the Segment Anything family from 2D segmentation into single-image 3D reconstruction.
GPT-5.1-Codex-Max runs 24-hour coding tasks with native compaction
OpenAI's newest frontier agentic coding model is trained with native compaction, letting it intelligently summarize prior context and work on a single task for 24+ hours (an internal run reportedly lasted a full week). It uses 30% fewer thinking tokens at median than its predecessors and sets a new SOTA of 58% on TerminalBench 2, also leading on SWE-Bench and SWE-Lancer. Windows PowerShell support is significantly improved, alongside an experimental Windows sandbox and a new extra-high reasoning level.
58% TerminalBench 2 (new SOTA)
24h+ Single-task agent run time via native compaction
30% Fewer thinking tokens at median
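The general idea behind compaction, replacing older context with a summary once a budget is exceeded, can be illustrated at the harness level. This is only a toy analogy under stated assumptions, not how GPT-5.1-Codex-Max does it internally: the word-count token estimate, the budget, and the `toy_summarize` stub (which stands in for a model-written summary) are all hypothetical.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def compact(history: list[str], budget: int, keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace older turns with one summary when the history exceeds the budget."""
    total = sum(count_tokens(turn) for turn in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


def toy_summarize(turns: list[str]) -> str:
    """Hypothetical stand-in for a model-generated summary of earlier work."""
    return f"[summary of {len(turns)} earlier turns]"


history = [f"step {i}: ran tests and edited file_{i}.py" for i in range(10)]
compacted = compact(history, budget=40, keep_recent=3, summarize=toy_summarize)

for turn in compacted:
    print(turn)
```

The most recent turns are kept verbatim so the agent can continue the current step, while everything older collapses into a single entry.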
Sunday Robotics unveils ACT-1 home robot foundation model and Memo
Sunday Robotics introduced ACT-1, a home robot foundation model, alongside its Memo robot. Instead of $20K teleoperation rigs, training data comes from a $200 skill glove, and the model handles long-horizon household tasks with solid zero-shot generalization.
$200 Skill glove used for data collection vs $20K teleop rigs
Grok 4.1 briefly tops LM Arena with major post-training upgrade
xAI's Grok 4.1 shipped in November alongside GPT-5.1 and Claude Opus 4.5 in the year's most concentrated stretch of frontier releases. Yam highlighted the week-and-a-half window as emblematic of 2025's relentless acceleration.
Qwen Image Edit gains Multi-Angle LoRA for camera control
A Multi-Angle LoRA for Qwen Image Edit landed, enabling camera-control style edits that re-render a scene from new angles. Available as a Hugging Face space and on fal, it shows the fast-moving open ecosystem building on Qwen's image editing models.
Baidu open-sources ERNIE-4.5-VL-28B-A3B-Thinking visual reasoning model
Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an Apache 2.0 open-weights visual reasoning MoE with only 3B active parameters that claims to rival much larger models like GPT-5 High on vision tasks. It features image zooming, spatial grounding, and reasoning, and its strong small-model performance is attributed to GSPO, a training method from the Qwen team.
ElevenLabs launches Scribe v2 Realtime speech-to-text with 150ms latency
ElevenLabs launched Scribe v2 Realtime, a streaming speech-to-text model with roughly 150ms latency and support for over 90 languages, demoed live by Paul Asjes. It auto-switches languages mid-stream and handles code, initialisms, and technical terms with context-aware transcription, outpacing Whisper on speed and accuracy.
H Company open-sources Holo2 multimodal computer-use agent family
Dropped live during the show: H Company open-sourced Holo2, a next-generation multimodal agent family fine-tuned on Qwen3-VL for grounding, navigation, and reasoning across web, desktop, and mobile. It posts SOTA results on computer-use and web-navigation benchmarks like OSWorld-G and ships in 4B, 8B, and 30B variants under Apache 2.0.
Meta releases Omnilingual ASR covering 1,600+ languages
Meta released Omnilingual ASR, an Apache 2.0 speech recognition family supporting over 1,600 languages, including 500+ never before served by any ASR system, with character error rate under 10% for 78 languages. The release includes an open corpus of 500k+ rows of transcribed audio, and the 1B model was praised as a near drop-in state-of-the-art replacement on Hugging Face.
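Character error rate, the metric quoted above, is commonly defined as the character-level edit distance between the reference transcript and the model output, divided by the reference length. The sketch below uses that standard definition; whether Meta applies extra text normalization before scoring is not stated here, and the example strings are made up.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character edits needed / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one dropped character in an 11-character reference.
cer = character_error_rate("omnilingual", "omnilingal")
print(f"CER: {cer:.1%}")
```

One missing character out of eleven gives a CER of about 9.1%, just inside the under-10% threshold mentioned above.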
NVIDIA releases Upscaler LoRA for ChronoEdit-14B
NVIDIA released an Upscaler LoRA for its ChronoEdit-14B image editing model, available on Hugging Face with Diffusers pipeline support. It adds high-quality upscaling to the ChronoEdit physics-aware editing stack.
OpenAI launches GPT-5.1 with a warmer, more personable voice
OpenAI shipped GPT-5.1, an update to its flagship model focused on a warmer tone and personality upgrades. The panel discussed how the friendlier default voice changes day-to-day ChatGPT use and what it signals for the frontier model race.
WeiboAI releases VibeThinker-1.5B open reasoning model
Weibo's AI team open-sourced VibeThinker-1.5B, a tiny reasoning model that reportedly outperforms much larger models like DeepSeek R1 on select reasoning benchmarks. It arrived in a week when small open-weights models from Chinese labs kept punching above their weight.
Ai2 launches OlmoEarth foundation models and open Earth-intelligence platform
Ai2 launched OlmoEarth, a family of foundation models plus an open, end-to-end platform for fast, high-resolution Earth intelligence. It applies the lab's open-model approach to geospatial and remote-sensing data, making Earth observation workloads accessible without proprietary stacks.
Inworld TTS takes the #1 spot on the Artificial Analysis speech benchmark
Inworld released a new version of its TTS model that claimed the #1 position on the Artificial Analysis text-to-speech benchmark. It featured in the episode's voice segment as evidence that commercial TTS quality keeps climbing fast.
Maya-1 open-source voice generation model released
Maya-1 is a new open-source voice generation model that was demoed on the show as part of the week's voice AI wave. The panel highlighted how quickly open voice model quality is improving, with expressive output that holds up against commercial systems.
Meituan releases LongCat Flash Omni, a 560B (27B active) omni model
Meituan's LongCat team released LongCat Flash Omni, a 560B-parameter mixture-of-experts model with roughly 27B active parameters that accepts text, audio, and video input. It extends the open LongCat Flash line into omni-modal territory from a lab better known for food delivery than frontier models.
Moonshot AI releases Kimi K2 Thinking, an open 1T-param reasoning MoE
Moonshot AI released Kimi K2 Thinking, an open-source 1-trillion-parameter mixture-of-experts reasoning agent with 256K context and large-scale tool-calling capacity. The panel treated it as the open-source centerpiece of the week, focusing on its reasoning quality and coding utility rather than just benchmark screenshots, and as a sign open models keep closing the usability gap with frontier closed models.
LTX Studio's Retake brings Photoshop-style object editing to video
LTX Studio launched Retake, an AI video editing tool that enables inpainting-style editing of specific objects within video frames. Wolfram called it 'the image editing moment for video' — Photoshop for video, available to try on Replicate.
W&B launches Serverless LoRA Inference on CoreWeave
Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.
Sandbar launches Stream voice assistant and Stream Ring wearable
Sandbar launched Stream, a voice-first personal assistant, alongside Stream Ring, a wearable described as a 'mouse for voice' that is now available for preorder. The pairing pushes always-available voice interaction into dedicated hardware rather than the phone.
XPeng unveils 'Iron' humanoid robot with soft skin and 2026 production plan
XPeng unveiled Iron, a humanoid robot it claims has the most human-like design yet, featuring soft skin, bionic muscles, and a VLT (vision-language-task) brain. The company says it plans to put Iron into production in 2026, putting a Chinese EV maker squarely in the humanoid race.
OpenAI integrates ChatGPT Voice Mode directly into chats
OpenAI integrated ChatGPT's Voice Mode directly into the chat interface instead of a separate full-screen experience. Users can now talk to ChatGPT while seeing transcripts and visual responses inline in the conversation.
GPT-5.1 Pro: research-grade deep-thinking mode in ChatGPT
OpenAI also shipped GPT-5.1 Pro, a new research-grade ChatGPT mode that will happily think for minutes on a single query. It targets hard research-style questions where extended deliberation pays off, rounding out OpenAI's big week alongside Codex-Max.
Google upgrades Gemini Live's voice capabilities
Google rolled out an upgrade to Gemini Live's voice capabilities, making conversations more natural. It was covered in the big-companies roundup alongside GPT-5.1 and Grok 4 Fast as the voice interface race heats up.
Grok 4 Fast expands to a 2 million token context window
xAI's Grok 4 Fast now supports a 2 million token context window, one of the largest of any frontier model. The crew called the jump 'crazy' and discussed what such long context unlocks for agentic and document-heavy workloads.
Cursor adds an in-IDE browser
Cursor added an in-IDE browser, letting developers preview and interact with their running app without leaving the editor. The panel called out how performant the implementation is, tightening the loop between agentic code edits and visual verification.
Windsurf ships Codemaps, AI-annotated navigable maps of your codebase
Cognition's Windsurf launched Codemaps, AI-annotated and navigable maps of a codebase powered by SWE-1.5 for fast mode and Claude Sonnet 4.5 for smart mode. It aims to help developers and agents build a structural understanding of large repos instead of navigating file by file.
Grok 4.1 Fast: 2M context and Agent Tools API at 10x lower cost
Launched as breaking news during the show, Grok 4.1 Fast pairs a 2 million token context window with a new Agent Tools API offering native X search, Reddit search, web browsing, and code execution. Benchmarks are striking: 93-100% on tau2-Bench Telecom and 72% on Berkeley Function Calling v4 (top of the leaderboard) at $0.20/$0.50 per million tokens — roughly 10x cheaper than competitors, and free for the first two weeks on the xAI API and OpenRouter.
93–100% τ²-Bench Telecom
72% Berkeley Function Calling v4
2M Token context window
Antigravity: Google's free agent-first IDE powered by Gemini 3 Pro
Antigravity is a free VS Code fork reimagined for agent-first coding, with an inbox-style Agent Manager for running multiple coding agents in parallel across a codebase. Browser integration lets agents control Chrome, take screenshots and videos of the running app, and self-debug. The free tier is powered by Gemini 3 Pro, with GPT-OSS 120B as the open-source alternative and Nano Banana for images.
Marimo ships reactive Python notebooks extension for VS Code and Cursor
Marimo released a new VS Code and Cursor extension bringing its reactive Python notebooks directly into the editor, with uv integration for dependency management. It was highlighted in the open-source roundup as a notable dev-tool release of the week.
W&B ships LEET, an open-source terminal UI for monitoring ML runs
Weights & Biases released LEET (Lightweight Experiment Exploration Tool), an open-source terminal-native dashboard for tracking ML runs, demoed live by Dima Duev of the SDK team. It works fully offline for air-gapped HPC clusters and brings real-time metrics, system stats, and zoomable interactive charts to the terminal.
Project AELLA publishes 100K LLM-generated research paper summaries
Project AELLA (also called OSSAS) released 100,000 LLM-generated structured summaries of scientific papers, published openly on Hugging Face. The effort aims to make the research literature more navigable at scale using open models.
Terminal-Bench 2.0 and Harbor launch as new bar for coding agents
Terminal-Bench 2.0 launched alongside the Harbor framework, with 89 hard, realistic terminal-based tasks built with around 1000 Discord contributors. The Warp agent tops the leaderboard at 50% with Codex CLI close behind, and the panel argued an unsaturated 50% ceiling makes it far more meaningful than near-saturated benchmarks like MMLU.
LMArena launches Code Arena for live agentic coding evaluations
LMArena launched Code Arena, a live evaluation platform where models build real applications agentically and humans vote on the results. It extends the arena-style crowdsourced ranking approach to agentic coding workflows.
MCP-UI becomes MCP Apps, an official standard from Anthropic + OpenAI
MCP-UI, created by Ido Salomon and Liad Yosef, was standardized as 'MCP Apps' — an official MCP extension jointly adopted by Anthropic and OpenAI that unifies MCP-UI with what OpenAI called Operator Plugins. Agents can now render full interactive HTML UIs directly inside chat, avoiding iOS-vs-Android style fragmentation with one open standard.
Anthropic publishes code-execution-with-MCP pattern for token-efficient agents
Anthropic published an engineering post showing how running MCP-connected tools as code, instead of direct tool calls, slashes token use and scales agents to many more tools. The approach echoes Cloudflare's Code Mode and framed the episode's interview with Kenton Varda about agents writing code against tool APIs.
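The token saving comes from keeping bulky intermediate results out of the model's context. The toy below contrasts the two styles; it is a sketch under loud assumptions, not Anthropic's implementation. `fetch_orders` is a hypothetical stand-in for an MCP-connected tool, the data is synthetic, and character count is used as a rough proxy for tokens.

```python
import json


def fetch_orders() -> list[dict]:
    """Hypothetical tool; a real setup would route this call to an MCP server."""
    return [
        {"id": i, "total": 10.0 * i, "status": "open" if i % 4 == 0 else "closed"}
        for i in range(1, 201)
    ]


def direct_tool_call_context() -> str:
    """Direct tool calling: the full tool result lands in the model's context."""
    return json.dumps(fetch_orders())


def code_execution_context() -> str:
    """Code execution: a script filters next to the tool and returns a summary."""
    orders = fetch_orders()
    open_orders = [order for order in orders if order["status"] == "open"]
    summary = {
        "open_count": len(open_orders),
        "open_total": sum(order["total"] for order in open_orders),
    }
    return json.dumps(summary)


direct = direct_tool_call_context()
via_code = code_execution_context()

# Character counts as a rough proxy for how much text the model has to read.
print(f"direct: {len(direct)} chars, via code: {len(via_code)} chars")
```

In the second style the model only ever sees the two-field summary, so the context cost stays roughly flat as the underlying tool result grows.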
AWS announces multi-year strategic infrastructure partnership with OpenAI
AWS announced a multi-year strategic infrastructure partnership with OpenAI to power ChatGPT inference, training, and agentic AI workloads. It is another sign of OpenAI spreading its compute needs across every major cloud provider, and a notable win for AWS in the frontier-AI infrastructure race.
Hugging Face publishes the Smol Training Playbook for LLM pretraining
Hugging Face published the Smol Training Playbook, a 200+ page end-to-end guide to reliably pretraining and operating LLMs. It distills the team's practical experience from the SmolLM line into an open resource for anyone training their own models.