Everything AI Released in November 2025

50 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

🧠 New Models (29)

Alibaba (Tongyi)
New Models | Open weights

Z-Image Turbo

Tongyi's Z-Image Turbo brings sub-second open image generation

Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.

6B Parameters
Anthropic
New Models

Claude Opus 4.5

Anthropic launches Claude Opus 4.5, reclaiming the coding crown

Anthropic released Claude Opus 4.5, scoring 80.9% on SWE-bench Verified to top GPT-5.1 (77.9%) and Gemini 3 Pro (76.2%). It adds a new 'Effort' parameter for compute control, Tool Search to cut agent token overhead, and Programmatic Tool Calling where the model writes and executes code loops. Pricing dropped to $5/M input and $25/M output, roughly one-third the old Opus price.

80.9% SWE-bench Verified | $5/M Input token price | $25/M Output token price
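
To put the per-token pricing in concrete terms, here is a minimal sketch of the arithmetic using the $5/M input and $25/M output rates quoted above. The function name and the example token counts are illustrative assumptions, not part of any Anthropic SDK.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_price_per_m=5.0, output_price_per_m=25.0):
    """Estimate request cost in USD from token counts and per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical agent run: 200k tokens read, 20k tokens written.
cost = estimate_cost_usd(200_000, 20_000)
print(f"${cost:.2f}")  # prints "$1.50"
```

Because output tokens cost five times as much as input tokens at these rates, features that trim generated or re-sent tokens (such as the Effort parameter and Tool Search mentioned above) matter more to the bill than the headline input price suggests.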
Black Forest Labs
New Models | Open weights

FLUX.2

Black Forest Labs releases FLUX.2, a 32B multi-reference image model

Black Forest Labs released FLUX.2, a 32B-parameter image model with open weights (FLUX.2-dev) that supports multi-reference image editing. It lets users combine multiple reference images and prompt edits with variables, a step up in controllable image editing.

32B Parameters
Microsoft
New Models | Open weights

Fara-7B

Microsoft ships Fara-7B, a 7B on-device computer use agent

Microsoft Research released Fara-7B, a best-in-class 7B-parameter vision-language model for computer use that runs on-device. It scores 73.5% on WebVoyager, beating OpenAI's computer-use preview while being small enough to run locally.

73.5% WebVoyager
Prime Intellect
New Models | Open weights

INTELLECT-3

Prime Intellect releases INTELLECT-3, a 106B open MoE model

Prime Intellect released INTELLECT-3, a 106B-parameter mixture-of-experts model with 12B active parameters that scores 90% on AIME 2024/2025. The lab fully open-sourced the training stack alongside the weights, showing a small lab can train frontier-scale models.

106B Total parameters (12B active) | 90% AIME 2024/2025
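
For readers less familiar with mixture-of-experts sizing, the sketch below computes what share of each model's parameters is active per token, using figures quoted in this roundup. The helper name is hypothetical, and the 28B total for the ERNIE model is read off its name rather than stated in the summary.

```python
def active_fraction(total_params_b, active_params_b):
    """Share of a mixture-of-experts model's parameters used for each token."""
    return active_params_b / total_params_b


# (total, active) parameter counts in billions, as quoted in this roundup.
# The ERNIE total is an assumption inferred from the "28B-A3B" model name.
models = {
    "INTELLECT-3": (106, 12),
    "LongCat Flash Omni": (560, 27),
    "ERNIE-4.5-VL-28B-A3B-Thinking": (28, 3),
}

for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")
```

INTELLECT-3 works out to roughly 11% active, which is why a 106B model can have per-token compute closer to that of a 12B dense model while still needing memory for all 106B weights.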
Tencent (Hunyuan)
New Models | Open weights

HunyuanOCR

Tencent's 1B HunyuanOCR beats 72B models on OCRBench

Tencent released HunyuanOCR, a 1B-parameter OCR model that scores 860 on OCRBench, beating models as large as Qwen3-VL-72B. It is a striking example of task-specialized small models outperforming generalist giants.

1B Parameters | 860 OCRBench score
Tencent (Hunyuan)
New Models | Open weights

HunyuanVideo 1.5

Tencent releases HunyuanVideo 1.5, a lightweight open video model

Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.

New Models | Open weights

OLMo 3

OLMo 3: Allen AI's fully open 32B model with complete recipe

Allen AI released OLMo 3, a fully open 32B dense model where the dataset, training recipe, and hyperparameters are all public — not just the weights. LDJ contrasted it with open-weights-only releases from Qwen and DeepSeek, which have never published a fully open recipe.

32B Dense parameters, fully open dataset and recipe
Google DeepMind
New Models

Gemini 3 Pro

Gemini 3 Pro launches with record ARC-AGI-2 scores

Google's new frontier multimodal model with a 1M-token context window and huge reasoning gains, scoring 31.11% on ARC-AGI-2 (45.14% with Deep Think mode) — roughly double the previous SOTA — plus 81% on MMLU-Pro and major coding improvements. Amp switched to it as their default model on launch day, the first time they have ever switched defaults. Also rolling out across Gmail, Calendar, and AI Mode in Google Search.

45.14% ARC-AGI-2 (Deep Think) | 31.11% ARC-AGI-2 (standard) | 1M Token context window
Google DeepMind
New Models

Nano Banana Pro

Nano Banana Pro generates 4K images with perfect text

Google's upgraded image model dropped as breaking news mid-show, adding visible thinking traces, 4K resolution output, and SynthID watermarking with C2PA metadata. Alex demoed it live by one-shotting an 8MB AI-news infographic with flawless text and pixel-accurate logos across the entire image. It also powers generative UIs in Gemini, building interactive dashboards with real data on the fly.

4K First image model with flawless 4K output and perfect text
Meta AI
New Models | Open weights

SAM 3

Meta SAM 3: open-vocabulary segmentation and tracking in video

Meta's Segment Anything Model 3 adds open-vocabulary segmentation with text and exemplar prompts, letting you click or type to segment and track any object across images and video. The panel demoed it live on golden retriever videos, and it ships openly as part of Meta's open-source push.

Meta AI
New Models | Open weights

SAM 3D

SAM 3D turns single photos into 3D objects and human bodies

Released alongside SAM 3, SAM 3D reconstructs 3D objects and full human bodies from a single image with surprisingly high quality. It extends the Segment Anything family from 2D segmentation into single-image 3D reconstruction.

OpenAI
New Models

GPT-5.1-Codex-Max

GPT-5.1-Codex-Max runs 24-hour coding tasks with native compaction

OpenAI's newest frontier agentic coding model is trained with native compaction, letting it intelligently summarize prior context and work on a single task for 24+ hours (an internal run reportedly lasted a full week). It uses 30% fewer thinking tokens at the median than its predecessors and sets a new SOTA of 58% on Terminal-Bench 2.0, also leading on SWE-bench and SWE-Lancer. Windows PowerShell support is significantly improved, alongside an experimental Windows sandbox and a new extra-high reasoning level.

58% Terminal-Bench 2.0 (new SOTA) | 24h+ Single-task agent run time via native compaction | 30% Fewer thinking tokens at median
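
As a rough illustration of what context compaction means, the sketch below folds older messages into a single summary once a token budget is exceeded. This is a generic application-level pattern under stated assumptions, not OpenAI's implementation: Codex-Max is described as having the behavior trained in, whereas here the word-count tokenizer, the placeholder `summarize` function, and the budget are all hypothetical.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def summarize(messages):
    """Placeholder summarizer; a real agent would ask the model to write this."""
    return f"[summary of {len(messages)} earlier messages]"


def compact(history, budget, keep_recent=2):
    """Fold older messages into one summary once the history exceeds the budget."""
    total = sum(count_tokens(message) for message in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent


history = [
    "read the failing test output and list the stack frames",
    "opened parser.py and traced the off-by-one in the tokenizer loop",
    "patched the loop bound and reran the unit tests",
    "two tests still fail in the formatter module",
]
compacted = compact(history, budget=20)
print(compacted)
```

The most recent messages are kept verbatim so the agent retains exact details of its current step, while earlier work survives only as a summary. That trade-off is what allows a single task to run far past the raw context window.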
Sunday Robotics
New Models

ACT-1 & Memo

Sunday Robotics unveils ACT-1 home robot foundation model and Memo

Sunday Robotics introduced ACT-1, a home robot foundation model, alongside its Memo robot. Instead of $20K teleoperation rigs, training data comes from a $200 skill glove, and the model handles long-horizon household tasks with solid zero-shot generalization.

$200 Skill glove used for data collection vs $20K teleop rigs
xAI
New Models

Grok 4.1

Grok 4.1 briefly tops LM Arena with major post-training upgrade

xAI's Grok 4.1 shipped in November alongside GPT-5.1 and Claude Opus 4.5 in the year's most concentrated stretch of frontier releases. Yam highlighted the week-and-a-half window as emblematic of 2025's relentless acceleration.

1483 LM Arena Elo (briefly #1)
Alibaba (Qwen)
New Models | Open weights

Qwen Image Edit Multi-Angle LoRA

Qwen Image Edit gains Multi-Angle LoRA for camera control

A Multi-Angle LoRA for Qwen Image Edit landed, enabling camera-control style edits that re-render a scene from new angles. Available as a Hugging Face space and on fal, it shows the fast-moving open ecosystem building on Qwen's image editing models.

Baidu
New Models | Open weights

ERNIE-4.5-VL-28B-A3B-Thinking

Baidu open-sources ERNIE-4.5-VL-28B-A3B-Thinking visual reasoning model

Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an Apache 2.0 open-weights visual reasoning MoE with only 3B active parameters that claims to rival much larger models like GPT-5 High on vision tasks. It features image zooming, spatial grounding, and reasoning, and its strong small-model performance was attributed to training with GSPO, the reinforcement learning method introduced by the Qwen team.

3B Active Parameters
ElevenLabs
New Models

Scribe v2 Realtime

ElevenLabs launches Scribe v2 Realtime speech-to-text with 150ms latency

ElevenLabs launched Scribe v2 Realtime, a streaming speech-to-text model with roughly 150ms latency and support for over 90 languages, demoed live by Paul Asjes. It auto-switches languages mid-stream and handles code, initialisms, and technical terms with context-aware transcription, outpacing Whisper on speed and accuracy.

150ms Latency | 90+ Languages (Scribe)
H Company
New Models | Open weights

Holo2

H Company open-sources Holo2 multimodal computer-use agent family

Dropped live during the show: H Company open-sourced Holo2, a next-generation multimodal agent family fine-tuned on Qwen3-VL for grounding, navigation, and reasoning across web, desktop, and mobile. It posts SOTA results on computer-use and web-navigation benchmarks like OSWorld-G and ships in 4B, 8B, and 30B variants under Apache 2.0.

Meta AI
New Models | Open weights

Omnilingual ASR

Meta releases Omnilingual ASR covering 1,600+ languages

Meta released Omnilingual ASR, an Apache 2.0 speech recognition family supporting over 1,600 languages, including 500+ never before served by any ASR system, with character error rate under 10% for 78 languages. The release includes an open corpus of 500k+ rows of transcribed audio, and the 1B model was praised as a near drop-in state-of-the-art replacement on Hugging Face.

1600+ Languages Supported
WeiboAI
New Models | Open weights

VibeThinker-1.5B

WeiboAI releases VibeThinker-1.5B open reasoning model

Weibo's AI team open-sourced VibeThinker-1.5B, a tiny reasoning model that reportedly outperforms much larger models like DeepSeek R1 on select reasoning benchmarks. Part of a week where small open-weights models from Chinese labs kept punching above their weight.

New Models | Open weights

OlmoEarth

Ai2 launches OlmoEarth foundation models and open Earth-intelligence platform

Ai2 launched OlmoEarth, a family of foundation models plus an open, end-to-end platform for fast, high-resolution Earth intelligence. It applies the lab's open-model approach to geospatial and remote-sensing data, making Earth observation workloads accessible without proprietary stacks.

Inworld AI
New Models

Inworld TTS

Inworld TTS takes the #1 spot on the Artificial Analysis speech benchmark

Inworld released a new version of its TTS model that claimed the #1 position on the Artificial Analysis text-to-speech benchmark. It featured in the episode's voice segment as evidence that commercial TTS quality keeps climbing fast.

Maya Research
New Models | Open weights

Maya-1

Maya-1 open-source voice generation model released

Maya-1 is a new open-source voice generation model that was demoed on the show as part of the week's voice AI wave. The panel highlighted how quickly open voice model quality is improving, with expressive output that holds up against commercial systems.

Meituan (LongCat)
New Models | Open weights

LongCat Flash Omni

Meituan releases LongCat Flash Omni, a 560B (27B active) omni model

Meituan's LongCat team released LongCat Flash Omni, a 560B-parameter mixture-of-experts model with roughly 27B active parameters that accepts text, audio, and video input. It extends the open LongCat Flash line into omni-modal territory from a lab better known for food delivery than frontier models.

Moonshot AI
New Models | Open weights

Kimi K2 Thinking

Moonshot AI releases Kimi K2 Thinking, an open 1T-param reasoning MoE

Moonshot AI released Kimi K2 Thinking, an open-source 1-trillion-parameter mixture-of-experts reasoning agent with 256K context and large-scale tool-calling capacity. The panel treated it as the open-source centerpiece of the week, focusing on its reasoning quality and coding utility rather than just benchmark screenshots, and as a sign open models keep closing the usability gap with frontier closed models.

🚀 Products & Apps (4)

Weights & Biases
Products & Apps

Serverless LoRA Inference

W&B launches Serverless LoRA Inference on CoreWeave

Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.

Sandbar
Products & Apps

Stream / Stream Ring

Sandbar launches Stream voice assistant and Stream Ring wearable

Sandbar launched Stream, a voice-first personal assistant, alongside Stream Ring, a wearable described as a 'mouse for voice' that is now available for preorder. The pairing pushes always-available voice interaction into dedicated hardware rather than the phone.

XPeng
Products & Apps

Iron

XPeng unveils 'Iron' humanoid robot with soft skin and 2026 production plan

XPeng unveiled Iron, a humanoid robot it claims has the most human-like design yet, featuring soft skin, bionic muscles, and a VLT (vision-language-task) brain. The company says it plans to put Iron into production in 2026, putting a Chinese EV maker squarely in the humanoid race.

✨ Major Features & Updates (6)

OpenAI
Major Features & Updates

GPT-5.1 Pro

GPT-5.1 Pro: research-grade deep-thinking mode in ChatGPT

OpenAI also shipped GPT-5.1 Pro, a new research-grade ChatGPT mode that will happily think for minutes on a single query. It targets hard research-style questions where extended deliberation pays off, rounding out OpenAI's big week alongside Codex-Max.

Cursor
Major Features & Updates

Cursor in-IDE browser

Cursor adds a built-in browser inside the IDE

Cursor added an in-IDE browser, letting developers preview and interact with their running app without leaving the editor. The panel called out how performant the implementation is, tightening the loop between agentic code edits and visual verification.

Windsurf (Cognition)
Major Features & Updates

Codemaps

Windsurf ships Codemaps, AI-annotated navigable maps of your codebase

Cognition's Windsurf launched Codemaps, AI-annotated and navigable maps of a codebase powered by SWE-1.5 for fast mode and Claude Sonnet 4.5 for smart mode. It aims to help developers and agents build a structural understanding of large repos instead of navigating file by file.

🔌 APIs & Platforms (1)

xAI
APIs & Platforms

Grok 4.1 Fast + Agent Tools API

Grok 4.1 Fast: 2M context and Agent Tools API at 10x lower cost

Launched as breaking news during the show, Grok 4.1 Fast pairs a 2 million token context window with a new Agent Tools API offering native X search, Reddit search, web browsing, and code execution. Benchmarks are striking: 93-100% on tau2-Bench Telecom and 72% on Berkeley Function Calling v4 (top of the leaderboard) at $0.20/$0.50 per million tokens — roughly 10x cheaper than competitors, and free for the first two weeks on the xAI API and OpenRouter.

93–100% τ²-Bench Telecom | 72% Berkeley Function Calling v4 | 2M Token context window

🛠️ Dev Tools (3)

Google DeepMind
Dev Tools

Antigravity

Antigravity: Google's free agent-first IDE powered by Gemini 3 Pro

A free VS Code fork reimagined for agent-first coding, with an inbox-style Agent Manager for running multiple coding agents in parallel across a codebase. Browser integration lets agents control Chrome, take screenshots and videos of the running app, and self-debug. The free tier is powered by Gemini 3 Pro, with GPT-OSS 120B as the open-source alternative and Nano Banana for images.

Marimo
Dev Tools | Open weights

Marimo VS Code / Cursor extension

Marimo ships reactive Python notebooks extension for VS Code and Cursor

Marimo released a new VS Code and Cursor extension that brings its reactive Python notebooks directly into the editor, with uv integration for dependency management. It was highlighted in the open-source roundup as a notable dev-tool release of the week.

Weights & Biases
Dev Tools | Open weights

W&B LEET

W&B ships LEET, an open-source terminal UI for monitoring ML runs

Weights & Biases released LEET (Lightweight Experiment Exploration Tool), an open-source terminal-native dashboard for tracking ML runs, demoed live by Dima Duev of the SDK team. It works fully offline for air-gapped HPC clusters and brings real-time metrics, system stats, and zoomable interactive charts to the terminal.

📦 Datasets (1)

📊 Benchmarks & Evals (2)

Benchmarks & Evals | Open weights

Terminal-Bench 2.0

Terminal-Bench 2.0 and Harbor launch as new bar for coding agents

Terminal-Bench 2.0 launched alongside the Harbor framework, with 89 hard, realistic terminal-based tasks built with around 1000 Discord contributors. The Warp agent tops the leaderboard at 50% with Codex CLI close behind, and the panel argued an unsaturated 50% ceiling makes it far more meaningful than near-saturated benchmarks like MMLU.

50% Terminal-Bench 2.0 top score

🌀 Also Released (4)

Also Released | Open weights

MCP Apps

MCP-UI becomes MCP Apps, an official standard from Anthropic + OpenAI

MCP-UI, created by Ido Salomon and Liad Yosef, was standardized as 'MCP Apps', an official MCP extension jointly adopted by Anthropic and OpenAI that unifies MCP-UI with OpenAI's Apps SDK. Agents can now render full interactive HTML UIs directly inside chat, avoiding iOS-vs-Android style fragmentation with one open standard.

Anthropic
Also Released

Code execution with MCP

Anthropic publishes code-execution-with-MCP pattern for token-efficient agents

Anthropic published an engineering post showing how running MCP-connected tools as code, instead of direct tool calls, slashes token use and scales agents to many more tools. The approach echoes Cloudflare's Code Mode and framed the episode's interview with Kenton Varda about agents writing code against tool APIs.
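
The token saving comes from keeping bulky intermediate results out of the model's context. The sketch below contrasts the two patterns with a hypothetical `list_issues` tool and uses payload size in characters as a rough proxy for tokens. None of these names come from Anthropic's post or the MCP SDK; they are assumptions for illustration only.

```python
import json


def list_issues():
    """Hypothetical MCP-backed tool returning 500 issue records."""
    return [
        {"id": i,
         "status": "open" if i % 50 == 0 else "closed",
         "body": "lorem ipsum " * 20}
        for i in range(500)
    ]


# Direct tool call: the entire result is serialized into the model's context.
direct_payload = json.dumps(list_issues())


def agent_written_snippet():
    """Code the model might write: filter locally, return only what it needs."""
    open_ids = [issue["id"] for issue in list_issues() if issue["status"] == "open"]
    return json.dumps({"open_count": len(open_ids), "open_ids": open_ids})


# Code execution: only the small summary travels back to the model.
code_payload = agent_written_snippet()

print(len(direct_payload), "characters via direct tool call")
print(len(code_payload), "characters via code execution")
```

In the second pattern the 500 records never enter the context at all, so the cost of a step depends on the size of the answer rather than the size of the data the tool touched.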

Amazon Web Services
Also Released

AWS-OpenAI infrastructure partnership

AWS announces multi-year strategic infrastructure partnership with OpenAI

AWS announced a multi-year strategic infrastructure partnership with OpenAI to power ChatGPT inference, training, and agentic AI workloads. It is another sign of OpenAI spreading its compute needs across every major cloud provider, and a notable win for AWS in the frontier-AI infrastructure race.

Hugging Face
Also Released | Open weights

Smol Training Playbook

Hugging Face publishes the Smol Training Playbook for LLM pretraining

Hugging Face published the Smol Training Playbook, a 200+ page end-to-end guide to reliably pretraining and operating LLMs. It distills the team's practical experience from the SmolLM line into an open resource for anyone training their own models.