Everything AI Released in December 2025

58 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

🧠 New Models 41

Alibaba (Qwen)
New ModelsOpen weights

Qwen 3 Coder

Qwen 3 Coder posts insane scores in the race for the coding crown

Alibaba's Qwen 3 Coder landed in July with what the crew called insane benchmark scores for an open-weights coding model. Together with Kimi K2 and GLM 4.5 it made July the peak month for Chinese open source.

Alibaba (Qwen)
New ModelsOpen weights

Qwen speech-to-speech model

Qwen launches speech-to-speech model with emotion handling

Qwen released a speech-to-speech model in March with internal emotion handling, joining the wave of voice-native models. It was part of the Qwen team's relentless 2025 release cadence across modalities.

Anthropic
New Models

Claude Opus 4

Claude Opus 4 drops in Q2 — Ryan's pick for best model ever

Claude Opus 4 launched in Q2 and became Ryan Carson's pick as the best coding model he had used in over 700 days of daily LLM coding. It cemented Anthropic's lead in agentic coding through the middle of the year.

Black Forest Labs
New Models

Flux 3

Flux 3 becomes the new gold standard for image generation

Flux 3 dropped in August and immediately became the gold standard for image generation, landing three years almost to the day after Stable Diffusion first went public. Wolfram used it as the yardstick for how far image AI traveled in those three years.

Daily (Pipecat)
New ModelsOpen weights

Smart Turn Detection

Daily ships smart turn detection for voice agents

Kwindla's Daily.co shipped smart turn detection during Q2, an open model that helps voice agents know when a speaker has actually finished talking. It landed in the quarter when voice agents first got attention outside the builder bubble.

DeepSeek
New ModelsOpen weights

DeepSeek R1

DeepSeek R1: the open reasoning model that crashed NVIDIA's stock

DeepSeek's open-weights reasoning model dropped January 23rd and matched OpenAI's o1 at roughly 50x cheaper pricing, with an alleged training cost of just $5.5M. It crashed NVIDIA stock 17% — a $560B single-day loss, the largest single-company monetary loss in history — and made Chinese AI a household topic. The crew named it the earthquake that shattered assumptions about who leads AI.

$560B NVIDIA stock loss$5.5M DeepSeek R1 training cost
DeepSeek
New ModelsOpen weights

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus lands amid September's relentless pace

DeepSeek resurfaced in September with V3.1 Terminus, another strong open-weights release that arrived just as the crew was barely keeping up with the weekly firehose. Nisten noted that missing a single week in this period left you completely lost.

Google DeepMind
New Models

Gemini TTS

Google ships a Gemini TTS model in its December run

As part of Google's December release wave, a Gemini TTS model shipped alongside realtime model updates. It rounded out Google's full-stack voice story heading into 2026.

Google DeepMind
New Models

VEO3

VEO3: native audio video generation crosses the uncanny valley

Google's VEO3 stunned everyone in Q2 with video generation that included native audio, which the crew credits with crossing the uncanny valley for AI video. It was a centerpiece of Google IO 2025 and of Google's comeback year.

Hexgrad (Kokoro)
New ModelsOpen weights

Kokoro TTS

Kokoro TTS: 82M-param Apache 2 model hits #1 on TTS Arena

Kokoro, a tiny 82M parameter text-to-speech model, went viral in January after hitting #1 on TTS Arena. Released under Apache 2.0 and small enough to run in the browser, it showed that high-quality speech synthesis no longer required huge models.

MiniMax (Hailuo)
New Models

Hailuo 2.3

MiniMax drops Hailuo 2.3 in November

MiniMax released Hailuo 2.3 (referred to as 'Hailuo LLM 2.3' on the show) in November, cited as another strong release from the Chinese labs. It closed out a year in which MiniMax shipped everything from 4M-context LLMs to media models.

Moonshot AI (Kimi)
New ModelsOpen weights

Kimi K2

Kimi K2: the Chinese open model that earned mainstream respect

Moonshot AI's Kimi K2 dropped in July and earned serious mainstream recognition, marking peak Chinese-lab dominance of open source. It was named in the show's TL;DR as one of the defining open-weights releases of 2025.

OpenAI
New Models

GPT-5 Codex

GPT-5 Codex: OpenAI's specialized coding model moves the stock

GPT-5 Codex dropped in September as OpenAI's coding-specialized fine-tune of GPT-5. Yam dubbed it the 'infinite money glitch' because the release moved OpenAI-linked stock prices significantly.

OpenAI
New Models

Sora 2

Sora 2 democratizes video generation and floods the internet with memes

Sora 2 opened Q4 in October by democratizing video generation, complete with a social platform, and spawned a wave of memes still circulating at year's end. The show's TL;DR credits it as part of 2025 crossing the uncanny valley for AI media.

OpenAI
New Models

New voice models (GPT Realtime derivatives)

OpenAI ships two new voice models derived from GPT Realtime

In March, OpenAI released two voice models derived from its GPT Realtime speech-to-speech stack. They were part of a wave that pushed voice agents toward the mainstream over the course of 2025.

Tencent (Hunyuan)
New ModelsOpen weights

Hunyuan open weights

Tencent enters the open weights race

In July, Tencent's Hunyuan team (rendered as 'HO One' in the episode) joined Huawei in entering the open-weights model race. It widened the field of Chinese labs shipping serious open models beyond DeepSeek, Qwen, and Moonshot.

Zhipu AI (GLM)
New ModelsOpen weights

GLM 4.5

GLM 4.5 runs on Cerebras fast enough to win hackathons

Zhipu's GLM 4.5 came out in July and was the first open model that ran on Cerebras hardware fast enough that hackathon competitors were winning with it. It set up GLM's quiet rise as a business workhorse later in the year.

Zhipu AI (GLM)
New ModelsOpen weights

GLM 4.6

GLM 4.6 quietly becomes the model businesses actually use

Zhipu's GLM 4.6 arrived in October and, per Nisten, quietly became a go-to model that many businesses still run today. It continued GLM's trajectory from hackathon favorite to production workhorse.

Google DeepMind
New ModelsOpen weights

FunctionGemma

FunctionGemma: Google's 270M function-calling model for edge agents

Google released FunctionGemma, a tiny 270M-parameter open model specialized for function calling on-device. With a roughly 500MB RAM footprint and strong gains after fine-tuning for mobile actions, it points toward privacy-first local agents on constrained hardware.

Google DeepMind
New Models

Gemini 3 Flash

Gemini 3 Flash delivers frontier intelligence at $0.50/1M input tokens

Google launched Gemini 3 Flash, offering frontier-tier capability at flash-tier pricing of $0.50 per million input tokens. It scores 78% on SWE-bench Verified, beating larger models on some agentic tasks, and supports tool-calling at scale with up to 100 simultaneous function calls.

$0.50 per 1M Gemini 3 Flash input tokens78% SWE-bench Verified
NVIDIA
New ModelsOpen weights

Nemotron 3 Nano

NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes

NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.

30B (3B active) Nemotron 3 Nano parameters
OpenAI
New Models

GPT 5.2 Codex

GPT 5.2 Codex drops live during the show with 400K context

OpenAI released GPT 5.2 Codex via API after months of exclusivity in the Codex app, making it available in Cursor, GitHub Copilot, and VS Code with native context compaction for long sessions. Cursor showcased it by building a complete browser from scratch in Rust, roughly 3 million lines of code across about 330,000 commits, driven by hundreds of concurrent agents.

56.4% SWE-Bench Pro64% Terminal-Bench 2.0
OpenAI
New Models

GPT Image 1.5

OpenAI GPT Image 1.5: 4x faster, 20% cheaper, #1 on LMSYS Image Arena

OpenAI released GPT Image 1.5, an upgraded image generation model that is 4x faster and 20% cheaper than its predecessor. It debuted at #1 on the LMSYS Image Arena leaderboard, part of OpenAI's rapid-fire release week.

Resemble AI
New ModelsOpen weights

Chatterbox Turbo

Resemble AI open-sources Chatterbox Turbo, a 350M MIT-licensed TTS

Resemble AI released Chatterbox Turbo, an MIT-licensed 350M-parameter open text-to-speech model. The company claims it beats ElevenLabs in blind listening tests, pushing high-quality TTS into fully open, accessible territory.

Amazon
New Models

Amazon Nova 2

Amazon announces Nova 2 family: Lite, Pro, Sonic, and Omni

Amazon rolled out the Nova 2 model suite spanning text, speech, and multimodal stacks with Lite, Pro, Sonic, and Omni variants. The launch came with major benchmark jumps over the first Nova generation and includes a fast, cost-effective reasoning model in Nova 2 Lite.

Arcee AI
New ModelsOpen weights

Arcee Trinity

Arcee Trinity launches US-trained open MoE family

Arcee AI introduced Trinity, a family of US-trained open mixture-of-experts models built from scratch, starting with Trinity-Mini and Trinity-Nano-Preview. CTO Lukas Atkins joined the show to discuss the training approach and previewed Trinity-Large for January 2026. The release positions Arcee as a domestic alternative in an open-weights field dominated by Chinese labs.

DeepSeek
New ModelsOpen weights

DeepSeek V3.2 / V3.2-Speciale

DeepSeek V3.2 and V3.2-Speciale post gold-medal reasoning under MIT license

DeepSeek released V3.2 and the reasoning-first V3.2-Speciale, a 685B-parameter MoE under MIT license. Speciale posted gold-medal-level olympiad results and 96% on AIME (versus GPT-5 High at 94%), with V3.2 hitting 73.1% on SWE-Bench Verified. Aggressive pricing around 28 cents per 1M tokens on OpenRouter pushes open models closer to top closed-model capability.

96% AIME73.1% SWE-Bench Verified685B Total parameters (MoE)
Microsoft
New ModelsOpen weights

VibeVoice-Realtime-0.5B

Microsoft shares VibeVoice-Realtime-0.5B with ~300ms latency TTS

Microsoft published VibeVoice-Realtime-0.5B on Hugging Face, a small realtime text-to-speech model claiming roughly 300ms latency. The show framed it as more evidence that sub-second audio response is becoming table stakes for production voice agents.

~300ms Claimed TTS latency0.5B Parameters
Mistral AI
New ModelsOpen weights

Mistral 3 (Large 3 + Ministral 3)

Mistral returns to Apache 2.0 with Mistral Large 3 and Ministral 3

Mistral relaunched its model family under permissive Apache 2.0 licensing with Mistral Large 3 and the small Ministral 3 edge models. Large 3 ships a 256K context window and strong open-model coding positioning. The licensing shift reignited discussion around open model portability and deployability.

256K Mistral Large 3 context window
Runway
New Models

Runway Gen-4.5

Runway Gen-4.5 takes #1 on the text-to-video leaderboard

Runway's Gen-4.5 video model climbed to the top of the text-to-video leaderboard with a 1,247 Elo rating. The result continued the weekly theme of video generation quality and multimodal consistency improving fast.

1,247 Text-to-video leaderboard Elo

🚀 Products & Apps 7

Cursor
Products & Apps

Cursor 2 + Composer

Cursor 2 and the Composer model level up IDE agents

Cursor shipped Cursor 2 along with its Composer model in October, leveling up in-IDE agentic coding. It capped a year in which Cursor's sales exploded on the back of Claude 3.7 and the vibe coding wave.

NVIDIA
Products & Apps

Project Digits

NVIDIA Project Digits: $3,000 desktop that runs 200B-param models

NVIDIA announced Project Digits in January, a $3,000 desktop supercomputer capable of running 200B parameter models locally. It brought serious local-inference hardware to individual developers and was one of January's standout hardware stories.

OpenAI
Products & Apps

Deep Research

OpenAI Deep Research scores 26.6% on Humanity's Last Exam

OpenAI's Deep Research launched in February as an agentic research tool that scored 26.6% on Humanity's Last Exam, versus roughly 10% for o1 and R1. The crew called it a jaw-dropping leap in AI research capability and one of February's defining releases.

26.6% HLE (Humanity's Last Exam)
OpenAI
Products & Apps

Operator

OpenAI Operator: first agentic ChatGPT with browser control

OpenAI launched Operator in January as the first agentic version of ChatGPT that could control a browser to complete tasks on the user's behalf. It kicked off the year-of-agents narrative, though it launched within 24 hours of DeepSeek R1 and was completely overshadowed by it.

Reve
Products & Apps

Reve image platform

Reve ships a 4-in-1 image creation and editing platform

Reve (rendered as 'RevA' in the episode) emerged in September as a four-in-one image creation and editing platform. Alex said he still uses it daily, making it one of the year's sleeper product hits.

OpenAI
Products & Apps

ChatGPT App Store

ChatGPT App Store opens submissions via MCP app model

OpenAI opened app submissions for the ChatGPT App Store, built on the MCP-powered apps model. Developers can now submit apps that run inside ChatGPT, signaling OpenAI's platform play for distribution of agentic apps.

Weights & Biases
Products & Apps

LLM Evaluation Jobs

W&B launches LLM Evaluation Jobs for OpenAI-compatible APIs

Weights & Biases launched LLM Evaluation Jobs, letting teams run evaluations against any OpenAI-compatible API during training cycles instead of only at the end. The show framed it as a practical workflow upgrade for getting earlier model quality signals without blindly burning compute.

✨ Major Features & Updates 6

Anthropic
Major Features & Updates

Claude Skills

Claude Skills launches — 'MCP-level if not bigger'

Anthropic launched Claude Skills in October. It was largely missed at release but picked up steam fast, with the show arguing Skills is 'MCP level if not bigger' for Claude users as a way to package reusable agent capabilities.

OpenAI
Major Features & Updates

ChatGPT Memory

ChatGPT gets persistent memory across conversations

In Q2, OpenAI shipped memory for ChatGPT, letting the assistant carry context across all of a user's past conversations. It was one of the quarter's notable product-layer upgrades alongside native image generation.

OpenAI
Major Features & Updates

GPT-4o native image generation

GPT-4o native image generation sparks Ghibli-mania

OpenAI shipped native image generation in GPT-4o, producing the viral Ghibli-style image wave and bringing AI image creation to the ChatGPT mainstream. Wolfram cited the 2025 paradigm shift in image generation as his release of the year.

Windsurf
Major Features & Updates

Code Maps

Windsurf Code Maps generates flowcharts of entire codebases

Windsurf released Code Maps in November, a feature that generates flowchart-style maps of entire codebases. It was one of the quieter but practical dev-tool releases in a month dominated by frontier model drops.

Google DeepMind
Major Features & Updates

Gemini 3 Deep Think

Gemini 3 Deep Think hits 45.1% on ARC-AGI-2 with parallel reasoning

Google shipped Deep Think, a high-cost parallel reasoning mode for Gemini 3 that scored 45.1% on ARC-AGI-2. The panel framed it as Google pressing its advantage in the frontier race, where product integration and latency now matter as much as raw benchmark IQ.

45.1% ARC-AGI-2

🔌 APIs & Platforms 1

xAI
APIs & Platforms

Grok Voice Agent API

xAI Grok Voice Agent API ships at $0.05/min flat rate, powers Tesla

xAI launched the Grok Voice Agent API with flat-rate pricing of $0.05 per minute and integration into Tesla vehicles. xAI claims the #1 spot on Big Bench Audio at 92.3%, tightening competition in the rapidly commoditizing real-time voice stack.

$0.05/min Grok Voice Agent API

🛠️ Dev Tools 1

Anthropic
Dev Tools

Claude Code

Claude Code launches, starting the CLI agent revolution

Claude Code launched in February, having started as an internal Anthropic engineering tool. Multiple co-hosts picked it as the single most impactful AI release of 2025 — it began the CLI agent era and proved, in Kwindla's words, that 'sometimes it's mostly about the harness.'

💰 Funding 2

Thinking Machines Lab
Funding

Thinking Machines Lab

Thinking Machines Lab launches with a billion-dollar round

Around June, news broke that Mira Murati's Thinking Machines Lab raised its first billion-dollar round, pulling in what LDJ described as 'an absolute avalanche of top tier researchers' from OpenAI and other labs. It was one of the year's biggest talent and funding stories.