New Models · Open weights
Qwen 3 Coder
Qwen 3 Coder posts insane scores in the race for the coding crown
Alibaba's Qwen 3 Coder landed in July with what the crew called insane benchmark scores for an open-weights coding model. Together with Kimi K2 and GLM 4.5 it made July the peak month for Chinese open source.
New Models · Open weights
Qwen speech-to-speech model
Qwen launches speech-to-speech model with emotion handling
Qwen released a speech-to-speech model in March with internal emotion handling, joining the wave of voice-native models. It was part of the Qwen team's relentless 2025 release cadence across modalities.
New Models
Claude Opus 4
Claude Opus 4 drops in Q2 — Ryan's pick for best model ever
Claude Opus 4 launched in Q2 and became Ryan Carson's pick as the best coding model he had used in over 700 days of daily LLM coding. It cemented Anthropic's lead in agentic coding through the middle of the year.
New Models
Flux 3
Flux 3 becomes the new gold standard for image generation
Flux 3 dropped in August and immediately became the gold standard for image generation, landing three years almost to the day after Stable Diffusion first went public. Wolfram used it as the yardstick for how far image AI traveled in those three years.
New Models · Open weights
Smart Turn Detection
Daily ships smart turn detection for voice agents
Kwindla's Daily.co shipped smart turn detection during Q2, an open model that helps voice agents know when a speaker has actually finished talking. It landed in the quarter when voice agents first got attention outside the builder bubble.
New Models · Open weights
DeepSeek R1
DeepSeek R1: the open reasoning model that crashed NVIDIA's stock
DeepSeek's open-weights reasoning model dropped on January 20th and matched OpenAI's o1 at roughly one-fiftieth of the price, with an alleged training cost of just $5.5M. NVIDIA's stock fell 17% in its wake, a $560B single-day loss that was the largest one-day drop in market value for any single company in history, and Chinese AI became a household topic. The crew named it the earthquake that shattered assumptions about who leads AI.
$560B NVIDIA stock loss · $5.5M DeepSeek R1 training cost
New Models · Open weights
DeepSeek V3.1 Terminus
DeepSeek V3.1 Terminus lands amid September's relentless pace
DeepSeek resurfaced in September with V3.1 Terminus, another strong open-weights release that arrived just as the crew was barely keeping up with the weekly firehose. Nisten noted that missing a single week in this period left you completely lost.
New Models
Gemini 2.5
Gemini 2.5 takes the #1 benchmark spot in March
Gemini 2.5 briefly claimed the top benchmark position in March, the moment Wolfram identified as the pivotal point where OpenAI stopped being the undisputed leader. It foreshadowed Google's full comeback later in the year.
New Models
Gemini TTS
Google ships a Gemini TTS model in its December run
As part of Google's December release wave, a Gemini TTS model shipped alongside realtime model updates. It rounded out Google's full-stack voice story heading into 2026.
New Models
VEO3
VEO3: native audio video generation crosses the uncanny valley
Google's VEO3 stunned everyone in Q2 with video generation that included native audio, which the crew credits with crossing the uncanny valley for AI video. It was a centerpiece of Google I/O 2025 and of Google's comeback year.
New Models · Open weights
Kokoro TTS
Kokoro TTS: 82M-param Apache 2 model hits #1 on TTS Arena
Kokoro, a tiny 82M parameter text-to-speech model, went viral in January after hitting #1 on TTS Arena. Released under Apache 2.0 and small enough to run in the browser, it showed that high-quality speech synthesis no longer required huge models.
New Models
Hailuo 2.3
MiniMax drops Hailuo 2.3 in November
MiniMax released Hailuo 2.3 (referred to as 'Hailuo LLM 2.3' on the show) in November, cited as another strong release from the Chinese labs. It closed out a year in which MiniMax shipped everything from 4M-context LLMs to media models.
New Models · Open weights
MiniMax-01
MiniMax-01: open model with a 4M token context window
MiniMax (Hailuo) released MiniMax-01 in January with a 4 million token context window, by far the largest context of any open-weights model at the time. It was an early sign of the Chinese-lab open source dominance that defined 2025.
New Models · Open weights
Kimi K2
Kimi K2: the Chinese open model that earned mainstream respect
Moonshot AI's Kimi K2 dropped in July and earned serious mainstream recognition, marking peak Chinese-lab dominance of open source. It was named in the show's TL;DR as one of the defining open-weights releases of 2025.
New Models
GPT-5 Codex
GPT-5 Codex: OpenAI's specialized coding model moves the stock
GPT-5 Codex dropped in September as OpenAI's coding-specialized fine-tune of GPT-5. Yam dubbed it the 'infinite money glitch' because the release moved OpenAI-linked stock prices significantly.
New Models
Sora 2
Sora 2 democratizes video generation and floods the internet with memes
Sora 2 opened Q4 in October by democratizing video generation, complete with a social platform, and spawned a wave of memes still circulating at year's end. The show's TL;DR credits it as part of 2025 crossing the uncanny valley for AI media.
New Models
New voice models (GPT Realtime derivatives)
OpenAI ships two new voice models derived from GPT Realtime
In March, OpenAI released two voice models derived from its GPT Realtime speech-to-speech stack. They were part of a wave that pushed voice agents toward the mainstream over the course of 2025.
New Models · Open weights
Hunyuan open weights
Tencent enters the open weights race
In July, Tencent's Hunyuan team (rendered as 'HO One' in the episode) joined Huawei in entering the open-weights model race. The move widened the field of Chinese labs shipping serious open models beyond DeepSeek, Qwen, and Moonshot.
New Models · Open weights
GLM 4.5
GLM 4.5 runs on Cerebras fast enough to win hackathons
Zhipu's GLM 4.5 came out in July and was the first open model that ran on Cerebras hardware fast enough that hackathon competitors were winning with it. It set up GLM's quiet rise as a business workhorse later in the year.
New Models · Open weights
GLM 4.6
GLM 4.6 quietly becomes the model businesses actually use
Zhipu's GLM 4.6 arrived in October and, per Nisten, quietly became a go-to model that many businesses still run today. It continued GLM's trajectory from hackathon favorite to production workhorse.
New Models · Open weights
BOLMO
Allen AI's BOLMO reaches byte-level parity with tokenized models
Allen AI released BOLMO, described as the first byte-level language model to reach parity with regular tokenization-based models. The panel framed it as a research breakthrough that could eventually remove tokenizers from the LLM stack.
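To make "byte-level" concrete: instead of looking text up in a learned tokenizer vocabulary, a byte-level model consumes the raw UTF-8 bytes, so the input alphabet is just the 256 possible byte values. The sketch below only illustrates that idea; it is not BOLMO's actual input pipeline, and the function names are made up for this example.

```python
def bytes_to_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and use each byte value (0-255) as an input ID."""
    return list(text.encode("utf-8"))


def ids_to_text(ids: list[int]) -> str:
    """Invert bytes_to_ids by reassembling the bytes and decoding UTF-8."""
    return bytes(ids).decode("utf-8")


# Non-ASCII characters take more than one byte, so the ID sequence is
# longer than the character count.
sample = "na\u00efve caf\u00e9"
ids = bytes_to_ids(sample)
print(len(sample), "characters ->", len(ids), "byte IDs")
print(ids_to_text(ids) == sample)
```

The trade-off is that sequences get longer than with a subword tokenizer, which is why parity with tokenized models is the notable part of the claim.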
New Models · Open weights
OLMO 2 (multimodal)
Allen AI adds video-input multimodal OLMO models in 4B/7B/8B sizes
Allen AI extended its OLMO family with multimodal models that accept video input, released in 4B, 7B, and 8B sizes. The release continues Allen AI's fully open approach to model development alongside the BOLMO byte-level work.
New Models · Open weights
FunctionGemma
FunctionGemma: Google's 270M function-calling model for edge agents
Google released FunctionGemma, a tiny 270M-parameter open model specialized for function calling on-device. With a roughly 500MB RAM footprint and strong gains after fine-tuning for mobile actions, it points toward privacy-first local agents on constrained hardware.
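For readers new to function calling, the general pattern is that the model emits a structured call and the app runs the matching local function. The sketch below shows that pattern only. The JSON shape, the `TOOLS` registry, and both actions are assumptions for illustration, not FunctionGemma's actual output format or API.

```python
import json


# Hypothetical on-device actions; a real app would call platform APIs here.
def set_alarm(hour: int, minute: int) -> str:
    return f"Alarm set for {hour:02d}:{minute:02d}"


def toggle_flashlight(on: bool) -> str:
    return "Flashlight on" if on else "Flashlight off"


# Registry mapping function names the model may emit to local callables.
TOOLS = {"set_alarm": set_alarm, "toggle_flashlight": toggle_flashlight}


def dispatch(model_output: str) -> str:
    """Parse a JSON function call and run the matching local function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown function: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for text a model might emit; the JSON shape is an assumption.
print(dispatch('{"name": "set_alarm", "arguments": {"hour": 7, "minute": 30}}'))
```

Because the call is resolved against a local registry, nothing has to leave the device, which is the privacy angle the summary points at.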
New Models
Gemini 3 Flash
Gemini 3 Flash delivers frontier intelligence at $0.50/1M input tokens
Google launched Gemini 3 Flash, offering frontier-tier capability at flash-tier pricing of $0.50 per million input tokens. It scores 78% on SWE-bench Verified, beating larger models on some agentic tasks, and supports tool-calling at scale with up to 100 simultaneous function calls.
$0.50 per 1M Gemini 3 Flash input tokens · 78% SWE-bench Verified
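As a rough sense of scale for the quoted input price, the arithmetic can be sketched as follows. The workload numbers are hypothetical, and output-token pricing is not given here, so this covers the input side only.

```python
INPUT_PRICE_PER_MILLION_USD = 0.50  # input price quoted above


def input_cost_usd(input_tokens: int) -> float:
    """Return the input-side cost in US dollars for a given token count."""
    return input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION_USD


# Hypothetical workload: 2,000 requests with 8,000 input tokens each.
requests, tokens_per_request = 2_000, 8_000
total_tokens = requests * tokens_per_request
print(f"{total_tokens:,} input tokens cost ${input_cost_usd(total_tokens):.2f}")
```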
New Models · Open weights
SAM Audio
Meta SAM Audio brings promptable source separation to audio
Meta released SAM Audio, an audio source separation model that extends the Segment Anything concept to sound. It supports multimodal prompting via text, visual, and temporal cues to isolate sources from audio, with weights on Hugging Face and code on GitHub.
New Models
Mistral OCR 3
Mistral OCR 3 claims 74% win-rate over OCR v2 with aggressive pricing
Mistral released OCR 3, its latest document intelligence model, claiming a 74% win-rate over OCR v2. The panel highlighted its aggressive pricing and document performance gains as part of the open-source-adjacent European push on practical document AI.
New Models · Open weights
Nemotron 3 Nano
NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes
NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.
30B (3B active) Nemotron 3 Nano parameters
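The gap between 30B total and 3B active parameters comes from sparse mixture-of-experts routing, where each token only runs through a few experts. The toy below is a generic top-k router in plain Python, meant as an illustration of the idea. It is not Nemotron 3 Nano's architecture, and the expert count, `k`, and scores are made up.

```python
import math


def top_k_route(router_scores: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_scores[i] for i in chosen)  # subtract for numerical stability
    exps = {i: math.exp(router_scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Hypothetical router scores for one token across 8 experts.
scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
weights = top_k_route(scores, k=2)
print("experts used for this token:", sorted(weights))

# Ratio implied by the headline figures: 3B active out of 30B total.
print(f"{3 / 30:.0%} of parameters active per token")
```

Only the selected experts run for a given token, so compute per token tracks the active count rather than the total.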
New Models
GPT 5.2 Codex
GPT 5.2 Codex drops live during the show with 400K context
OpenAI released GPT 5.2 Codex via API after months of exclusivity in the Codex app, making it available in Cursor, GitHub Copilot, and VS Code with native context compaction for long sessions. Cursor showcased it by building a complete browser from scratch in Rust, roughly 3 million lines of code across about 330,000 commits, driven by hundreds of concurrent agents.
56.4% SWE-Bench Pro · 64% Terminal-Bench 2.0
New Models
GPT Image 1.5
OpenAI GPT Image 1.5: 4x faster, 20% cheaper, #1 on LMSYS Image Arena
OpenAI released GPT Image 1.5, an upgraded image generation model that is 4x faster and 20% cheaper than its predecessor. It debuted at #1 on the LMSYS Image Arena leaderboard, part of OpenAI's rapid-fire release week.
New Models · Open weights
Chatterbox Turbo
Resemble AI open-sources Chatterbox Turbo, a 350M MIT-licensed TTS
Resemble AI released Chatterbox Turbo, an MIT-licensed 350M-parameter open text-to-speech model. The company claims it beats ElevenLabs in blind listening tests, pushing high-quality TTS into fully open, accessible territory.
New Models
Amazon Nova 2
Amazon announces Nova 2 family: Lite, Pro, Sonic, and Omni
Amazon rolled out the Nova 2 model suite spanning text, speech, and multimodal stacks with Lite, Pro, Sonic, and Omni variants. The launch came with major benchmark jumps over the first Nova generation and includes a fast, cost-effective reasoning model in Nova 2 Lite.
New Models · Open weights
Arcee Trinity
Arcee Trinity launches US-trained open MoE family
Arcee AI introduced Trinity, a family of US-trained open mixture-of-experts models built from scratch, starting with Trinity-Mini and Trinity-Nano-Preview. CTO Lucas Atkins joined the show to discuss the training approach and previewed Trinity-Large for January 2026. The release positions Arcee as a domestic alternative in an open-weights field dominated by Chinese labs.
New Models
SeeDream 4.5
SeeDream 4.5 adds multi-reference fusion and stronger text rendering
ByteDance's SeeDream 4.5 image model shipped with emphasis on multi-reference fusion and improved text rendering, an area the panel noted remains a key differentiator among image generators.
New Models · Open weights
DeepSeek V3.2 / V3.2-Speciale
DeepSeek V3.2 and V3.2-Speciale post gold-medal reasoning under MIT license
DeepSeek released V3.2 and the reasoning-first V3.2-Speciale, a 685B-parameter MoE under MIT license. Speciale posted gold-medal-level olympiad results and 96% on AIME (versus GPT-5 High at 94%), with V3.2 hitting 73.1% on SWE-Bench Verified. Aggressive pricing around 28 cents per 1M tokens on OpenRouter pushes open models closer to top closed-model capability.
96% AIME · 73.1% SWE-Bench Verified · 685B Total parameters (MoE)
New Models
Kling O1 Image
Kling O1 Image expands Kling into image generation
Alongside its video update, Kling shipped O1 Image, expanding the company's generation stack into still images. The release rounds out Kling's multimodal offering beyond its core video models.
New Models
Kling VIDEO 2.6
Kling VIDEO 2.6 adds first native audio generation
Kling released VIDEO 2.6, its first video model with native audio generation, producing sound directly alongside generated footage. It was one of two Kling releases this week spanning video and image generation.
New Models · Open weights
VibeVoice-Realtime-0.5B
Microsoft shares VibeVoice-Realtime-0.5B with ~300ms latency TTS
Microsoft published VibeVoice-Realtime-0.5B on Hugging Face, a small realtime text-to-speech model claiming roughly 300ms latency. The show framed it as more evidence that sub-second audio response is becoming table stakes for production voice agents.
~300ms Claimed TTS latency · 0.5B Parameters
New Models · Open weights
Mistral 3 (Large 3 + Ministral 3)
Mistral returns to Apache 2.0 with Mistral Large 3 and Ministral 3
Mistral relaunched its model family under permissive Apache 2.0 licensing with Mistral Large 3 and the small Ministral 3 edge models. Large 3 ships a 256K context window and strong open-model coding positioning. The licensing shift reignited discussion around open model portability and deployability.
256K Mistral Large 3 context window
New Models · Open weights
Hermes 4.3
Nous Research ships Hermes 4.3 36B with decentralized training
Nous Research released Hermes 4.3-36B, highlighted on the show for being trained with decentralized infrastructure and for state-of-the-art RefusalBench performance. The release continues the Hermes line of open, steerable instruction-tuned models.
New Models
P-Image
Pruna P-Image promises sub-second image generation at $0.005
Pruna AI promoted P-Image, an image generation offering with sub-second generation times at roughly $0.005 per image. The release fit the week's diffusion theme of competing on speed and cost efficiency rather than just quality.
$0.005 Per image
New Models
Runway Gen-4.5
Runway Gen-4.5 takes #1 on the text-to-video leaderboard
Runway's Gen-4.5 video model climbed to the top of the text-to-video leaderboard with a 1,247 Elo rating. The result continued the weekly theme of video generation quality and multimodal consistency improving fast.
1,247 Text-to-video leaderboard Elo
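To read an Elo figure like 1,247, it helps to turn a rating gap into an expected head-to-head win rate. The sketch below uses the standard Elo expected-score formula and assumes the leaderboard follows it. The rival rating of 1,200 is hypothetical and chosen only for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability, ignoring ties) of A against B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


leader = 1247  # Elo quoted above
rival = 1200   # hypothetical competitor rating, for illustration only
print(f"Expected win rate vs. rival: {elo_expected_score(leader, rival):.1%}")
```

Under that assumption, a lead of a few dozen points means the top model is preferred in somewhat more than half of pairwise comparisons, not in nearly all of them.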