Episode Summary

ThursdAI’s first December episode was a full firehose: DeepSeek V3.2 dropped with gold-medal-level reasoning results, Mistral returned to Apache 2.0 with new large and edge models, and Arcee’s Lukas Atkins joined to talk about building US-trained MoEs from scratch. The panel unpacked what these releases mean for open-source momentum, inference cost, and real enterprise adoption constraints. On the closed-model side, OpenAI reportedly hit a β€œcode red” response to Gemini 3 pressure while Amazon rolled out Nova 2 across text, speech, and multimodal stacks. The show closed with rapid updates across eval tooling, video generation, realtime voice, and low-cost image diffusion.

Hosts & Guests

  • Alex Volkov - Host Β· W&B / CoreWeave (@altryne)
  • Lukas Atkins - CTO, Arcee AI (@latkins)
  • Wolfram Ravenwolf - Weekly co-host, AI model evaluator (@WolframRvnwlf)
  • Yam Peleg - AI builder & founder (@Yampeleg)
  • Nisten Tahiraj - AI operator & builder (@nisten)
  • LDJ - Weekly co-host of ThursdAI (@ldjconfirmed)

By The Numbers

  • AIME: 96% - DeepSeek V3.2-Speciale’s reported score, versus GPT-5 High at 94%
  • SWE-Bench Verified: 73.1% - DeepSeek V3.2 agentic coding benchmark result
  • Total parameters: 685B - DeepSeek V3.2-Speciale, MIT-licensed MoE
  • Pricing: ~28Β’ per 1M tokens - approximate OpenRouter pricing cited for DeepSeek V3.2
  • Context window: 256K tokens - Mistral Large 3
  • ARC-AGI-2: 45.1% - Gemini 3 Deep Think score discussed in the episode

πŸ”₯ Breaking During The Show

DeepSeek V3.2-Speciale posts gold-level olympiad results
DeepSeek’s latest reasoning-first release landed with standout olympiad and coding numbers plus aggressive pricing, pushing open models closer to top closed-model capability.
Mistral returns to Apache 2.0 with Mistral 3 family
Mistral relaunched large and small multimodal models under permissive licensing, reigniting discussion around open model portability and deployability.
OpenAI Code Red and Gemini pressure
The episode covered reports that OpenAI shifted priorities in response to Gemini momentum while the broader API race accelerated across Google, Amazon, and Cursor integrations.

πŸ”“ Open Source LLMs

The panel went deep on DeepSeek V3.2, Mistral 3, Arcee Trinity, and Hermes 4.3 as proof that open models are moving fast on both reasoning and coding utility. They discussed benchmark context, licensing shifts back to Apache 2.0, and why MoE architecture plus efficient post-training is changing the economics of open AI.

  • DeepSeek V3.2-Speciale posted gold-level olympiad and AIME results with MIT license
  • Mistral Large 3 and Ministral 3 relaunched under Apache 2.0 with strong open-model coding positioning
  • Arcee Trinity introduced US-trained open MoEs and previewed Trinity-Large for January 2026
  • Hermes 4.3 highlighted decentralized training and RefusalBench performance

🏒 Big CO LLMs + APIs

Coverage shifted to the frontier API race: OpenAI’s reported internal β€œcode red,” Amazon’s Nova 2 suite, Gemini 3 Deep Think, and Cursor’s temporary free access to GPT-5.1-Codex-Max. The discussion emphasized that product integration and latency matter as much as raw benchmark IQ.

  • OpenAI reportedly paused side projects to focus on intelligence and speed
  • Amazon Nova 2 announced Lite, Pro, Sonic, and Omni with major benchmark jumps
  • Gemini 3 Deep Think introduced high-cost parallel reasoning with ARC-AGI-2 gains
  • Cursor offered GPT-5.1-Codex-Max free access through Dec 11

⚑ This Week’s Buzz

Weights & Biases launched LLM Evaluation Jobs to run evaluations against OpenAI-compatible APIs during training cycles, not just at the end. The segment framed this as a practical workflow upgrade for teams trying to move faster without blindly burning compute.

  • W&B launched LLM Evaluation Jobs
  • Supports evaluating OpenAI-compatible endpoints
  • Focus on earlier model quality signals during development
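To make the workflow concrete, here is a minimal, stdlib-only sketch of what β€œevaluating an OpenAI-compatible API” looks like in practice: post a prompt to a `/chat/completions` route, score the reply against a reference answer, and average the scores. This is illustrative only, not W&B’s Evaluation Jobs API; the `chat`, `evaluate`, and `exact_match` names, the `localhost` endpoint, and the exact-match metric are all assumptions for the sketch.

```python
import json
import urllib.request
from typing import Callable

def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the normalized answers agree, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())

def chat(base_url: str, model: str, prompt: str, api_key: str = "unused") -> str:
    """POST one chat completion to an OpenAI-compatible /chat/completions route."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def evaluate(ask: Callable[[str], str], dataset: list) -> float:
    """Average exact-match accuracy; `ask` maps a prompt to a model reply."""
    scores = [exact_match(ask(ex["prompt"]), ex["answer"]) for ex in dataset]
    return sum(scores) / len(scores)
```

Because `evaluate` takes any prompt-to-reply callable, the same harness can point at a local vLLM server, a hosted checkpoint, or a mock, e.g. `evaluate(lambda p: chat("http://localhost:8000/v1", "my-checkpoint", p), dataset)`, which is what makes mid-training evaluation cheap to wire up.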

πŸŽ₯ Vision & Video

Video model updates included Runway Gen-4.5’s leaderboard gains and two Kling releases: Kling VIDEO 2.6 with native audio generation and the Kling O1 image model. The updates continued the theme that video quality and multimodal consistency are improving week over week.

  • Runway Gen-4.5 reached top text-to-video leaderboard position
  • Kling VIDEO 2.6 introduced native audio generation
  • Kling O1 Image expanded image generation capabilities

πŸ”Š Voice & Audio

The show highlighted Microsoft VibeVoice-Realtime-0.5B and its low-latency realtime TTS profile. The segment focused on how sub-second audio response is becoming table stakes for production voice agents.

  • Microsoft VibeVoice-Realtime-0.5B shared with ~300ms latency claims
  • Voice model availability on Hugging Face
  • Realtime speech UX increasingly central to agent products

🎨 AI Art & Diffusion

Image-generation updates centered on speed and cost efficiency, with Pruna P-Image claiming sub-second generation at very low per-image pricing and SeeDream 4.5 adding stronger text rendering and multi-reference fusion.

  • Pruna P-Image promoted sub-second image generation at low cost
  • SeeDream 4.5 emphasized multi-reference fusion
  • Text rendering quality remained a key differentiator

TL;DR and Show Notes

Hosts and Guests

  • Alex Volkov (@altryne), Lukas Atkins (@latkins), Wolfram Ravenwolf (@WolframRvnwlf), Yam Peleg (@Yampeleg), Nisten Tahiraj (@nisten), LDJ (@ldjconfirmed)

Open Source LLMs

  • DeepSeek V3.2 & V3.2-Speciale - 685B MIT-licensed MoE, gold-level olympiad and AIME results, ~28Β’ per 1M tokens
  • Mistral Large 3 & Ministral 3 - Apache 2.0 return, 256K context
  • Arcee Trinity - US-trained open MoEs, Trinity-Large previewed for January 2026
  • Hermes 4.3 - Decentralized training, RefusalBench performance

Big CO LLMs + APIs

  • OpenAI Code Red - ChatGPT 3rd birthday, Garlic model in development (The Information)

  • Amazon Nova 2 - Lite, Pro, Sonic, and Omni models (X, Blog)

  • Gemini 3 Deep Think - 45.1% ARC-AGI-2 (X, Blog)

  • Cursor + GPT-5.1-Codex-Max - Free until Dec 11 (X, Blog)

This Week’s Buzz

  • WandB LLM Evaluation Jobs - Evaluate any OpenAI-compatible API (X, Announcement)

Vision & Video

  • Runway Gen-4.5 - #1 on text-to-video leaderboard, 1,247 Elo (X)

  • Kling VIDEO 2.6 - First native audio generation (X)

  • Kling O1 Image - Image generation (X)

Voice & Audio

  • Microsoft VibeVoice-Realtime-0.5B - 300ms latency TTS (X, HF)

AI Art & Diffusion

  • Pruna P-Image - Sub-second generation at $0.005 (X, Blog, Demo)

  • SeeDream 4.5 - Multi-reference fusion, text rendering (X)