ThursdAI · December 18, 2025

📆 ThursdAI - Dec 18 - Gemini 3 Flash, Grok Voice, ChatGPT Appstore, Image 1.5 & GPT 5.2 Codex, Meta Sam Audio & more AI news

From Weights & Biases, the first part of our end-of-year coverage focuses on this week's releases: Gemini 3 Flash, the Grok Voice SDK, the new ChatGPT Image, GPT 5.2 Codex, and much more!

By Alex Volkov

39 min

YouTube Spotify Apple Podcasts Substack

Episode Summary

ThursdAI’s pre-holiday episode is a speedrun through one of the most stacked AI news weeks of 2025. The crew breaks down Google’s Gemini 3 Flash leap in price-to-performance, OpenAI’s rapid-fire releases (GPT Image 1.5, ChatGPT App Store, and breaking GPT 5.2 Codex), and NVIDIA’s major open-source Nemotron 3 Nano drop. They also cover the rapidly commoditizing voice stack with xAI Grok Voice, Chatterbox Turbo, and Meta’s SAM Audio. With Kwindla joining the panel, the conversation stays practical on what actually matters for builders shipping agents right now.

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Kwindla Hultman Kramer

Co-Founder & CEO · Daily.co

@kwindla

Wolfram Ravenwolf

Weekly co-host, AI model evaluator

@WolframRvnwlf

Yam Peleg

AI builder & founder

@Yampeleg

Nisten Tahiraj

AI operator & builder

@nisten

LDJ

Weekly co-host of ThursdAI

@ldjconfirmed

Ryan Carson

AI educator & founder

@ryancarson

By The Numbers

per 1M Gemini 3 Flash input tokens

$0.50

Google’s frontier-tier model pricing that resets the cost/performance baseline

SWE-bench Verified

78%

Gemini 3 Flash coding benchmark score highlighted as beating larger models in some agentic tasks

SWE-Bench Pro

56.4%

GPT 5.2 Codex benchmark on specialized coding evaluation

Terminal-Bench 2.0

64%

GPT 5.2 Codex terminal workflow benchmark

Grok Voice Agent API

$0.05/min

Flat-rate voice API pricing from xAI

Nemotron 3 Nano

30B (3B active)

NVIDIA hybrid Mamba-MoE architecture emphasizing efficient active parameters
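
For builders budgeting against the two prices above, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in these notes ($0.50 per 1M Gemini 3 Flash input tokens and $0.05 per minute for the Grok Voice Agent API). The function names are made up for illustration, and output-token pricing is deliberately left out because it is not listed here.

```python
from decimal import Decimal

# Prices as quoted in these show notes (USD). Check the providers' pricing
# pages before relying on them.
GEMINI_3_FLASH_INPUT_PER_1M = Decimal("0.50")  # per 1M input tokens
GROK_VOICE_PER_MINUTE = Decimal("0.05")        # flat rate per minute


def input_token_cost(tokens: int) -> Decimal:
    """Estimated cost of input tokens only (hypothetical helper).

    Output tokens are billed separately and are not modeled here.
    """
    return GEMINI_3_FLASH_INPUT_PER_1M * tokens / Decimal(1_000_000)


def voice_cost(minutes: int) -> Decimal:
    """Estimated cost of a voice session at the flat per-minute rate."""
    return GROK_VOICE_PER_MINUTE * minutes


if __name__ == "__main__":
    # Example workloads, chosen arbitrarily for illustration.
    print("2M input tokens:", input_token_cost(2_000_000), "USD")
    print("60 voice minutes:", voice_cost(60), "USD")
```

Decimal is used instead of float so that small per-unit prices add up without rounding surprises.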

🔥 Breaking During The Show

OpenAI drops GPT 5.2 Codex during ThursdAI

Near the end of the episode, OpenAI released GPT 5.2 Codex live during the recording, prompting an immediate benchmark and capability discussion by the panel.

🔓 Open Source LLMs

The panel highlights NVIDIA Nemotron 3 Nano as the most consequential open release of the week, not only for performance but for releasing full training data and recipes. They also cover Allen AI’s BOLMO and OLMO multimodal progress plus Mistral OCR 3’s aggressive pricing and document performance gains.

NVIDIA Nemotron 3 Nano: 30B params, 3B active, hybrid Mamba-MoE
NVIDIA released weights, reports, recipes, and 25T-token data details
BOLMO: Allen AI's byte-level model, billed as the first to reach parity with regular tokenization
OLMO multimodal video models (4B/7B/8B)
Mistral OCR 3 claims 74% win-rate over OCR v2

🏢 Big CO LLMs + APIs

Google and OpenAI trade major launches in the same week. Gemini 3 Flash stands out for frontier capability at flash-tier price, while OpenAI pushes GPT Image 1.5 and then drops GPT 5.2 Codex during the show as breaking news.

Gemini 3 Flash: 78% SWE-bench Verified at flash pricing
Google tool-calling scale: up to 100 simultaneous function calls
OpenAI GPT Image 1.5: 4x faster, 20% cheaper
GPT 5.2 Codex: 400K context with context compaction
ChatGPT App Store submissions opened via MCP app model

⚡ This Week’s Buzz

A community moment: Alex announces Wolfram joining Weights & Biases/CoreWeave as an AI Evangelist and ‘AIvaluator.’ The segment frames 2026 as a more benchmark-driven era for the show and the broader AI community.

Wolfram Ravenwolf announced as joining W&B/CoreWeave
Focus on deeper public evals and model benchmarking
Weave highlighted for practical AI evaluations

🔊 Voice & Audio

Voice AI competition tightens with lower prices and more capable real-time stacks. xAI ships Grok Voice Agent API with Tesla integration and strong audio benchmark positioning, while open-source Chatterbox Turbo and Meta SAM Audio push accessible audio generation and separation.

Grok Voice Agent API: $0.05/min pricing and Tesla integration
Big Bench Audio leadership claim for Grok voice stack
Resemble Chatterbox Turbo: MIT-licensed 350M open TTS
Meta SAM Audio: source separation with multimodal prompting

🛠️ FunctionGemma & Edge Agents

Google’s tiny FunctionGemma release gets a dedicated discussion for what it signals about on-device agents. The model is small enough for constrained hardware and points toward privacy-first local function-calling assistants.

FunctionGemma at 270M parameters
~500MB RAM footprint for edge usage
Strong improvement after fine-tuning for mobile actions
On-device tool use for private assistant workflows

📰 Year-End Coverage Preview

The episode closes by setting up the full 2025 recap planned for the next show. This installment is framed as week-of-news triage, with the year-in-review coming as a separate deep retrospective.

Dec 18 episode intentionally focused on weekly drops
Full 2025 month-by-month recap queued for next week
Team emphasizes pace and acceleration of releases

TL;DR and Show Notes

Hosts and Guests

Alex Volkov - AI Evangelist @ Weights & Biases (@altryne)
Co-hosts: @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed, @ryancarson
Special Guest: @kwindla - CEO of Daily

Open Source LLMs

NVIDIA Nemotron 3 Nano - 30B-3A hybrid Mamba-MoE model (X, HF, HF FP8)
FunctionGemma - 270M parameter function calling model (X, Blog, Docs)
Mistral OCR 3 - Document intelligence model with 74% win rate over v2 (X, Blog, Console)
BOLMO from Allen AI - First byte-level model reaching parity with regular tokenization (X)
OLMO 2 from Allen AI - Multimodal with video input (4B, 7B, 8B sizes) (X)

Big CO LLMs + APIs

Google Gemini 3 Flash - Frontier intelligence at $0.50/1M input tokens, 78% SWE-bench Verified (X, Announcement)
OpenAI GPT Image 1.5 - 4x faster, 20% cheaper, #1 on LMSYS Image Arena (X)
OpenAI GPT 5.2 Codex - 56.4% SWE-Bench Pro, 64% Terminal-Bench 2.0, 400K context (X, Blog)
ChatGPT App Store - MCP-powered apps submission now open (X)

This Week’s Buzz

🐝 Wolfram joins Weights & Biases / CoreWeave as AI Evangelist and AIvaluator!
Try Weave for AI evaluations

Voice & Audio

xAI Grok Voice Agent API - #1 Big Bench Audio (92.3%), $0.05/min flat rate, powers Tesla vehicles (X)
Resemble AI Chatterbox Turbo - MIT-licensed 350M TTS, beats ElevenLabs in blind tests (X, HF, GitHub, Blog)
Meta SAM Audio - Audio source separation with text/visual/temporal prompts (X, HF, GitHub)

Show Links

Full 2025 Yearly Recap - Coming next week!
