Episode Summary
NVIDIA dominated CES 2026 with the Vera Rubin platform, delivering 5x the inference performance of Blackwell and requiring 75% fewer GPUs for trillion-parameter training, while xAI raised $20B at a $230B valuation amid Grok's bikini-gate scandal. Ryan Carson broke down the Ralph Wiggum autonomous coding technique (1.2M views on X) that lets agents ship features while you sleep, marking the death of "vibe coding." The panel also covered Upstage's Solar Open 100B, Liquid AI's on-device LFM 2.5, NVIDIA's Nemotron Speech ASR with 24ms latency (demoed by Kwindla Hultman Kramer of Daily.co), and OpenAI's GPT Health launch alongside the first US pilot for AI-driven prescription renewals.
In This Episode
- 📰 TL;DR - This Week's AI News Rundown
- 🔓 Open Source: Solar Open 100B
- 🔓 Miro Thinker 1.5
- 🔓 Liquid AI LFM 2.5
- 🔓 Zhipu AI IPO & NousCoder
- 🏢 NVIDIA CES & Vera Rubin Platform
- 💰 NVIDIA Groq Acquisition
- 🔊 Nemotron Speech ASR
- 🤖 Alpamayo Self-Driving
- 🏢 Grok & xAI: $20B Raise Amid Bikini-Gate
- 🛠️ Alexa Plus on the Web
- 🏢 GPT Health & AI Medicine
- 🤖 Ralph Wiggum: The Autonomous Coding Loop
- 📰 Wrap Up & Goodbye
📰 TL;DR - This Week's AI News Rundown
Alex runs through the week's biggest stories: NVIDIA's Vera Rubin at CES delivering 5x the inference performance of Blackwell, xAI's $20B raise amid the Grok controversy, Solar Open 100B and other open source releases, OpenAI's GPT Health waitlist, Google bringing Gmail into the Gemini era, and the first US pilot for AI-prescribed medication renewals.
- NVIDIA Vera Rubin: 5x inference over Blackwell at CES 2026
- xAI raises $20B at $230B valuation
- Google Gmail enters the Gemini era for 3B users (breaking news)
- Doctronic: first US pilot for AI prescription renewals
🔓 Open Source: Solar Open 100B
Upstage releases Solar Open 100B, a 102B parameter MoE model with only 12B active parameters, trained on 19.7 trillion tokens with an innovative data factory approach. LDJ highlights the SNAP PO reinforcement learning technique with a 50% training speedup, and the panel discusses how this model outperforms GLM 4.5 Air on many benchmarks with strong Korean language optimization. A sketch of the sparse top-8 expert routing that keeps only 12B parameters active follows the list below.
- 102B params, 12B active, 129 experts with top-8 activation
- 19.7T training tokens with 4.5T synthetic data
- SNAP PO: 50% RL training speedup
- Best-in-class Korean language performance
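For intuition on how only ~12B of the 102B parameters fire per token, here is a minimal sketch of top-8 expert routing in PyTorch. The layer sizes, router design, and expert MLP shape are illustrative assumptions, not Upstage's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Sparse MoE layer: a router scores every expert per token and only the
    top-k experts actually run. All sizes below are illustrative placeholders."""

    def __init__(self, d_model=256, d_ff=1024, n_experts=129, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x)                        # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, -1)  # keep the 8 best experts per token
        weights = F.softmax(weights, dim=-1)           # renormalize over those 8
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in chosen[:, slot].unique().tolist():
                mask = chosen[:, slot] == e            # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 256)
print(TopKMoELayer()(tokens).shape)                    # torch.Size([4, 256])
```

Because the router selects only 8 of the 129 expert MLPs per token, per-token compute tracks the ~12B active parameters rather than the full 102B.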
🔓 Miro Thinker 1.5
MiroMind AI releases MiroThinker 1.5, a 30B parameter open source search agent achieving 56.1% on BrowseComp, outperforming trillion-parameter models through 'interactive scaling.' The panel debates the growing importance of agent harnesses in 2026, with Ryan noting that domain-specific harnesses are the bleeding edge and Nisten emphasizing how hard they are to build well.
- 30B model beating trillion-parameter models on search benchmarks
- Interactive scaling: third dimension of scaling beyond params and context
- 56.1% on BrowseComp, 66.8% on the Chinese BrowseComp
- Fine-tune of Qwen 3 Thinking with 147K open training samples
🔓 Liquid AI LFM 2.5
Liquid AI releases LFM 2.5, a family of tiny ~1B parameter on-device models with text, vision, and audio support, announced at CES alongside AMD's Lisa Su. The models achieve 239 tokens/sec on AMD CPU and 100 tokens/sec on iPhone 16 Pro Max. LDJ highlights the revolutionary end-to-end audio model that skips the traditional ASR-LLM-TTS pipeline entirely.
- 1.2B params running at 239 tps on AMD CPU, 100 tps on iPhone
- End-to-end audio model: no separate ASR or TTS needed
- 14% on IF-Eval 2025 — impressive for a 1B model
- Announced with AMD on stage at CES
🔓 Zhipu AI IPO & NousCoder
Zhipu AI (makers of GLM) becomes the world's first major LLM company to IPO on the Hong Kong Stock Exchange, raising $558M. Nous Research releases NousCoder 14B, an open source competitive programming model that achieved a 7% jump on LiveCodeBench accuracy in just four days of RL training on 48 NVIDIA B200 GPUs.
- Zhipu AI IPO: $558M raised, first major LLM company to go public
- NousCoder 14B: 7% LiveCodeBench jump in 4 days of RL
- 24,000 verifiable problems used for RL training (reward sketch after this list)
- Full Apache 2 license with training code and benchmark harness
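As a rough illustration of what "verifiable problems" means for coding RL, here is a sketch of a binary reward function that runs a candidate solution against known test cases. The test format and runner are assumptions for illustration, not Nous Research's actual harness.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def verify(solution_code: str, tests: list[tuple[str, str]], timeout: float = 5.0) -> float:
    """Binary reward: 1.0 only if the candidate passes every (stdin, expected_stdout) case."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "solution.py"
        src.write_text(solution_code)
        for stdin, expected in tests:
            try:
                run = subprocess.run(
                    [sys.executable, str(src)],
                    input=stdin, capture_output=True, text=True, timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                return 0.0
            if run.returncode != 0 or run.stdout.strip() != expected.strip():
                return 0.0
    return 1.0

# Toy example: "print the sum of the integers on one line".
print(verify("print(sum(map(int, input().split())))", [("1 2", "3"), ("10 -4", "6")]))
```

The binary pass/fail signal is what makes competitive-programming problems attractive for RL: the reward is cheap to compute and hard to game.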
🏢 NVIDIA CES & Vera Rubin Platform
Jensen Huang unveils the Vera Rubin platform at CES 2026, NVIDIA's next-gen AI computer delivering 5x the inference performance of Blackwell with only marginally more power draw. LDJ walks through the specs: over 3x the PFLOPS of Blackwell at 1800W, 13 TB/s of memory bandwidth, and 75% fewer GPUs needed for 10T parameter MoE training. Ryan calls it truly astonishing and Nisten marvels at the power efficiency.
- Vera Rubin: 50 PFLOPS inference, 5x over Blackwell
- 3x+ PFLOPS gain while only adding ~200W power
- 75% fewer GPUs for 10T parameter MoE training
- 72 GPUs per rack, 20.7 TB memory, 100% liquid cooled
- Announced in full production just 4 months after B300
💰 NVIDIA Groq Acquisition
NVIDIA enters an exclusive licensing deal and acquires most of Groq's team for approximately $20B. Alex explains how Groq's inference-optimized chips, created by former Google TPU lead Jonathan Ross, complement NVIDIA's training dominance — reinforcing the panel's view that there's no AI bubble given insatiable demand for inference.
- NVIDIA acquires Groq team and technology for ~$20B
- Groq founder Jonathan Ross was instrumental in creating Google TPUs
- Inference demand growing exponentially across all AI use cases
🔊 Nemotron Speech ASR
NVIDIA releases Nemotron Speech ASR, a 600M parameter open source streaming speech model with 24ms median latency and support for 900 concurrent streams on a single H100. Alex plays a demo featuring Kwindla Hultman Kramer of Daily.co showing sub-500ms voice-to-voice latency with a three-model pipeline of Nemotron ASR, Nemotron Nano LLM, and Magpie TTS; a minimal sketch of that loop follows the list below.
- 600M params — runs on a toaster
- 24ms median latency, 900 concurrent streams per H100
- Sub-500ms total voice-to-voice latency
- Demoed by Kwindla Hultman Kramer of Daily.co / PipeCat
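To make the latency budget concrete, here is a minimal sketch of the demoed three-stage loop (streaming ASR, small LLM, streaming TTS). Every class and method below is an illustrative stub, not NVIDIA's, Daily's, or PipeCat's actual API.

```python
import time
from dataclasses import dataclass

@dataclass
class AudioFrame:
    samples: bytes
    is_end_of_utterance: bool = False        # assumed VAD/endpointing flag

class StreamingASR:                          # stand-in for Nemotron Speech ASR
    def transcribe(self, frame: AudioFrame) -> str:
        return "what's the weather like"

class SmallLLM:                              # stand-in for Nemotron Nano
    def reply(self, transcript: str) -> str:
        return f"You asked: {transcript}."

class StreamingTTS:                          # stand-in for Magpie TTS
    def synthesize(self, text: str):
        for word in text.split():            # yield audio chunks as they are produced
            yield word.encode()

def voice_agent_turn(frames, asr, llm, tts, play):
    """One turn: stream audio in, speak the reply back. Voice-to-voice latency is
    roughly ASR latency + LLM time-to-first-token + TTS time-to-first-audio."""
    t0 = time.monotonic()
    transcript = ""
    for frame in frames:                     # 20-40 ms audio chunks from the mic
        transcript = asr.transcribe(frame)   # streaming ASR updates the transcript live
        if frame.is_end_of_utterance:
            break
    for chunk in tts.synthesize(llm.reply(transcript)):
        play(chunk)                          # start playback before TTS finishes
    print(f"voice-to-voice: {(time.monotonic() - t0) * 1000:.0f} ms")

frames = [AudioFrame(b"\x00" * 640), AudioFrame(b"\x00" * 640, is_end_of_utterance=True)]
voice_agent_turn(frames, StreamingASR(), SmallLLM(), StreamingTTS(), play=lambda _: None)
```

The point of a 24ms streaming ASR is that the transcript is essentially ready the moment the user stops talking, so the remaining voice-to-voice budget goes to LLM time-to-first-token and TTS time-to-first-audio.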
🤖 Alpamayo Self-Driving
LDJ highlights NVIDIA's Alpamayo, a family of open source reasoning-based self-driving AI models announced at CES. The models perform end-to-end autonomous driving with explicit reasoning steps, such as identifying jaywalkers. Alex jokes about whether you want reasoning in a model that needs to make split-second driving decisions.
- Open source self-driving model with reasoning steps
- End-to-end autonomous drive demo in Mercedes-Benz
- Real-time reasoning: identifies jaywalkers, stops accordingly
🏢 Grok & xAI: $20B Raise Amid Bikini-Gate
xAI raises $20B at a $230B valuation with NVIDIA as a strategic investor, while Grok faces major backlash over its image model's lack of NSFW guardrails. The panel debates the responsibility of AI products vs tools: Nisten notes guardrails are trivially easy to implement, Wolfram argues for going after bad actors rather than tools, and Alex draws a sharp line between open-source tools and consumer products embedded in social media.
- xAI Series E: $20B raised at $230B valuation
- Grok bikini-gate: no guardrails on image model in replies
- xAI claimed 600M active users by counting all X users
- Panel debates tool vs product responsibility for AI safety
🛠️ Alexa Plus on the Web
Alex demos Alexa Plus, Amazon's smart Alexa experience now available as a web chat interface for $20/month. The upgraded assistant supports free-flowing conversations without repeating the wake word, integrates with smart home devices, and can continue conversations across devices. LDJ notes Amazon's earlier Claude partnership and their own Nova model line.
- Web-based chat interface for Alexa Plus
- Smart home integration with natural language commands
- $20/month, text chat only — voice coming later
- Continue conversations across devices
🏢 GPT Health & AI Medicine
OpenAI launches a GPT Health waitlist for privacy-first health conversations with connected health records and fitness apps. Nisten explains why LLMs are so good at medicine — only ~2,000 diseases and drugs to master. Ryan asks about Epic/MyChart integration, and the panel discusses Doctronic's first US pilot in Utah where AI can autonomously renew prescriptions at just $4 per renewal.
- GPT Health: integrates Apple Health, Function Health, MyFitnessPal, Peloton
- LLMs only need to handle ~2,000 diseases and ~2,000 drugs
- Doctronic: first US AI prescription renewal pilot in Utah
- $4 per renewal, 190 routine medications, excludes controlled substances
🤖 Ralph Wiggum: The Autonomous Coding Loop
Ryan Carson gives a masterclass on Ralph Wiggum, the autonomous coding technique created by Geoffrey Huntley that hit 1.2M views on X. The method: write a PRD, break it into atomic user stories with acceptance criteria in JSON, then run a bash loop that tells your CLI agent (Amp, Claude Code, etc.) to pick the next incomplete story, code it, commit, update progress, and loop, shipping features while you sleep. Nisten reveals Ralph's origin story from a San Francisco meetup and how it won a YC hackathon. A minimal sketch of the loop follows the list below.
- Write PRD → atomic user stories in JSON → bash loop agent
- Compound learning: agent writes lessons to agents.md each loop
- Ryan shipped 5 features in 2 days using Ralph
- Won YC hackathon by letting Ralph run overnight on Sonnet 4.5
- Works with any CLI agent: Amp, Claude Code, Cursor CLI, Gemini CLI
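A minimal sketch of the loop in Python, assuming a stories.json of atomic user stories and an agents.md lessons file; the file layout and the `claude -p` invocation are illustrative stand-ins, so swap in whatever CLI agent you actually use.

```python
import json
import subprocess
from pathlib import Path

STORIES = Path("stories.json")   # atomic user stories with acceptance criteria
LESSONS = Path("agents.md")      # compound learning: lessons appended each loop

PROMPT = """Read {lessons} for lessons from previous runs. Here is the next
incomplete user story, with its acceptance criteria:

{story}

Implement it, run the tests, commit the change, and append any new lessons
learned to {lessons}."""

def next_incomplete_story():
    return next((s for s in json.loads(STORIES.read_text()) if not s.get("done")), None)

def mark_done(story_id):
    stories = json.loads(STORIES.read_text())
    for s in stories:
        if s["id"] == story_id:
            s["done"] = True
    STORIES.write_text(json.dumps(stories, indent=2))

while (story := next_incomplete_story()) is not None:
    prompt = PROMPT.format(story=json.dumps(story, indent=2), lessons=LESSONS)
    # "claude -p <prompt>" runs Claude Code non-interactively; replace with the
    # equivalent one-shot command for Amp, Cursor CLI, Gemini CLI, etc.
    result = subprocess.run(["claude", "-p", prompt])
    if result.returncode != 0:
        break                     # bail out if the agent run fails
    mark_done(story["id"])        # track progress so the next loop picks a new story
```

The compound-learning trick lives in the prompt: each iteration the agent reads agents.md before starting and appends new lessons afterward, so later stories benefit from earlier mistakes.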
📰 Wrap Up & Goodbye
Alex wraps the first show of 2026 with over 1,700 live viewers. The episode spanned NVIDIA CES announcements, Ralph Wiggum autonomous coding, GPT Health and AI medicine, and a strong week of open source releases. Wolfram has officially joined Weights & Biases as an AI evangelist focused on evals, and the team teases agentic skills coverage for next week.
- 1,700+ live viewers for the first show of 2026
- Wolfram Ravenwolf officially joins Weights & Biases
- Agentic skills and MCP coverage teased for next episode
Hosts & Guests
Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
Co-Hosts - @WolframRvnwlf, @nisten, @ldjconfirmed
Special Guest - Ryan Carson (@ryancarson) breaking down the Ralph Wiggum technique.
Open Source LLMs
Solar Open 100B - Upstage’s 102B MoE model. Trained on 19.7T tokens with a heavy focus on “data factory” synthetic data and high-performance Korean reasoning (X, HF, Tech Report).
MiroThinker 1.5 - A 30B parameter search agent that uses “Interactive Scaling” to beat trillion-parameter models on search benchmarks like BrowseComp (X, HF, GitHub).
Liquid AI LFM 2.5 - A family of 1B models designed for edge devices. Features a revolutionary end-to-end audio model that skips the ASR-LLM-TTS pipeline (X, HF).
NousCoder-14B - A competitive coding model from Nous Research that saw a 7% LiveCodeBench accuracy jump in just 4 days of RL (X, WandB Dashboard).
Zhipu AI IPO - The makers of GLM became the first major LLM firm to go public on the HKEX, raising $558M (Announcement).
Big Co LLMs & APIs
NVIDIA Vera Rubin - Jensen Huang’s CES reveal of the next-gen platform. Delivers 5x Blackwell inference performance and 75% fewer GPUs needed for MoE training (Blog).
OpenAI ChatGPT Health - A privacy-first vertical for EHR and fitness data integration (Waitlist).
Google Gmail Era - Gemini 3 integration into Gmail for 3 billion users, featuring AI Overviews and natural language inbox search (Blog).
xAI $20B Raise - Elon's xAI raises its Series E at a $230B valuation, even as Grok faces heat over bikini-gate and safety guardrails (CNN Report).
Doctronic - The first US pilot in Utah for autonomous AI prescription renewals without a physician in the loop (Web).
Alexa+ Web - Amazon brings the “Smart Alexa” experience to browser-based chat (Announcement).
Autonomous Coding & Tools
Ralph Wiggum - The agentic loop technique for autonomous coding using small, atomic user stories. Ryan Carson’s breakdown of why this is the death of “vibe coding” (Viral X Article).
Catnip by W&B - Chris Van Pelt’s open-source iOS app to run Claude Code anywhere via GitHub Codespaces (App Store, GitHub).
Vision & Video
LTX-2 - Lightricks open-sources the first truly open audio-video generation model with synchronized output and full training code (GitHub, Replicate Demo).
Avatar Forcing - KAIST’s framework for real-time interactive talking heads with ~500ms latency (Arxiv).
Qwen Edit 2512 - Optimized by PrunaAI to generate high-res realistic images in under 7 seconds (Replicate).
Voice & Audio
Nemotron Speech ASR - NVIDIA’s 600M parameter streaming model with sub-100ms stable latency for massive-scale voice agents (HF).