Episode Summary
This is one of those wonderfully overloaded ThursdAI episodes where the news week refuses to fit inside the runtime. NVIDIA drops Nemotron 3 Ultra live with Chris Alexiuk on hand to explain the 550B open model, Arena launches Agent Arena with Peter Gostev, and Karan joins to talk about Hermes Agent turning into a real community tool. Around that spine, Alex and the crew hit Microsoft MAI, Gemma 4, MiniMax M3, Ideogram 4, Reve V2, RTX Spark laptops, ElevenLabs dubbing, Cartesia audio, and Wolfram's token-usage view for WolfBench.
In This Episode
- Show Open & Welcome
- ThursdAI TL;DR - Jun 4, 2026
- Open Source AI
- Google Gemma 4 12B (Encoder-Free Multimodal)
- JetBrains Mellum 2
- MiniMax M3
- Agent Arena from LMArena
- Microsoft MAI Thinking & Code Models
- Microsoft MAI Image 2.5
- Ideogram 4 (Open Weights)
- Reve V2 (Layout-Based Image Model)
- Interview: Chris Alexiuk (NVIDIA) - Nemotron 3 Ultra
- NVIDIA Nemotron 3.5 ASR
- NVIDIA RTX Spark & Computex Laptops
- Interview: Karan (Nous Research) - Hermes Agent
- Hermes Harness Engineering & Security
- Hermes Desktop
- Voice & Audio - ElevenLabs Dubbing V2
- Cartesia Ink2 Streaming ASR Demo
- WolfBench - Token Usage Visualization
- Show Wrap-up
Show Open & Welcome
Alex opens a packed June 4 show with Nemotron 3 Ultra breaking from NVIDIA, a fresh wave of image models, Microsoft declaring a serious frontier-model push, and the usual promise to compress an unreasonable news week into one live show. Chris Alexiuk, Karan, and Peter Gostev are set up as the main guest voices for the episode.
- NVIDIA Nemotron 3 Ultra dropped the same day as the show
- Microsoft MAI, open image models, and agent benchmarks set the agenda
- Chris Alexiuk, Karan, and Peter Gostev join as guests
ThursdAI TL;DR - Jun 4, 2026
The fast run-through frames the week: NVIDIA RTX Spark, Microsoft MAI models, MiniMax M3, Gemma 4, Agent Arena, image-model leaderboard chaos, ElevenLabs dubbing, and CoreWeave/W&B hackathon notes. It is the table of contents for a show that keeps getting interrupted by real launches.
- Chris Alexiuk joins for Nemotron 3 Ultra
- Karan joins for Hermes Agent and Nous Research
- Peter Gostev joins to explain Agent Arena
Open Source AI
The open-source block starts with the usual ThursdAI bias toward models people can inspect, run, and build on. Alex sets aside NVIDIA for the later Chris interview and opens with smaller but important releases from Google, JetBrains, and MiniMax.
- Open-source model coverage split into multiple segments
- NVIDIA saved for a deeper guest interview
- Gemma, Mellum, and MiniMax lead the first pass
Google Gemma 4 12B (Encoder-Free Multimodal)
Gemma 4 12B gets the first technical dive because its encoder-free multimodal design matters: instead of bolting a separate vision/audio encoder onto a language model, Google is pushing toward one unified network. LDJ and Yam explain why this can make smaller multimodal models cheaper, cleaner, and easier to run locally.
- 12B parameter encoder-free multimodal model
- Apache 2.0 license and 16GB VRAM target
- LDJ explains why unified multimodal training matters
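The 16GB VRAM target is easiest to read as weight-memory arithmetic. A rough sketch, assuming memory is simply parameter count times bits per weight and ignoring KV cache, activations, and any multimodal overhead; the precision list is illustrative, not Google's published deployment recipe:

```python
# Rough weight-only memory estimate for a 12B-parameter model.
# Assumption: memory = params * bits / 8; KV cache and activations are ignored.
PARAMS = 12e9
VRAM_TARGET_GB = 16

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

for name, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    gb = weight_gb(PARAMS, bits)
    fits = "fits" if gb < VRAM_TARGET_GB else "does not fit"
    print(f"{name:>5}: {gb:5.1f} GB of weights -> {fits} in {VRAM_TARGET_GB} GB")
```

Under these assumptions, 16-bit weights alone (24 GB) overshoot a 16GB card, so the target plausibly implies 8-bit or lower weights, with the remaining headroom going to context.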
JetBrains Mellum 2
JetBrains ships Mellum 2, a 12B mixture-of-experts coding model with only 2.5B active parameters. The panel treats it as another sign that IDE companies are trying to turn years of developer workflow context into model advantage.
- 12B MoE coding model with 2.5B active parameters
- Trained with a three-stage curriculum over 10T tokens
- Available on CoreWeave Inference
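The gap between 12B total and 2.5B active parameters comes from mixture-of-experts routing: a router scores the experts for each token and only the top few are evaluated. A toy top-k router in plain Python; the expert count, k, scalar "experts", and logits are illustrative and not Mellum 2's actual configuration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, k):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))

def moe_forward(x, experts, router_logits, k=2):
    """Mix outputs of only the selected experts; the others are never evaluated."""
    return sum(g * experts[i](x) for i, g in route(router_logits, k))

# Toy setup: 8 experts, each a scalar function; only k=2 of them run per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
print(route(logits, k=2))  # experts 4 and 1 are selected
print(moe_forward(1.0, experts, logits, k=2))
```

In a real MoE transformer the routed experts are the feed-forward blocks, while attention and embeddings run for every token, which is why "active" parameters are more than just k experts' worth.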
MiniMax M3
MiniMax M3 brings a one-billion-token sparse attention context claim and strong coding/agentic benchmark numbers, but the panel keeps the hype measured because weights and licensing details still matter. The practical thread is that MiniMax models already have a following for cheap agentic tool calling even when pure coding quality is debated.
- Open-weights frontier coding model announcement
- One-billion-token sparse attention context claim
- Reported 59 on SWE-bench Pro and 66 on an internal benchmark
Agent Arena from LMArena
Agent Arena goes live in time to get the breaking-news treatment. Peter Gostev explains why chatbot A/B preference battles are no longer enough and how Arena is moving toward real agent workflows with web search, files, terminals, user corrections, and objective recovery signals.
- Arena launches real-world agentic evals at scale
- Models are judged on longer workflows, not one-turn chat only
- Peter explains the move from battle mode to agent mode
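Arena-style leaderboards turn pairwise preferences into ratings. A minimal sketch using the classic online Elo update, with an assumed K-factor of 32 and starting rating of 1000; LMArena's chatbot leaderboard has described fitting a Bradley-Terry model over all battles instead, and Agent Arena's exact scoring is not covered here, so treat this only as an illustration of how battle outcomes become a ranking:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one battle; ties are not handled here."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# Hypothetical battle log: (model shown as A, model shown as B, did A win?)
ratings = {"model_a": 1000.0, "model_b": 1000.0}
battles = [
    ("model_a", "model_b", True),
    ("model_a", "model_b", True),
    ("model_a", "model_b", False),
]
for a, b, a_won in battles:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
print(ratings)
```

An upset (a lower-rated model winning) moves ratings more than an expected win, and the update is zero-sum, so the total rating pool stays constant.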
Microsoft MAI Thinking & Code Models
Microsoft uses Build 2026 to show seven MAI models across thinking, code, image, transcription, and voice. The panel focuses on MAI Thinking 1 and MAI Code 1 Flash as signs that Microsoft AI is becoming a model lab in its own right rather than only an OpenAI distribution channel.
- Seven Microsoft AI models announced at Build 2026
- MAI Thinking 1 is a 1T total, 35B active MoE
- MAI Code 1 Flash ships into GitHub Copilot
Microsoft MAI Image 2.5
MAI Image 2.5 gets attention because it jumps high on Arena image leaderboards surprisingly quickly. Alex and Peter discuss its strengths in editing, cleanup, diagrams, and documents, while also testing the public playground path for people who want to try it outside heavier Microsoft developer surfaces.
- Number two on Arena image-to-image at the time of discussion
- Strong image cleanup, background, document, and diagram results
- Available through playground.microsoft.ai
Ideogram 4 (Open Weights)
Ideogram 4 is the rare image-model release that is both strong at text/layout and open weights, even if under a non-commercial license. The panel digs into its 9.3B parameter size, design-arena showing, bounding-box prompting, and the tradeoff between precise structured prompting and casual generation.
- 9.3B parameter open-weight text-to-image model
- Strong design and text-rendering results
- Supports bounding-box/layout-style prompting
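Bounding-box prompting replaces a single free-form prompt with a list of regions, each carrying its own description or text. A hypothetical layout spec in Python; the field names, normalized 0-1 coordinates, and JSON shape are assumptions for illustration, not Ideogram 4's actual input format:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Region:
    """One box in normalized image coordinates (0.0 = top/left, 1.0 = bottom/right)."""
    x0: float
    y0: float
    x1: float
    y1: float
    prompt: str

    def validate(self) -> None:
        coords = (self.x0, self.y0, self.x1, self.y1)
        if not all(0.0 <= c <= 1.0 for c in coords):
            raise ValueError(f"coordinates out of range: {coords}")
        if self.x0 >= self.x1 or self.y0 >= self.y1:
            raise ValueError(f"empty or inverted box: {coords}")

def build_layout(global_prompt: str, regions: list[Region]) -> str:
    """Validate every region, then serialize the whole layout as JSON."""
    for r in regions:
        r.validate()
    return json.dumps({"prompt": global_prompt, "regions": [asdict(r) for r in regions]}, indent=2)

poster = build_layout(
    "minimalist concert poster, warm colors",
    [
        Region(0.1, 0.05, 0.9, 0.25, 'headline text "SUMMER NIGHTS"'),
        Region(0.2, 0.30, 0.8, 0.80, "silhouette of a guitarist"),
        Region(0.1, 0.85, 0.9, 0.95, 'small text "June 21, 8 PM"'),
    ],
)
print(poster)
```

This is where the precision versus casual-generation tradeoff shows up: structured regions pin down exactly where text and objects go, but they ask far more of the user than a one-line prompt.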
Reve V2 (Layout-Based Image Model)
Reve V2 climbs near the top of text-to-image Arena, but Alex's live testing shows both the promise and the weirdness of the model. The interesting part is not perfect portraits; it is the layout engine and editing flow that make precise graphic/image iteration feel different from normal prompt-only generation.
- Reve V2 reaches about 1200 Elo on image Arena
- Alex tests portrait generation and finds inconsistent identity quality
- The layout-first editor is the real differentiator
Interview: Chris Alexiuk (NVIDIA) - Nemotron 3 Ultra
Chris Alexiuk joins from NVIDIA HQ to unpack Nemotron 3 Ultra, a 550B sparse open model with 55B active parameters designed around agentic harnesses. The conversation covers NVIDIA's open data, recipes, reward model, GenRM, NVFP4 checkpoint, hybrid Mamba/Transformer architecture, and why speed matters more as agents run longer contexts.
- 550B total parameters with 55B active
- Built for agentic harnesses like OpenCode, Hermes, and OpenClaw
- Open weights, data, recipes, reward model, and training details released
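Part of the speed argument for long-running agents is that a sparse model pays per-token compute for its active parameters, not its total. A back-of-envelope sketch, assuming the common approximation of about 2 FLOPs per active parameter per generated token and ignoring attention cost over long contexts; the parameter counts are the ones quoted in this episode:

```python
# Approximate forward-pass FLOPs per generated token ~= 2 * active parameters.
# Ignores attention over the context, which grows with context length.
MODELS = {
    "Nemotron 3 Ultra": (550e9, 55e9),  # (total params, active params)
    "MAI Thinking 1": (1e12, 35e9),
    "Mellum 2": (12e9, 2.5e9),
}

def flops_per_token(active_params: float) -> float:
    return 2 * active_params

for name, (total, active) in MODELS.items():
    frac = active / total
    dense_ratio = flops_per_token(total) / flops_per_token(active)
    print(f"{name:17} active {frac:6.1%}  ~{flops_per_token(active) / 1e9:.0f} GFLOPs/token  "
          f"({dense_ratio:.0f}x less than an equally sized dense model)")
```

Memory is a different story: all 550B parameters still have to be stored somewhere, so sparsity cuts compute per token but not the footprint of the weights.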
NVIDIA Nemotron 3.5 ASR
The NVIDIA segment continues into speech with Nemotron 3.5 ASR, a tiny but fast streaming transcription model. Chris credits NVIDIA's speech research team while Alex highlights the 600M parameter size, 40-language support, and throughput jump that pushes the latency/accuracy frontier.
- 600M parameter streaming ASR model
- Supports 40 languages
- Reported 17x more throughput than Parakeet at half the size
NVIDIA RTX Spark & Computex Laptops
The Computex discussion shifts from cloud-scale NVIDIA to local AI PCs. RTX Spark and the new laptop wave put RTX 5070-class GPUs, 128GB memory, and roughly one petaflop of local AI into thin machines, which raises the practical question of what agents should run locally versus remotely.
- RTX Spark brings NVIDIA further into AI PCs
- 128GB memory and roughly 1 petaflop local AI headline the announcement
- Chris notes Nemotron Ultra is too large for local laptops, but smaller models are improving fast
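Chris's point that Nemotron 3 Ultra will not run on these laptops follows from weight memory alone. A rough sketch, assuming NVFP4 stores 4-bit values plus one FP8 scale per 16-element block (about 4.5 bits per weight) and ignoring the per-tensor scale, KV cache, activations, and OS overhead; the 120B entry reflects the "120B-class local inference" NVIDIA cites for RTX Spark:

```python
# Weight-only memory at NVFP4: 4-bit values plus one 8-bit scale per 16 values.
NVFP4_BITS = 4 + 8 / 16
LAPTOP_MEMORY_GB = 128

def weights_gb(params: float, bits_per_weight: float = NVFP4_BITS) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

for name, params in [("Nemotron 3 Ultra (550B)", 550e9), ("120B-class model", 120e9)]:
    gb = weights_gb(params)
    verdict = "fits" if gb < LAPTOP_MEMORY_GB else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of weights -> {verdict} in {LAPTOP_MEMORY_GB} GB")
```

Even at 4-bit precision the 550B model needs well over twice the laptop's memory, while a 120B-class model leaves room for context.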
Interview: Karan (Nous Research) - Hermes Agent
Karan joins to talk about Nous Research's surreal moment at Computex and Hermes Agent's unexpected community adoption. He frames Hermes as a tool originally built for RL rollouts that escaped into real user workflows, with the community and the agent itself becoming major contributors to its growth.
- Jensen Huang showed Nous Research on stage at Computex
- Hermes Agent has grown into a widely used open agent harness
- Karan says Hermes was built for RL rollouts before the community ran with it
Hermes Harness Engineering & Security
The Hermes discussion turns into a useful taxonomy of harness engineering: prompts, simulated terminals, permissioning, tool environments, and the security boundary around letting agents act. Karan connects today's agent harnesses back to WorldSim and the older prompt-engineering lineage that made terminal-style agents possible.
- Harness engineering is treated as a new craft layer above prompt engineering
- WorldSim and simulated-terminal work are framed as precursors
- The panel discusses permissions, security, and local control
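The permissioning layer the panel describes can be pictured as a gate between the model's proposed tool call and its execution. A hypothetical sketch: the tool names, the three-level policy (allow, ask, deny), and the default-to-ask rule for unknown tools are assumptions for illustration, not how Hermes Agent implements permissions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"

@dataclass
class ToolCall:
    tool: str
    argument: str

# Hypothetical policy: read-only tools run freely, writes need approval, some are off-limits.
POLICY = {
    "read_file": Decision.ALLOW,
    "web_search": Decision.ALLOW,
    "write_file": Decision.ASK,
    "run_shell": Decision.ASK,
    "send_email": Decision.DENY,
}

def gate(call: ToolCall, approve: Callable[[ToolCall], bool]) -> bool:
    """Return True if the call may execute; unknown tools fall back to ASK."""
    decision = POLICY.get(call.tool, Decision.ASK)
    if decision is Decision.ALLOW:
        return True
    if decision is Decision.DENY:
        return False
    return approve(call)  # defer to the human (or an admin rule)

def deny_all(call: ToolCall) -> bool:
    """Stand-in approver that refuses every request it is asked about."""
    print(f"approval requested for {call.tool}({call.argument!r}) -> denied")
    return False

for call in [ToolCall("read_file", "notes.md"), ToolCall("run_shell", "rm -rf build"), ToolCall("send_email", "boss")]:
    print(call.tool, "->", "run" if gate(call, deny_all) else "blocked")
```

Defaulting unknown tools to "ask" rather than "allow" keeps newly added tools behind the human until someone writes an explicit rule for them.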
Hermes Desktop
Karan previews Hermes Desktop as a more accessible UI for the same agent power: chat, permissions, tool visibility, admin controls, and local app-style usage. Alex compares the shape of it to Codex-level local harnesses rather than a simple chatbot wrapper.
- Hermes Desktop packages Hermes Agent into a desktop UI
- Admin controls target small teams, startups, and personal agent fleets
- Users can inspect tool calls, reasoning traces, and permissions
Voice & Audio - ElevenLabs Dubbing V2
The audio section is the live-demo brain-melter: Alex plays ElevenLabs Dubbing V2 translating voices while preserving cadence, expression, intonation, and even stutters. The section includes multilingual demos from Alex, Nisten, and Alex's daughter Emma, who is only present as a private dubbing example rather than a show participant.
- ElevenLabs Dubbing V2 preserves cadence and expression across languages
- Alex demos Nisten in Hebrew and his own voice in multiple languages
- Emma is a dubbing-demo voice only, not included as a guest
Cartesia Ink2 Streaming ASR Demo
The show squeezes in Cartesia Ink2 and related audio/transcription notes near the end while Alex starts summarizing the huge episode. It also becomes a bridge into W&B/CoreWeave and WeaveHacks reminders, including hackathon credits and practical builder calls to action.
- Cartesia Ink2 streaming ASR gets a short mention/demo slot
- Alex recaps the major guests and topics before moving to community notes
- WeaveHacks is promoted for San Francisco builders
WolfBench - Token Usage Visualization
Wolfram shows a WolfBench feature that visualizes not just benchmark score, but token usage. The important point is that two models can look close on a leaderboard while one burns dramatically more tokens, which changes the real cost and latency story.
- WolfBench adds a 3D token-usage visualization
- Gemini 3.5 Flash and GPT 5.5 are compared through score plus token depth
- Wolfram argues cost/time calculations need token usage, not only benchmark bars
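Wolfram's argument is easy to make concrete: the cost of a benchmark run is tokens used times price per token, so a small score gap can hide a large cost and time gap. A minimal sketch with made-up numbers; the model names, scores, token counts, prices, and speeds are hypothetical, not WolfBench results or real pricing, and input tokens are ignored for simplicity:

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    name: str
    score: float              # benchmark score, 0-100
    output_tokens: int        # total tokens generated across the run (input ignored)
    usd_per_million: float    # price per 1M output tokens
    tokens_per_second: float  # generation speed

    def cost_usd(self) -> float:
        return self.output_tokens / 1e6 * self.usd_per_million

    def hours(self) -> float:
        return self.output_tokens / self.tokens_per_second / 3600

    def cost_per_point(self) -> float:
        return self.cost_usd() / self.score

runs = [
    RunStats("model_x", score=72.0, output_tokens=4_000_000, usd_per_million=10.0, tokens_per_second=100.0),
    RunStats("model_y", score=74.0, output_tokens=16_000_000, usd_per_million=10.0, tokens_per_second=100.0),
]
for r in runs:
    print(f"{r.name}: score {r.score:.0f}, ${r.cost_usd():.0f}, {r.hours():.1f} h, ${r.cost_per_point():.2f}/point")
```

Here the two scores differ by two points on a bar chart, but the second model costs four times as much and takes four times as long at equal price and speed, which is exactly the dimension a token-depth view exposes.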
Show Wrap-up
Alex closes a two-and-a-half-hour show that still somehow did not cover everything. The final beat thanks the live audience, points listeners to podcast and YouTube versions, and notes that the rumored OpenAI drop did not arrive, which may have been a mercy given how full the episode already was.
- Show runs more than two and a half hours
- Alex thanks listeners and points to podcast/YouTube versions
- The expected OpenAI update did not land during the show
Show Notes & Guests
Alex Volkov - AI Evangelist & Weights & Biases CoreWeave (@altryne)
Co-hosts - @WolframRvnwlf @yampeleg @ldjconfirmed
Guests: Chris Alexiuk / @llm_wizard from NVIDIA Nemotron
Karan Malhotra from Nous Research
Peter Gostev from Arena
Open Source LLMs
NVIDIA released Nemotron 3 Ultra, a 550B / 55B-active open-weight MoE built for long-running agents, with weights, data, recipes, GenRM, and training assets released (X, Tech Report, Announcement, HF).
NVIDIA also shipped Nemotron 3.5 ASR, a 600M open multilingual streaming STT model for voice agents (X, HF, Benchmark, Voice Agent Repo).
Google dropped Gemma 4 12B, an encoder-free multimodal model that runs locally under Apache 2.0 (X, HF).
MiniMax announced M3, a natively multimodal, 1M-context coding and agentic model with open weights coming soon (X, API, Code).
JetBrains released Mellum2, a 12B MoE with 2.5B active params trained from scratch by a small team (X, Blog, HF).
H Company launched Holo 3.1, local computer-use agents from 0.8B to 35B with new quantized checkpoints (X, Blog).
Big CO LLMs + APIs
NVIDIA announced RTX Spark, its new Arm + Blackwell PC platform for local AI agents and 120B-class local inference (coverage).
Microsoft AI launched seven new MAI models, including MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2 (Blog, Tech Report).
AI Art & Diffusion & 3D
MAI-Image-2.5 landed near the top of Arena image leaderboards, though hands-on tests were mixed (X, Try it).
Ideogram 4.0 became the top open-weight text-to-image model with strong typography and layout control (X, Blog, HF).
Reve 2.0 jumped to #2 on Text-to-Image Arena with native 4K, code-like layout control, and precise editing (X, Blog, Try it).
xAI released Grok Imagine Video 1.5 Preview for image-to-video with synced audio (xAI).
Tools & Agentic Engineering
Arena launched Agent Arena, a new leaderboard for real agent workflows instead of one-shot chatbot prompts (Arena).
Cognition rebranded Windsurf into Devin Desktop, a multi-agent command center with ACP support (X, Announcement).
Nous Research launched Hermes Desktop, bringing Hermes Agent into a native desktop app for Mac, Windows, and Linux (X, Site).
This Week's Buzz
WeaveHacks 4 is this weekend in SF with OpenAI, Cursor, DeepMind, and more joining (lu.ma/weavehacks).
Nemotron 3 Ultra is live on CoreWeave Inference through W&B at full NVFP4 precision (Try it).
WolfBench added 3D token-depth bars, making model efficiency much easier to see (wolfbench.ai).
Voice & Audio
ElevenLabs launched Dubbing v2, an audio-to-audio dubbing model that preserves performance across 90+ languages (X, Dubbing).
Cartesia launched Ink-2, a fast streaming STT model built for voice agents (X, Ink, AA).
NVIDIA's Nemotron 3.5 ASR looks like a major open-source voice-agent infrastructure drop (HF).
AI in Society
Bernie Sanders proposed the American AI Sovereign Wealth Fund Act, calling for public equity stakes in major AI companies (coverage).
Anthropic published When AI Builds Itself, laying out scenarios for AI-driven AI R&D and recursive self-improvement (Anthropic).
AI leaders urged Congress to mandate synthetic DNA/RNA screening and recordkeeping (WIRED).
Anthropic confidentially filed for an IPO, adding another frontier-lab public-market storyline to watch (Axios).