ThursdAI · June 4, 2026

📅 ThursdAI - Jun 4 - NVIDIA drops Nemotron 3 Ultra (550B open), Microsoft becomes a frontier lab, Ideogram 4 goes open, Agent Arena & more

From CoreWeave: This week was kind of nuts, tons of new OpenSource goodness, 3 guests on the show (Arena, Nous Research and NVIDIA) and image gen SOTA models racing to the top.

By Alex Volkov

104 min


What happened in AI the week of June 4, 2026?

This is one of those wonderfully overloaded ThursdAI episodes where the news week refuses to fit inside the runtime. NVIDIA drops Nemotron 3 Ultra live with Chris Alexiuk on hand to explain the 550B open model, Arena launches Agent Arena with Peter Gostev, and Karan joins to talk Hermes Agent turning into a real community tool. Around that spine, Alex and the crew hit Microsoft MAI, Gemma 4, MiniMax M3, Ideogram 4, Reve V2, RTX Spark laptops, ElevenLabs dubbing, Cartesia audio, and Wolfram’s token-usage view for WolfBench.


Episode Summary

Hosts & Guests

Alex Volkov

Host · AI Evangelist, W&B / CoreWeave

@altryne

Peter Gostev

Head of AI, Arena

@petergostev

Chris Alexiuk

Product Research Engineer, NVIDIA

@llm_wizard

Karan

Co-founder, Nous Research

@karan4d

Wolfram Ravenwolf

AI model evaluator (r/LocalLLaMA)

@WolframRvnwlf

Yam Peleg

AI builder & founder

@Yampeleg

Nisten Tahiraj

AI operator & builder

@nisten

LDJ

Nous Research

@ldjconfirmed

By The Numbers

Nemotron 3 Ultra parameters

550B

NVIDIA open sparse model discussed with Chris Alexiuk; 55B active parameters.

Active parameters

55B

Nemotron 3 Ultra active parameter count for the sparse MoE model.

MAI Thinking 1 total parameters

1T

Microsoft MAI Thinking 1 described as a 1T total, 35B active MoE trained from scratch.

MAI training tokens

33T

Microsoft MAI Thinking 1 was discussed as trained on 33T tokens without distillation.

Ideogram 4 parameters

9.3B

Open-weight text-to-image model focused on text rendering and layout control.

Nemotron ASR throughput

17x

Alex highlights Nemotron 3.5 ASR as 17x faster than Parakeet-style baselines with half the size.

🔥 Breaking During The Show

NVIDIA Nemotron 3 Ultra drops the day of the show

Alex opens with NVIDIA’s new 550B open sparse model as breaking news, then Chris Alexiuk joins from NVIDIA HQ to explain the model, data, recipes, NVFP4 checkpoint, and agentic-harness focus.

Arena launches Agent Arena during the episode

The crew sees Arena’s new real-world agentic evaluation launch live, and Peter Gostev joins to explain why long-running agent tasks require a different benchmark than one-turn chatbot preference battles.

📰 Show Open & Welcome

Alex opens a packed June 4 show with Nemotron 3 Ultra breaking from NVIDIA, a fresh wave of image models, Microsoft declaring a serious frontier-model push, and the usual promise to compress an unreasonable news week into one live show. Chris Alexiuk, Karan, and Peter Gostev are set up as the main guest voices for the episode.

NVIDIA Nemotron 3 Ultra dropped the same day as the show
Microsoft MAI, open image models, and agent benchmarks set the agenda
Chris Alexiuk, Karan, and Peter Gostev join as guests

Alex Volkov

"We have been training for weeks such as these because this week was absolutely stacked with AI news."

⚡ ThursdAI TL;DR - Jun 4, 2026

The fast run-through frames the week: NVIDIA RTX Spark, Microsoft MAI models, MiniMax M3, Gemma 4, Agent Arena, image-model leaderboard chaos, ElevenLabs dubbing, and CoreWeave/W&B hackathon notes. It is the table of contents for a show that keeps getting interrupted by real launches.

Chris Alexiuk joins for Nemotron 3 Ultra
Karan joins for Hermes Agent and Nous Research
Peter Gostev joins to explain Agent Arena

🔓 Open Source AI

The open-source block starts with the usual ThursdAI bias toward models people can inspect, run, and build on. Alex sets aside NVIDIA for the later Chris interview and opens with smaller but important releases from Google, JetBrains, and MiniMax.

Open-source model coverage split into multiple segments
NVIDIA saved for a deeper guest interview
Gemma, Mellum, and MiniMax lead the first pass

🔓 Google Gemma 4 12B (Encoder-Free Multimodal)

Gemma 4 12B gets the first technical dive because its encoder-free multimodal design matters: instead of bolting a separate vision/audio encoder onto a language model, Google is pushing toward one unified network. LDJ and Yam explain why this can make smaller multimodal models cheaper, cleaner, and easier to run locally.

12B parameter encoder-free multimodal model
Apache 2.0 license and 16GB VRAM target
LDJ explains why unified multimodal training matters

LDJ

"Encoder-free gets rid of this, and you actually have it more cohesive, more like you would ideally think maybe the human brain works."

🛠️ JetBrains Mellum 2

JetBrains ships Mellum 2, a 12B mixture-of-experts coding model with only 2.5B active parameters. The panel treats it as another sign that IDE companies are trying to turn years of developer workflow context into model advantage.

12B MoE coding model with 2.5B active parameters
Trained with a three-stage curriculum over 10T tokens
Available on CoreWeave Inference

🔓 MiniMax M3

MiniMax M3 brings a one-million-token sparse attention context claim and strong coding/agentic benchmark numbers, but the panel keeps the hype measured because weights and licensing details still matter. The practical thread is that MiniMax models already have a following for cheap agentic tool calling even when pure coding quality is debated.

Open-weights frontier coding model announcement
One-million-token sparse attention context claim
Reported 59 on SWE-bench Pro and 66 on an internal benchmark

🤖 Agent Arena from LMArena

Agent Arena lands live enough to get the breaking-news treatment. Peter Gostev explains why chatbot A/B preference battles are no longer enough and how Arena is moving toward real agent workflows with web search, files, terminals, user corrections, and objective recovery signals.

Arena launches real-world agentic evals at scale
Models are judged on longer workflows, not one-turn chat only
Peter explains the move from battle mode to agent mode

Peter Gostev

"There is something that definitely was missing about this, and we heard a lot about this from the community: longer term, more difficult tasks that can go on for many minutes and hours."

🏢 Microsoft MAI Thinking & Code Models

Microsoft uses Build 2026 to show seven MAI models across thinking, code, image, transcription, and voice. The panel focuses on MAI Thinking 1 and MAI Code 1 Flash as signs that Microsoft AI is becoming a model lab in its own right rather than only an OpenAI distribution channel.

Seven Microsoft AI models announced at Build 2026
MAI Thinking 1 is a 1T total, 35B active MoE
MAI Code 1 Flash ships into GitHub Copilot

🎨 Microsoft MAI Image 2.5

MAI Image 2.5 gets attention because it jumps high on Arena image leaderboards surprisingly quickly. Alex and Peter discuss its strengths in editing, cleanup, diagrams, and documents, while also testing the public playground path for people who want to try it outside heavier Microsoft developer surfaces.

Number two on Arena image-to-image at the time of discussion
Strong image cleanup, background, document, and diagram results
Available through playground.microsoft.ai

🎨 Ideogram 4 (Open Weights)

Ideogram 4 is the rare image-model release that is both strong at text/layout and open weights, even if under a non-commercial license. The panel digs into its 9.3B parameter size, design-arena showing, bounding-box prompting, and the tradeoff between precise structured prompting and casual generation.

9.3B parameter open-weight text-to-image model
Strong design and text-rendering results
Supports bounding-box/layout-style prompting

🎨 Reve V2 (Layout-Based Image Model)

Reve V2 climbs near the top of text-to-image Arena, but Alex’s live testing shows both the promise and weirdness of the model. The interesting part is not perfect portraits; it is the layout engine and editing flow that make precise graphic/image iteration feel different from normal prompt-only generation.

Reve V2 reaches about 1200 Elo on image Arena
Alex tests portrait generation and finds inconsistent identity quality
The layout-first editor is the real differentiator

🔓 Interview: Chris Alexiuk (NVIDIA) - Nemotron 3 Ultra

Chris Alexiuk joins from NVIDIA HQ to unpack Nemotron 3 Ultra, a 550B sparse open model with 55B active parameters designed around agentic harnesses. The conversation covers NVIDIA’s open data, recipes, reward model, GenRM, NVFP4 checkpoint, hybrid Mamba/Transformer architecture, and why speed matters more as agents run longer contexts.

550B total parameters with 55B active
Built for agentic harnesses like OpenCode, Hermes, and OpenClaw
Open weights, data, recipes, reward model, and training details released

Chris Alexiuk

"Nemotron 3 Ultra is a 550 billion parameter, sparse MoE model with 55 billion active parameters."

🔊 NVIDIA Nemotron 3.5 ASR

The NVIDIA segment continues into speech with Nemotron 3.5 ASR, a tiny but fast streaming transcription model. Chris credits NVIDIA’s speech research team while Alex highlights the 600M parameter size, 40-language support, and throughput jump that pushes the latency/accuracy frontier.

600M parameter streaming ASR model
Supports 40 languages
Reported 17x more throughput than Parakeet with half the size

Alex Volkov

"It is 600 million parameters. Basically nothing. Runs for 40 languages, which is quite incredible."

💻 NVIDIA RTX Spark & Computex Laptops

The Computex discussion shifts from cloud-scale NVIDIA to local AI PCs. RTX Spark and the new laptop wave put RTX 5070-class GPUs, 128GB memory, and roughly one petaflop of local AI into thin machines, which raises the practical question of what agents should run locally versus remotely.

RTX Spark brings NVIDIA further into AI PCs
128GB memory and roughly 1 petaflop local AI headline the announcement
Chris notes Nemotron Ultra is too large for local laptops, but smaller models are improving fast
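
A rough back-of-the-envelope sketch of why a 550B model does not fit on a 128GB laptop while a 120B-class model can. The parameter counts and the 128GB figure come from the episode; the flat 4 bits per weight is an assumption that ignores the scale factors a real 4-bit format such as NVFP4 carries, as well as KV cache and activations, so real requirements are higher. The function name is made up for illustration.

```python
def weight_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


NEMOTRON_3_ULTRA_PARAMS = 550e9  # total parameters, from the episode
LAPTOP_MEMORY_GB = 128           # RTX Spark headline memory, from the episode
ASSUMED_BITS_PER_WEIGHT = 4      # assumption: plain 4-bit weights, no scale-factor overhead

needed_gb = weight_footprint_gb(NEMOTRON_3_ULTRA_PARAMS, ASSUMED_BITS_PER_WEIGHT)
fits = needed_gb <= LAPTOP_MEMORY_GB
print(f"Nemotron 3 Ultra: ~{needed_gb:.0f} GB of weights, fits in {LAPTOP_MEMORY_GB} GB: {fits}")

# Same estimate for a 120B-class model, the size the show notes mention for local inference.
small_gb = weight_footprint_gb(120e9, ASSUMED_BITS_PER_WEIGHT)
print(f"120B-class model: ~{small_gb:.0f} GB of weights, fits: {small_gb <= LAPTOP_MEMORY_GB}")
```

Note that only 55B parameters are active per token, but in a sparse MoE all 550B still have to be resident, which is why the total count drives the memory estimate.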

🤖 Interview: Karan (Nous Research) - Hermes Agent

Karan joins to talk about Nous Research’s surreal moment at Computex and Hermes Agent’s unexpected community adoption. He frames Hermes as a tool originally built for RL rollouts that escaped into real user workflows, with the community and the agent itself becoming major contributors to its growth.

Jensen Huang showed Nous Research on stage at Computex
Hermes Agent has grown into a widely used open agent harness
Karan says Hermes was built for RL rollouts before the community ran with it

Karan

"We made it to do RL rollouts on, right? We made it for the same reason Codex or Claude Code were created by those labs."

🛠️ Hermes Harness Engineering & Security

The Hermes discussion turns into a useful taxonomy of harness engineering: prompts, simulated terminals, permissioning, tool environments, and the security boundary around letting agents act. Karan connects today’s agent harnesses back to WorldSim and the older prompt-engineering lineage that made terminal-style agents possible.

Harness engineering is treated as a new craft layer above prompt engineering
WorldSim and simulated-terminal work are framed as precursors
The panel discusses permissions, security, and local control

🖥️ Hermes Desktop

Karan previews Hermes Desktop as a more accessible UI for the same agent power: chat, permissions, tool visibility, admin controls, and local app-style usage. Alex compares the shape of it to Codex-level local harnesses rather than a simple chatbot wrapper.

Hermes Desktop packages Hermes Agent into a desktop UI
Admin controls target small teams, startups, and personal agent fleets
Users can inspect tool calls, reasoning traces, and permissions

🔊 Voice & Audio - ElevenLabs Dubbing V2

The audio section is the live-demo brain-melter: Alex plays ElevenLabs Dubbing V2 translating voices while preserving cadence, expression, intonation, and even stutters. The section includes multilingual demos from Alex, Nisten, and Alex’s daughter Emma, who is only present as a private dubbing example rather than a show participant.

ElevenLabs Dubbing V2 preserves cadence and expression across languages
Alex demos Nisten in Hebrew and his own voice in multiple languages
Emma is a dubbing-demo voice only, not included as a guest

🔊 Cartesia Ink2 Streaming ASR Demo

The show squeezes in Cartesia Ink2 and related audio/transcription notes near the end while Alex starts summarizing the huge episode. It also becomes a bridge into W&B/CoreWeave and WeaveHacks reminders, including hackathon credits and practical builder calls to action.

Cartesia Ink2 streaming ASR gets a short mention/demo slot
Alex recaps the major guests and topics before moving to community notes
WeaveHacks is promoted for San Francisco builders

🧪 WolfBench - Token Usage Visualization

Wolfram shows a WolfBench feature that visualizes not just benchmark score, but token usage. The important point is that two models can look close on a leaderboard while one burns dramatically more tokens, which changes the real cost and latency story.

WolfBench adds a 3D token-usage visualization
Gemini 3.5 Flash and GPT 5.5 are compared through score plus token depth
Wolfram argues cost/time calculations need token usage, not only benchmark bars
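
To make the token-usage point concrete, here is a minimal sketch of how two runs with similar scores can differ sharply in cost. Every number below is invented for illustration and is not a WolfBench result; the class, model names, and the single flat price per million output tokens are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    model: str                     # placeholder model name
    score: float                   # benchmark score, higher is better
    output_tokens: int             # tokens generated across the whole run
    usd_per_million_tokens: float  # hypothetical flat output-token price

    def cost_usd(self) -> float:
        """Total cost of the run at the assumed price."""
        return self.output_tokens / 1e6 * self.usd_per_million_tokens

    def cost_per_point(self) -> float:
        """Dollars spent per benchmark point earned."""
        return self.cost_usd() / self.score


# Two made-up runs: nearly the same score, 4x difference in tokens burned.
runs = [
    BenchmarkRun("model-a", score=80.0, output_tokens=2_000_000, usd_per_million_tokens=10.0),
    BenchmarkRun("model-b", score=78.0, output_tokens=500_000, usd_per_million_tokens=10.0),
]

for run in runs:
    print(f"{run.model}: score {run.score:.0f}, "
          f"cost ${run.cost_usd():.2f}, ${run.cost_per_point():.3f} per point")
```

A bar chart of scores alone would show these two runs as nearly tied; adding the token dimension is what exposes the gap in cost, and by extension in latency.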

Wolfram Ravenwolf

"Something these bars never show is how many tokens did it use to get that score."

📰 Show Wrap-up

Alex closes a two-and-a-half-hour show that still somehow did not cover everything. The final beat thanks the live audience, points listeners to podcast and YouTube versions, and notes that the rumored OpenAI drop did not arrive, which may have been a mercy given how full the episode already was.

Show runs more than two and a half hours
Alex thanks listeners and points to podcast/YouTube versions
The expected OpenAI update did not land during the show

TL;DR and Show Notes - June 4, 2026

Show Notes & Guests
- Alex Volkov - AI Evangelist, Weights & Biases / CoreWeave (@altryne)
- Co-hosts - @WolframRvnwlf @yampeleg @ldjconfirmed
- Guests: Chris Alexiuk / @llm_wizard from NVIDIA Nemotron
- Karan Malhotra from Nous Research
- Peter Gostev from Arena
Open Source LLMs
- NVIDIA released Nemotron 3 Ultra, a 550B / 55B-active open-weight MoE built for long-running agents, with weights, data, recipes, GenRM, and training assets released (X, Tech Report, Announcement, HF).
- NVIDIA also shipped Nemotron 3.5 ASR, a 600M open multilingual streaming STT model for voice agents (X, HF, Benchmark, Voice Agent Repo).
- Google dropped Gemma 4 12B, an encoder-free multimodal model that runs locally under Apache 2.0 (X, HF).
- MiniMax announced M3, a natively multimodal, 1M-context coding and agentic model with open weights coming soon (X, API, Code).
- JetBrains released Mellum2, a 12B MoE with 2.5B active params trained from scratch by a small team (X, Blog, HF).
- H Company launched Holo 3.1, local computer-use agents from 0.8B to 35B with new quantized checkpoints (X, Blog).
Big CO LLMs + APIs
- NVIDIA announced RTX Spark, its new Arm + Blackwell PC platform for local AI agents and 120B-class local inference (coverage).
- Microsoft AI launched seven new MAI models, including MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2 (Blog, Tech Report).
AI Art & Diffusion & 3D
- MAI-Image-2.5 landed near the top of Arena image leaderboards, though hands-on tests were mixed (X, Try it).
- Ideogram 4.0 became the top open-weight text-to-image model with strong typography and layout control (X, Blog, HF).
- Reve 2.0 jumped to #2 on Text-to-Image Arena with native 4K, code-like layout control, and precise editing (X, Blog, Try it).
- xAI released Grok Imagine Video 1.5 Preview for image-to-video with synced audio (xAI).
Tools & Agentic Engineering
- Arena launched Agent Arena, a new leaderboard for real agent workflows instead of one-shot chatbot prompts (Arena).
- Cognition rebranded Windsurf into Devin Desktop, a multi-agent command center with ACP support (X, Announcement).
- Nous Research launched Hermes Desktop, bringing Hermes Agent into a native desktop app for Mac, Windows, and Linux (X, Site).
This Week’s Buzz
- WeaveHacks 4 is this weekend in SF with OpenAI, Cursor, DeepMind, and more joining (lu.ma/weavehacks).
- Nemotron 3 Ultra is live on CoreWeave Inference through W&B at full NVFP4 precision (Try it).
- WolfBench added 3D token-depth bars, making model efficiency much easier to see (wolfbench.ai).
Voice & Audio
- ElevenLabs launched Dubbing v2, an audio-to-audio dubbing model that preserves performance across 90+ languages (X, Dubbing).
- Cartesia launched Ink-2, a fast streaming STT model built for voice agents (X, Ink, AA).
- NVIDIA’s Nemotron 3.5 ASR looks like a major open-source voice-agent infrastructure drop (HF).
AI in Society
- Bernie Sanders proposed the American AI Sovereign Wealth Fund Act, calling for public equity stakes in major AI companies (coverage).
- Anthropic published When AI Builds Itself, laying out scenarios for AI-driven AI R&D and recursive self-improvement (Anthropic).
- AI leaders urged Congress to mandate synthetic DNA/RNA screening and recordkeeping (WIRED).
- Anthropic confidentially filed for an IPO, adding another frontier-lab public-market storyline to watch (Axios).

Alex Volkov 0:29

What's going on, everyone?

0:31

Welcome. Welcome to ThursdAI. This is Alex Volkov. I'm coming to you live on June 4th on the highest signal AI news show that you can ever subscribe to. We've been at this for over three and a half years and we have been training. We have been training for weeks such as these because this week was absolutely stacked with AI news, from Jensen in the beginning of the week, and MiniMax released a new version, and NVIDIA has just literally dropped, in breaking news, Nemotron 3 Ultra, and we're gonna have Chris Alexiuk, friend of the pod, to talk about this. And there's four new image models, three new image models. One of them is open source. We're gonna compare between all of them. And Microsoft came out of nowhere and declared itself a frontier lab. There's literally just a torrent of AI news, not to mention that supposedly at some point today maybe there's gonna be like a drop from OpenAI. We'll see. We're not in the speculation game. But as always, if you are tuning into ThursdAI, you know that we're gonna try our best to cover all of this in just under two hours. We have two incredible guests today. We have Chris Alexiuk from NVIDIA, AKA Joe Nemotron, the guy who writes the blog post. Our friend of the pod, Chris, is gonna join us in about an hour. To help me cover all this news, we have our friendly co-hosts here, Yam Peleg and Wolfram Ravenwolf. Welcome guys. LDJ as well.

1:59

All right, this is it. This is the TL;DR, the section of ThursdAI where we just run through everything that we have queued up for you for today, Thursday, June 4th, our first show in June, with you Alex Volkov, AI Evangelist with Weights & Biases and CoreWeave. With me, co-hosting today, Wolfram Ravenwolf, Yam Peleg and LDJ. We have two great guests today: Chris Alexiuk from NVIDIA, AKA Joe Nemotron, by the way, a friend of the pod that has been on the pod multiple times, to talk about NVIDIA Nemotron 3 Ultra, which also is breaking news, just dropped today. Another guest is Karan, co-founder of Nous Research, to talk to us about Hermes Agent, Hermes Desktop, and the incredible success that Nous Research has been seeing lately, with Jensen calling out Nous Research on stage at Computex. You guys see this? This is crazy. Shout out, huge shout out to Nous Research, our friends from way, way back. We talked with Nous Research way before they were even a company, so I love bringing Karan back to chat about their latest advancements. In the big companies... I actually tested their models when Hermes referred to the models and not the agent. Now it has changed a bit. Yep. Hermes models before the agent. Now it's changed a little bit. They're still a research lab. They're still releasing cool shit. All right. In big companies and LLMs, I think that maybe we'll have some news about Joule Alpha, a potential GPT update, but we'll see. Meanwhile, NVIDIA decided they changed the PC game forever. NVIDIA RTX Spark was part of the announcements at NVIDIA's Computex. They're claiming that this is their first PC chip ever. They're putting an RTX 5070 class GPU, 128 gigabytes of memory, and one petaflop of AI compute onto thin laptops. You guys see this? This is insane. They're able to run models, big models. This is Jensen obviously showing this off. They're able to run models on these laptops and it's quite crazy. They're trying to compete with MacBooks.
We're gonna chat about this a little bit. In addition, Microsoft also announced laptops together with these RTX laptops. But Microsoft had their own event this week called Microsoft Build. They showcased a bunch of AI stuff. They switched to agentic almost completely, very similar to Google I/O. Microsoft decided to not only come out with, like, the best models, they also decided to say, "Hey, we're now a frontier adjacent lab." Not quite a frontier lab. If you guys remember, Microsoft obviously was one of the first investors in OpenAI. Microsoft came out with seven models from MAI. MAI is Microsoft AI's organization that is the result of the acqui-hire of Inflection AI. If you guys remember, we talked about Inflection a long time ago, with Mustafa Suleyman and Karén Simonyan and a bunch of other great folks that built a near frontier lab, and now they trained models from scratch. Those are not fine-tunes of kinda open source Chinese models. They released seven models across thinking and code, image, transcription and voice, with zero distillation, all of the data is sourced, and we're gonna talk about them, specifically the image models, but also the thinking models obviously. Here are the models: MAI Image 2.5, MAI Image 2.5 Flash, MAI Transcribe. MAI Thinking 1 is their new LLM, and MAI Code 1 Flash is the new LLM that's trained specifically for VS Code. Definitely gonna chat about that. I think that this is it in the big companies and news. The only other thing that I can tell you is kinda because SpaceX AI is now a big company. SpaceX owns xAI, which owns X, so it's SpaceX AI, and they've officially filed for IPO, and the prices there are insane. So we're gonna mention this a little bit. And TBD on Joule Alpha, maybe OpenAI is gonna drop something, maybe not. All right, folks, AI art and diffusion is the second order layer of important things that I have to bring to you because this week was just absolutely insane.
Obviously, I just talked to you about Microsoft MAI Image 2.5. They hit number three on text-to-image and number two on image-to-image Arena leaderboards. Very shortly after that, yesterday, Ideogram dropped 4.0. We talked about Ideogram multiple times. They have opened the weights. So shout-out to Ideogram. Let me have this for a shout-out thing. Ideogram dropped the weights for their best text-to-image model, 9.3 billion parameters. It's a huge one. And they claim insane text rendering and layout control. I've tested all these models. I can't wait to show you my tests. But this beat, I think, Microsoft's MAI on the benchmarks, and they are definitely the number one open model right now. They're beating Flux Dev and Qwen Image and Hunyuan, all these. So shout-out to them for the open sourcing of the model. And we also had Reve 2.0. I told you about Reve. Folks maybe remember Christian Cantrell, the co-founder, used to be in, I think, Stability. Reve is launching the number two text-to-image model. This one is a layout-based model. It's a very interesting approach. You see the layout shaping as the model converges. I would love to show you some generations from Reve and their editor specifically. I have compared all of these models. I can't wait to show you, for the thumbnails for this week, so I will show you that. So a lot of news in the AI art and, like, image diffusion world, although some of these are no longer diffusions. And then we go to open source. Open source has been on fire this week. We're going to go to breaking news because this literally just happened. AI breaking news coming at you only on ThursdAI.

7:26

All righty. The biggest breaking news from today from the world of open source is NVIDIA drops Nemotron 3 Ultra. We talked about Nemotron 3 before. Nemotron 3 Ultra is a five hundred and fifty billion parameter beast. It's a hybrid Mamba-Transformer. It's an open model, fully open, including the data, and it's built for long-running agents, and for some reason, it has 5x faster inference. We're gonna talk with Chris Alexiuk from NVIDIA about this release, but it's a very interesting release, especially in the world of open source. NVIDIA is not playing around, folks. NVIDIA is not playing around, and it's great, great, great, great, great to see that American, fully American-based, open source is catching up. I am saving our comments about this model for when Chris comes here in an hour. I will say, though, another breaking news is that we have this model up on CW Inference. So shout out to the folks who work really hard. CoreWeave Inference, through Weights & Biases, is now serving Nemotron 3 Ultra in full NVFP4. Folks, we also have a new Google Gemma. Do you guys see this? We have a Gemma 4, 12 billion parameters, a little bit bigger. They call it an encoder-free multimodal model. Runs on your laptop with 16 gigabytes of VRAM and with a full Apache 2 license, which is great to see. JetBrains dropped a small model, also on CW Inference. It's a 12 billion parameter model with 2.5 billion active parameters, trained from scratch by a team of seven people from JetBrains. Shout out to the JetBrains folks. They have some comparisons here. I don't know if we have quite enough time to go deep into Mellum, but Mellum 2 from JetBrains, it's great to see. And I think a big one, it's not fully open source yet, but definitely a big one. Wolfram, you can confirm, I see you nodding. MiniMax M3 drops, a soon-to-be open-weights model combining frontier coding, one million sparse attention context and native multimodality, which is huge.
This is huge news, so shout out to MiniMax for their releases, and not yet open weights, but I heard that they're coming out next week. This is the TL;DR. This is everything that happened. Out of this, we're gonna choose the most important things to kinda dive deep on, so please stay with us as I shoot through this super quick so we can get to the actual show. Some folks prefer this format. Some folks prefer just the highlights so they can go and explore on their own. Let's talk about tools and agentic engineering. Cognition rebrands Windsurf. You guys remember Windsurf? Windsurf is no longer. Windsurf is dead. They rebranded Windsurf to Devin Desktop. It's a multi-agent command center with ACP support. ACP stands for Agent Communication Protocol, I believe. Wolfram, please correct me if I'm wrong. But basically, it's the way where you ask one agent to control another agent, so you can control Codex and Claude Code, et cetera. Isn't that the one that has two different meanings? Yeah, but I specifically mean the one where you can control other agents. Yes, but Agent Client Protocol is the right nomenclature for this. And Windsurf is now Devin Desktop, so bye-bye, Windsurf. We've missed you. It's been a year since your last rebrand, and this is finally the final one. And our friends from Nous Research launched Hermes Desktop. Your Hermes agent goes native on Mac, Windows, and Linux in public preview. So you can just do Hermes Desktop as a command. You have this beautiful desktop experience. We're gonna hopefully have screenshots of it because I cannot run this on my work laptop. But we're gonna have Karan, a co-founder of Nous Research, talk about this latest success, exploding success, for Hermes Agent, with showcases on the NVIDIA Computex stage and tons of exposure as well. All righty, This Week's Buzz, a corner where we talk about Weights & Biases and CoreWeave. We have a few things for you.
First of all, you can join us this weekend at WeaveHacks 4. This is the coming weekend. We have a few spots left. If you're in San Francisco or want to travel to San Francisco, I'm actually flying out tomorrow, please join us at lu.ma/weavehacks. We have OpenAI for the first time as a sponsor for WeaveHacks. We also have Cursor, we have a bunch of credits for you, a bunch of food, a bunch of great judges that I reached out to personally, so shout out to some of the judges. Last but not least, last corner. Folks, we're almost there. Almost starting. Almost starting. Let's add Nisten here as well. Voice and audio. So this actually launched last week, but it was after the show. ElevenLabs Dubbing V2 is an audio-to-audio model that preserves emotion and performance across ninety languages. It is uncannily good. It is crazy good. Just trust me on this. My head is not getting as blown anymore because we follow all the news all the time and the LLMs are kind of incremental updates. This broke my brain a little bit. This is so crazy good that you must stay here until the end of the show to hear us play with this. Really, it's something. Yam, you weren't here when this happened, but when you hear Nisten speak in Hebrew and you're like, "Oh, shit, this is how Nisten would sound if he actually spoke Hebrew," you will blow your gasket. It's that crazy. So definitely the dubbing is incredible. We also have Cartesia Ink-2. Shout out to Cartesia, our friends of the pod. They've been here multiple times with a bunch of their models. They're leading with their models because their models are hybrids, or Mamba-Transformer based, and Ink-2 debuts as the number one most accurate streaming speech-to-text model on Artificial Analysis. Number one most accurate speech-to-text model. It is really good. I tested it last week as well. You can see this in word error rate. This is the lowest word error rate that we've ever seen on the show.
And breaking news from today, I did not actually know that this was gonna drop. NVIDIA also drops Nemotron 3.5 ASR. It's an open source multilingual streaming speech-to-text model. It's faster and cheaper than anything else on the market. So this is news from last week, and then NVIDIA dropped faster and cheaper in open source. And our friends from Daily, Kwindla Kramer from Daily and Pipecat, tested this out. So in addition to just dropping one of the best LLMs in the world, NVIDIA also dropped one of the best ASRs and text-to-speech in the world as well. And so we're gonna definitely test out all of those. Folks, how did we do? 20 minutes? Let's go to open source, I think, because we have a bunch of stuff to talk about. But this is the TL;DR. All right, let's do open source.