Episode Summary

Qwen-mas strikes again: the show is loaded with Alibaba releases across vision and omni models, while Nvidia's OpenAI exposure and the Pulse preview dominate the big-company section. Vik Korrapati joins to explain why Moondream 3 matters in the tiny-VLM race, and the rest of the episode keeps tying model progress back to real multimodal products.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Vik Korrapati
CTO & Co-founder · Moondream AI
@vikhyatk
Yam Peleg
AI builder & founder
@Yampeleg
Nisten Tahiraj
AI operator & builder
@nisten
LDJ
Nous Research
@ldjconfirmed
Ryan Carson
AI educator & founder
@ryancarson

🔓 Qwen-mas and the Open-Model Barrage

The episode opens on a flood of Alibaba activity, and the panel treats that pace itself as news. Qwen releases across vision and omni make the show feel like a direct update on how quickly open multimodal systems are improving.

  • Alibaba dominates the open-model portion of the show
  • Vision and omni releases are discussed as workflow tools, not just model cards

🎨 Moondream 3 with Vik Korrapati

Vik Korrapati gives the vision section a sharper engineering lens. The discussion is especially useful because it highlights why small, capable models matter for real products and why the tiny-VLM race is important well beyond benchmark bragging rights.

  • Moondream 3 is framed as a practical product-building model
  • Vik adds clarity on why smaller vision systems still matter

🧪 Robotics, GDP Eval, and the Real-World Agent Push

The middle of the episode moves from model capability into deployment pressure. Robotics, evaluation, and real-world agent tasks all come up as evidence that the next phase of competition is not just about chat quality but about action and reliability.

  • The panel looks for signs that agents are moving closer to real environments
  • Evaluation remains a recurring concern whenever product claims get ambitious

💰 Nvidia, OpenAI, Pulse, and Grok-4 Fast

The big-company section is driven by scale, money, and distribution. Nvidia's OpenAI exposure, Pulse chatter, and Grok-4 Fast all feed a conversation about who is building the most durable product moat as model access becomes more commoditized.

  • Money and infrastructure become central to the story here
  • The panel treats Grok-4 Fast as part of a larger competitive pressure cycle

🔊 Video Models, Suno, and the Audio Demo Pileup

The closing segment runs through video systems, music generation, and live audio demos without feeling scattered. Instead, it reinforces the show's main idea: multimodal product quality is climbing across image, video, music, and voice all at once.

  • Video, music, and voice launches all land in the same closing arc
  • The audio demos are treated as product proof points, not just entertainment

TL;DR & Show notes

  • Hosts and Guests

  • Open Source AI (LLMs, VLMs, Papers & more)

    • DeepSeek V3.1 Terminus: cleaner bilingual output, stronger agents, cheaper long-context (X, HF)

    • Meta’s 32B Code World Model (CWM) released for agentic code reasoning (X, HF)

    • Alibaba Tongyi Qwen on a release streak again: