Episode Summary

From Weights & Biases, a "chill" week that included a new DeepSeek R1, a new SOTA image editing model, 2 interviews with Charlie Holtz and Linus Ekenstam, a discussion about a world building model and a whole lot more! Welcome back to another absolutely wild week in AI! I'm coming to you live from the Fontainebleau Hotel in Vegas at the Imagine AI conference, and wow, what a perfect setting to discuss how AI is literally reimagining our world.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Linus Ekenstam
AI Evangelist & Content Creator · Independent / Inside My Head Newsletter
@LinusEkenstam
Charlie Holtz
AI Developer & Founder · Independent
@charlieholtz
Yam Peleg
Weekly co-host of ThursdAI · AI builder & founder
@Yampeleg
Wolfram Ravenwolf
Weekly co-host, AI model evaluator · Independent AI evaluator (r/LocalLLaMA)
@WolframRvnwlf
Nisten Tahiraj
Weekly co-host of ThursdAI · AI operator & builder
@nisten

By The Numbers

Open Source AI & LLMs: DeepSeek Whales & Mind-Bendin
91
We’re talking AIME at 91 (beating previous scores by a mile), LiveCodeBench at 73, and SWE-bench Verified at 57.6.
Open Source AI & LLMs: DeepSeek Whales & Mind-Bendin
8B
And here’s the kicker: they also released an 8B distilled version based on Qwen3, runnable on your laptop.
Open Source AI & LLMs: DeepSeek Whales & Mind-Bendin
3B
GRPO (Group Relative Policy Optimization), the framework DeepSeek gave to the world with R1, relies on external rewards (human-optimized), and Intuitor seems to match or even exceed some GRPO results when used to fine-tune Qwen2.5 3B.
Claude Opus 4: A Week Later – The Dev Darling?
50%
Linus Ekenstam highlighted how Lovable.dev saw their syntax error rates plummet by nearly 50% after integrating Claude 4.
🐝 This Week's Buzz: Weights & Biases Updates!
100%
You can still grab tickets, and as a ThursdAI listener, use the promo code WBTHURSAI for a 100% off ticket!

🔓 Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers

DeepSeek dropped R1-0528 out of nowhere, an update to their reasoning beast with some serious jumps in performance. We’re talking AIME at 91 (beating previous scores by a mile), LiveCodeBench at 73, and SWE-bench Verified at 57.6.

  • Still, it’s likely among the best open-weight models out there.

📰 Claude Opus 4: A Week Later – The Dev Darling?

Claude Opus 4, whose launch we celebrated live on the show, has had a week to make its mark. Charlie Holtz, who's building Chorus (more on that amazing app in a bit!), shared that while it's sometimes "astrology" to judge the vibes of a new model, Opus 4 feels like a step change, especially in coding.

  • He mentioned that Claude Code, powered by Opus 4 (and Sonnet 4 for implementation), is now tackling GitHub issues that were too complex just weeks ago.
  • He even had a coworker who "vibe coded three websites in a weekend" with it – that's a tangible productivity boost!

⚡ 🐝 This Week's Buzz: Weights & Biases Updates!

Alright, time for a quick update from the world of Weights & Biases!

1. Fully Connected is Coming!

  • Our flagship 2-day conference, Fully Connected, is happening on June 18th and 19th in San Francisco.
  • It's going to be packed with amazing speakers and insights into the world of AI development.

🎥 Vision & Video: Reality is Optional Now

Google's VEO3 has completely taken over TikTok with the "Prompt Theory" videos. If you haven't seen these yet, stop reading and watch ☝️. The concept is brilliant - AI-generated characters discussing whether they're "made of prompts," creating this meta-commentary on consciousness and reality.


🎨 Black Forest Labs drops Flux Kontext - SOTA image editing!

This came as massive breaking news during the show (though we didn't catch it live!) - Black Forest Labs, creators of Flux, dropped an incredible image editing model called Kontext (really, 3 models: Pro, Max and a 12B open source Dev in private preview). They offer consistent, context-aware text and image editing!

  • If you used GPT-image to Ghiblify yourself, or VEO, you know that those are not image editing models: your face will look different every generation.
  • These image models keep you consistent, while adding what you wanted.

🔊 🎙️ Voice & Audio: Everyone Gets a Voice

KyutAI (the folks behind Moshi) are back with Unmute.sh - a modular wrapper that adds voice to ANY text LLM. The latency is incredible (under 300ms), and it includes semantic VAD (knowing when you've paused for thought vs. just taking a breath).

  • What's brilliant about this approach is it preserves all the capabilities of the underlying text model while adding natural voice interaction.
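
To make the "modular wrapper" idea concrete, here is a minimal sketch of how a voice layer can sit around a text-only model without touching it. This is an illustration under stated assumptions, not Unmute's actual implementation: the `VoiceWrapper` class and the `fake_stt`, `echo_llm`, and `fake_tts` stand-ins are hypothetical names, and the sketch does not model latency or semantic VAD.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceWrapper:
    """Hypothetical wrapper: speech in, speech out, text LLM in the middle."""

    transcribe: Callable[[bytes], str]   # speech-to-text stage (assumed interface)
    generate: Callable[[str], str]       # any text-only LLM, used unchanged
    synthesize: Callable[[str], bytes]   # text-to-speech stage (assumed interface)

    def respond(self, audio_in: bytes) -> bytes:
        # 1. Turn the user's audio into text.
        text_in = self.transcribe(audio_in)
        # 2. Let the underlying text model answer exactly as it normally would.
        text_out = self.generate(text_in)
        # 3. Turn the model's text answer back into audio.
        return self.synthesize(text_out)


# Stand-in stages so the sketch runs without any real models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


wrapper = VoiceWrapper(fake_stt, echo_llm, fake_tts)
reply = wrapper.respond(b"hello")
```

Because the text model is just one interchangeable callable, swapping in a different LLM changes nothing about the speech stages, which is the property the bullet above describes.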

📰 Looking Forward: The Convergence is Real

As we wrapped up the show, I couldn't help but reflect on the massive convergence happening across all these modalities. We have LLMs getting better at reasoning (even with random rewards!), video models breaking reality, voice models becoming indistinguishable from humans, and it's all happening simultaneously.

  • Charlie's comment that "we are the prompts" might have been said in jest, but it touches on something profound.
  • As these models get better at generating realistic worlds, characters, and voices, the line between generated and real continues to blur.

TL;DR Links

Show Notes & Guests

  • Alex Volkov - AI Evangelist, Weights & Biases (@altryne)

  • Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten

  • Guests - Charlie Holtz (@charliebholtz), Linus Ekenstam (@LinusEkenstam)

  • Open Source LLMs

    • DeepSeek-R1-0528 - Updated reasoning model with AIME 91, LiveCodeBench 73 (Try It)

    • Learning to Reason Without External Rewards - Paper on random rewards improving models (X)

    • HaizeLabs j1-nano & j1-micro - Tiny reward models (600M, 1.7B params), RewardBench 80.7% for micro (Tweet, GitHub, HF-micro, HF-nano)

  • Big CO LLMs + APIs

    • Claude Opus 4 - #1 on LMArena WebDev, coding step change (X)

    • Mistral Agents API - Framework for custom tool-using agents (Blog, Tweet)

    • Mistral Embed SOTA - New state-of-the-art embedding API (X)

    • OpenAI Advanced Voice Mode - Now sings with new capabilities (X)

    • Anthropic Voice Mode - Released on mobile for conversational AI (X)

  • This Week’s Buzz

    • Fully Connected - W&B conference, June 18-19, SF, promo code WBTHURSAI (Register)

    • AI Engineer World’s Fair - Next week in SF, 30% off with THANKSTHURSDAI (Register)

  • AI Art & Diffusion

    • BFL Flux Kontext - SOTA image editing model for identity-consistent edits (Tweet, Announcement)

  • Vision & Video

    • VEO3 Prompt Theory - Viral AI video trend questioning reality on TikTok (X)

    • Odyssey Interactive Video - Real-time AI world exploration at 30 FPS (Blog, Try It)

    • HunyuanPortrait - High-fidelity portrait video from one photo (Site, Paper)

    • HunyuanVideo-Avatar - Audio-driven full-body avatar animation (Site, Tweet)

  • Voice & Audio

    • Unmute.sh - KyutAI’s voice wrapper for any LLM, low latency, soon open-source (Try It, X)

    • Chatterbox - Resemble AI’s open-source voice cloning with emotion control (GitHub, HF)

  • Tools

    • Opera NEON - Agent-centric AI browser for autonomous web tasks (Site, Tweet)