Episode Summary
From Weights & Biases, a "chill" week that included a new DeepSeek R1, a new SOTA image editing model, two interviews (with Charlie Holtz and Linus Eckenstam), a discussion about a world-building model, and a whole lot more! Welcome back to another absolutely wild week in AI! I'm coming to you live from the Fontainebleau Hotel in Vegas at the Imagine AI conference, and wow, what a perfect setting to discuss how AI is literally reimagining our world.
In This Episode
- Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers
- Claude Opus 4: A Week Later - The Dev Darling?
- This Week's Buzz: Weights & Biases Updates!
- Vision & Video: Reality is Optional Now
- Black Forest Labs drops Flux Kontext - SOTA image editing!
- Voice & Audio: Everyone Gets a Voice
- Looking Forward: The Convergence is Real
Hosts & Guests
By The Numbers
Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers
DeepSeek dropped R1-0528 out of nowhere, an update to their reasoning beast with some serious jumps in performance. We're talking AIME at 91 (beating previous scores by a mile), LiveCodeBench at 73, and SWE-bench Verified at 57.6.
- Still, itβs likely among the best open-weight models out there.
Claude Opus 4: A Week Later - The Dev Darling?
Claude Opus 4, whose launch we celebrated live on the show, has had a week to make its mark. Charlie Holtz, who's building Chorus (more on that amazing app in a bit!), shared that while it's sometimes "astrology" to judge the vibes of a new model, Opus 4 feels like a step change, especially in coding.
- He mentioned that Claude Code, powered by Opus 4 (and Sonnet 4 for implementation), is now tackling GitHub issues that were too complex just weeks ago.
- He even had a coworker who "vibe coded three websites in a weekend" with it β that's a tangible productivity boost!
This Week's Buzz: Weights & Biases Updates!
Alright, time for a quick update from the world of Weights & Biases!
1. Fully Connected is Coming!
- Our flagship 2-day conference, Fully Connected, is happening on June 18th and 19th in San Francisco.
- It's going to be packed with amazing speakers and insights into the world of AI development.
Vision & Video: Reality is Optional Now
TK: Add prompt theory video
Google's VEO3 has completely taken over TikTok with the "Prompt Theory" videos. If you haven't seen these yet, stop reading and go watch. The concept is brilliant - AI-generated characters discussing whether they're "made of prompts," creating this meta-commentary on consciousness and reality.
Black Forest Labs drops Flux Kontext - SOTA image editing!
This came as massive breaking news during the show (though we didn't catch it live!) - Black Forest Labs, creators of Flux, dropped an incredible image editing model called Kontext (really, 3 models: Pro, Max, and a 12B open-source Dev in private preview). They deliver consistent, context-aware text and image editing!
- If you used GPT-image to Ghiblify yourself, or VEO, you know those aren't true image editing models - your face looks different with every generation.
- These image models keep you consistent while adding what you asked for.
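The workflow for an instruction-based edit like this boils down to: send the source image plus a plain-text instruction, get back a new image with the subject's identity preserved. Here's a minimal sketch of building such a request; the field names (`image`, `prompt`) are illustrative placeholders, not Kontext's actual API:

```python
import base64
import json

def build_edit_request(image_bytes: bytes, instruction: str) -> str:
    """Build a JSON body for an instruction-based image edit:
    the source image (base64-encoded) plus a text instruction.
    Field names are placeholders, not a real API schema."""
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": instruction,  # e.g. "put a red hat on the person"
    })

# Usage: the server would decode "image", apply "prompt", and
# return an edited image with the same face/identity.
body = build_edit_request(b"\x89PNG...", "put a red hat on the person")
```

The key difference from GPT-image-style generation is on the model side: an editing model conditions on the original pixels, so identity survives the round trip.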
Voice & Audio: Everyone Gets a Voice
KyutAI (the folks behind Moshi) are back with Unmute.sh - a modular wrapper that adds voice to ANY text LLM. The latency is incredible (under 300ms), and it includes semantic VAD (knowing when you've paused for thought vs. just taking a breath).
- The latency is incredible (under 300ms), and it includes semantic VAD (knowing when you've paused for thought vs. just taking a breath).
- What's brilliant about this approach is it preserves all the capabilities of the underlying text model while adding natural voice interaction.
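The architecture is easy to picture: speech-to-text feeds the text LLM, the reply goes out through text-to-speech, and a semantic VAD decides when the user has actually finished a turn. Here's a toy sketch of that turn-taking logic; every name here is hypothetical, not Unmute's actual API, and the punctuation heuristic stands in for the small model a real semantic VAD would use:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VadResult:
    paused: bool       # user stopped making sound
    end_of_turn: bool  # pause looks like a finished thought, not a breath

def semantic_vad(transcript: str, silence_ms: int) -> VadResult:
    """Toy semantic VAD: combine silence duration with a cheap
    'does this look like a complete utterance?' check."""
    paused = silence_ms > 200
    complete = transcript.rstrip().endswith((".", "?", "!"))
    # A real system would score completeness with a small model,
    # not punctuation; long silence is a fallback signal.
    return VadResult(paused=paused,
                     end_of_turn=paused and (complete or silence_ms > 900))

def voice_turn(transcript: str, silence_ms: int,
               llm: Callable[[str], str]) -> Optional[str]:
    """One turn of the wrapper: only call the text LLM once the
    VAD says the user's turn has ended; otherwise keep listening."""
    if not semantic_vad(transcript, silence_ms).end_of_turn:
        return None                  # mid-thought pause: wait
    return llm(transcript)           # reply text would go to TTS

# Usage with a stub LLM standing in for any text model:
reply = voice_turn("What's the weather like?", silence_ms=400,
                   llm=lambda t: f"Echo: {t}")
```

Because the LLM only ever sees text, this kind of wrapper inherits the full capability of whatever model you plug in, which is exactly the appeal of the modular approach.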
Looking Forward: The Convergence is Real
As we wrapped up the show, I couldn't help but reflect on the massive convergence happening across all these modalities. We have LLMs getting better at reasoning (even with random rewards!), video models breaking reality, voice models becoming indistinguishable from humans, and it's all happening simultaneously.
- Charlie's comment that "we are the prompts" might have been said in jest, but it touches on something profound.
- As these models get better at generating realistic worlds, characters, and voices, the line between generated and real continues to blur.
Show Notes & Guests
Alex Volkov - AI Evangelist, Weights & Biases (@altryne)
Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten
Guests - Charlie Holtz (@charliebholtz), Linus Eckenstam (@LinusEkenstam)
Open Source LLMs
DeepSeek-R1-0528 - Updated reasoning model with AIME 91, LiveCodeBench 73 (Try It)
Learning to Reason Without External Rewards - Paper on random rewards improving models (X)
HaizeLabs j1-nano & j1-micro - Tiny reward models (600M, 1.7B params), RewardBench 80.7% for micro (Tweet, GitHub, HF-micro, HF-nano)
Big CO LLMs + APIs
Claude Opus 4 - #1 on LMArena WebDev, coding step change (X)
Mistral Agents API - Framework for custom tool-using agents (Blog, Tweet)
Mistral Embed SOTA - New state-of-the-art embedding API (X)
OpenAI Advanced Voice Mode - Now sings with new capabilities (X)
Anthropic Voice Mode - Released on mobile for conversational AI (X)
This Week's Buzz
AI Art & Diffusion
BFL Flux Kontext - SOTA image editing model for identity-consistent edits (Tweet, Announcement)
Vision & Video
VEO3 Prompt Theory - Viral AI video trend questioning reality on TikTok (X)
Odyssey Interactive Video - Real-time AI world exploration at 30 FPS (Blog, Try It)
HunyuanPortrait - High-fidelity portrait video from one photo (Site, Paper)
HunyuanVideo-Avatar - Audio-driven full-body avatar animation (Site, Tweet)
Voice & Audio
Tools