ThursdAI · May 29, 2025

📆 ThursdAI - May 29 - DeepSeek R1 Resurfaces, VEO3 viral moments, Opus 4 a week after, Flux Kontext image editing & more AI news

From Weights & Biases, a "chill" week that included a new DeepSeek R1, a new SOTA image editing model, 2 interviews w/ Charlie Holtz & Linus Eckenstam, RL Placebo and a lot more AI

By Alex Volkov

88 min


What happened in AI the week of May 29, 2025?

From Weights & Biases, a "chill" week that included a new DeepSeek R1, a new SOTA image editing model, 2 interviews with Charlie Holtz and Linus Eckenstam, a discussion about a world building model and a whole lot more! Welcome back to another absolutely wild week in AI! I'm coming to you live from the Fontainebleau Hotel in Vegas at the Imagine AI conference, and wow, what a perfect setting to discuss how AI is literally reimagining our world.

Episode Summary

In This Episode

🔓 Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers
📰 Claude Opus 4: A Week Later – The Dev Darling?
⚡ 🐝 This Week's Buzz: Weights & Biases Updates!
🎥 Vision & Video: Reality is Optional Now
🎨 Black Forest Labs drops Flux Kontext - SOTA image editing!
🔊 🎙️ Voice & Audio: Everyone Gets a Voice
📰 Looking Forward: The Convergence is Real

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Linus Eckenstam

AI Evangelist & Content Creator · Independent / Inside My Head Newsletter

@LinusEkenstam

Charlie Holtz

AI Developer & Founder · Independent

@charlieholtz

Yam Peleg

Weekly co-host of ThursdAI · AI builder & founder

@Yampeleg

Wolfram Ravenwolf

Weekly co-host, AI model evaluator · Independent AI evaluator (r/LocalLLaMA)

@WolframRvnwlf

Nisten Tahiraj

Weekly co-host of ThursdAI · AI operator & builder

@nisten

By The Numbers

Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers

We’re talking AIME at 91 (beating previous scores by a mile), LiveCodeBench at 73, and SWE verified at 57.6.

Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers

And here’s the kicker—they also released an 8B distilled version based on Qwen3, runnable on your laptop.

Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers

GRPO (Group Relative Policy Optimization), the framework that DeepSeek gave to the world with R1, is based on external (human-optimized) rewards, and Intuitor seems to be matching or even exceeding some of GRPO's results when used to fine-tune Qwen2.5 3B.

Claude Opus 4: A Week Later – The Dev Darling?

50%

Linus Eckenstam highlighted how Lovable.dev saw their syntax error rates plummet by nearly 50% after integrating Claude 4.

🐝 This Week's Buzz: Weights & Biases Updates!

100%

You can still grab tickets, and as a ThursdAI listener, use the promo code WBTHURSAI for a 100% off ticket!

🔓 Open Source AI & LLMs: DeepSeek Whales & Mind-Bending Papers

DeepSeek dropped R1-0528 out of nowhere, an update to their reasoning beast with some serious jumps in performance. We’re talking AIME at 91 (beating previous scores by a mile), LiveCodeBench at 73, and SWE verified at 57.6.

Still, it’s likely among the best open-weight models out there.

📰 Claude Opus 4: A Week Later – The Dev Darling?

Claude Opus 4, whose launch we celebrated live on the show, has had a week to make its mark. Charlie Holtz, who's building Chorus (more on that amazing app in a bit!), shared that while it's sometimes "astrology" to judge the vibes of a new model, Opus 4 feels like a step change, especially in coding.

He mentioned that Claude Code, powered by Opus 4 (and Sonnet 4 for implementation), is now tackling GitHub issues that were too complex just weeks ago.
He even had a coworker who "vibe coded three websites in a weekend" with it – that's a tangible productivity boost!

⚡ 🐝 This Week's Buzz: Weights & Biases Updates!

Alright, time for a quick update from the world of Weights & Biases! 1. Fully Connected is Coming!

Our flagship 2-day conference, Fully Connected, is happening on June 18th and 19th in San Francisco.
It's going to be packed with amazing speakers and insights into the world of AI development.

🎥 Vision & Video: Reality is Optional Now

Google's VEO3 has completely taken over TikTok with the "Prompt Theory" videos. If you haven't seen these yet, stop reading and watch ☝️. The concept is brilliant - AI-generated characters discussing whether they're "made of prompts," creating this meta-commentary on consciousness and reality.

🎨 Black Forest Labs drops Flux Kontext - SOTA image editing!

This came as massive breaking news during the show (though we didn't catch it live!) - Black Forest Labs, creators of Flux, dropped an incredible image editing model called Kontext (really, 3 models: Pro, Max, and a 12B open source Dev in private preview). They are consistent, context-aware text and image editing models!

If you used GPT-image to Ghiblify yourself, or VEO, you know that those are not image editing models: your face will look different every generation.
These image models keep you consistent while adding what you wanted.

🔊 🎙️ Voice & Audio: Everyone Gets a Voice

KyutAI (the folks behind Moshi) are back with Unmute.sh - a modular wrapper that adds voice to ANY text LLM. The latency is incredible (under 300ms), and it includes semantic VAD (knowing when you've paused for thought vs. just taking a breath).

What's brilliant about this approach is it preserves all the capabilities of the underlying text model while adding natural voice interaction.

📰 Looking Forward: The Convergence is Real

As we wrapped up the show, I couldn't help but reflect on the massive convergence happening across all these modalities. We have LLMs getting better at reasoning (even with random rewards!), video models breaking reality, voice models becoming indistinguishable from humans, and it's all happening simultaneously.

Charlie's comment that "we are the prompts" might have been said in jest, but it touches on something profound.
As these models get better at generating realistic worlds, characters, and voices, the line between generated and real continues to blur.

TL;DR Links

Show Notes & Guests

Alex Volkov - AI Evangelist, Weights & Biases (@altryne)
Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten
Guests - Charlie Holtz (@charliebholtz), Linus Eckenstam (@LinusEkenstam)
Open Source LLMs
- DeepSeek-R1-0528 - Updated reasoning model with AIME 91, LiveCodeBench 73 (Try It)
- Learning to Reason Without External Rewards - Paper on random rewards improving models (X)
- HaizeLabs j1-nano & j1-micro - Tiny reward models (600M, 1.7B params), RewardBench 80.7% for micro (Tweet, GitHub, HF-micro, HF-nano)
Big CO LLMs + APIs
- Claude Opus 4 - #1 on LMArena WebDev, coding step change (X)
- Mistral Agents API - Framework for custom tool-using agents (Blog, Tweet)
- Mistral Embed SOTA - New state-of-the-art embedding API (X)
- OpenAI Advanced Voice Mode - Now sings with new capabilities (X)
- Anthropic Voice Mode - Released on mobile for conversational AI (X)
This Week’s Buzz
- Fully Connected - W&B conference, June 18-19, SF, promo code WBTHURSAI (Register)
- AI Engineer World’s Fair - Next week in SF, 30% off with THANKSTHURSDAI (Register)
AI Art & Diffusion
- BFL Flux Kontext - SOTA image editing model for identity-consistent edits (Tweet, Announcement)
Vision & Video
- VEO3 Prompt Theory - Viral AI video trend questioning reality on TikTok (X)
- Odyssey Interactive Video - Real-time AI world exploration at 30 FPS (Blog, Try It)
- HunyuanPortrait - High-fidelity portrait video from one photo (Site, Paper)
- HunyuanVideo-Avatar - Audio-driven full-body avatar animation (Site, Tweet)
Voice & Audio
- Unmute.sh - KyutAI’s voice wrapper for any LLM, low latency, soon open-source (Try It, X)
- Chatterbox - Resemble AI’s open-source voice cloning with emotion control (GitHub, HF)
Tools
- Opera NEON - Agent-centric AI browser for autonomous web tasks (Site, Tweet)

Alex Volkov 0:32

Welcome, everyone, to ThursdAI.

0:34

For May 29th, this is Alex Volkov. Guys, I need you on mute when you type, Nisten specifically. Welcome, everyone, to ThursdAI. Today is May 29th, my name is Alex Volkov, I'm an AI evangelist with Weights & Biases, and we're here, at least I'm here, my co-hosts are remote as well, but at least I'm here at the Fontainebleau Hotel in Vegas at this conference called Imagine, oh, right here, Imagine AI, and yeah, I'm very excited to bring you the podcast right from here. This is pretty cool. We got a studio set up, and hopefully everybody on the Twitter space can hear everything. We're also live on Twitter, we're live on YouTube, on LinkedIn. If you can't hear any part of the show, just jump on the YouTube space, hopefully this will be fine. With me today we have all the regulars: we have Wolfram all the way from Germany, we have Yam Peleg and Nisten. We have a bunch of things to talk about. How are you guys doing? Wolfram, how are you doing, man?

Wolfram Ravenwolf 1:28

I'm fine, I'm so happy to be back on the show,

1:30

after two weeks in conferences in Germany, so I'm happy to be here now. I'm not in Vegas, unfortunately, but I'm joining from here.

Alex Volkov 1:39

We did see you, you jumped into the livestream of

1:42

Claude Opus while you were here. Yeah, for just a little bit. And then we definitely need to talk to you about Claude Opus 4 and how your experience with it has been as well. So that's great to have you. Nisten, how have you been? How was your AI week? Very chill one for us for now.

Nisten 1:57

I was super busy just launching two different entire things.

2:03

I won't talk about them yet. But, yeah, I've been really liking Devstral. I'm trying to take the entire workflow of, like, Codex and Claude Code and actually make it into something reliable, as in, like, open source reliable. We need to work on that.

Yam Peleg 2:22

We need to work on that

Nisten 2:23

because one change and it's poof, it doesn't work anymore.

Yam Peleg 2:27

I have a lot.