ThursdAI · April 24, 2025

ThursdAI - Apr 23rd - GPT Image & Grok APIs Drop, OpenAI ❤️ OS? Dia's Wild TTS & Building Better Agents!

From Weights & Biases -> GPT image is finally in the API, what was in the secret OSS meeting from OpenAI, interview with Kwindla on semantic VAD, and 12 factor agents with Dex Horthy

97 min


What happened in AI the week of April 24, 2025?

Welcome back to ThursdAI! After what felt like ages of non-stop, massive model drops (looking at you, O3 and GPT-4!), we finally got that "chill week" we've been dreaming of since maybe... forever? It seems the big labs are taking a breather, probably gearing up for even bigger things next week (maybe some open source 👀).



In This Episode

🔓 Open Source AI Highlights: Community, Vision, and Efficiency
🔊 Voice and Audio Innovations: Emotional TTS and Smarter Conversations
🎨 AI Art & Diffusion & 3D: Quick Hits
🤖 Agent Development Insights: Building Robust Agents with Dex Horthy
🎨 Big Companies & APIs: GPT Image and Grok Get Developer Access
🎥 Vision and Video: Sand AI's Surprise Release & More
⚡ This Week's Buzz from W&B / Community
📰 Wrapping Up the "Chill" Week That Wasn't Quite Chill

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Kwindla Hultman Kramer

Co-Founder & CEO · Daily.co

@kwindla

Maziyar Panahi

AI Researcher · Independent / Open Source

Dexter Horthy

Founder & CEO · HumanLayer

@dexhorthy

LDJ

Weekly co-host of ThursdAI · Nous Research

Yam Peleg

Weekly co-host of ThursdAI · AI builder & founder

@Yampeleg

Wolfram Ravenwolf

Weekly co-host, AI model evaluator · Independent AI evaluator (r/LocalLLaMA)

Nisten Tahiraj

Weekly co-host of ThursdAI · AI operator & builder

@nisten

By The Numbers

Open Source AI Highlights: Community, Vision, and Efficiency

70B

Something in the 70B-200B parameter range that could run reasonably on, say, 4 GPUs, leaving room for other models.

3B

NVIDIA dropped something really cool this week: the Describe Anything Model (DAM), specifically DAM-3B, a 3 billion parameter multimodal model for region-based image _and_ video captioning.

27B

The 27B parameter Gemma 3, for example, drops from needing a hefty 54GB to just 14.1GB!

1B

Even the 1B model goes from 2GB to just half a gig.

4B

Wolfram already took the 4B QAT model for a spin using LM Studio.
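
The Gemma 3 QAT memory numbers above follow from simple arithmetic on bits per weight. Here is a rough sketch, assuming 16-bit weights for the original checkpoints and 4-bit weights for the QAT ones (the function name is illustrative, not from any library). It counts weights only, so it lands a little under the quoted 14.1GB for the 27B model; the gap is presumably overhead such as quantization scales or layers kept at higher precision.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts taken from the model names; bit widths are assumptions.
for name, params in [("Gemma 3 27B", 27e9), ("Gemma 3 1B", 1e9)]:
    full = weight_memory_gb(params, 16)   # assumed 16-bit baseline
    quant = weight_memory_gb(params, 4)   # assumed 4-bit QAT weights
    print(f"{name}: {full:.1f} GB at 16-bit -> {quant:.1f} GB at 4-bit")
```

The same function gives a quick sanity check for any "will it fit on my GPU" question: multiply parameters by bits per weight, divide by eight, and leave headroom for activations and the KV cache.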

🔓 Open Source AI Highlights: Community, Vision, and Efficiency

Even with the big players quieter on the model release front, the open source scene was buzzing. It feels like this "chill" period gave everyone a chance to focus on refining tools, releasing datasets, and engaging with the community. Perhaps the biggest non-release news of the week was OpenAI actively engaging with the open source community.


🔊 Voice and Audio Innovations: Emotional TTS and Smarter Conversations

Even in a "chill" week, the audio space delivered some serious excitement. Kwindla Kramer joined us to break down two major developments: Nari Labs' Dia, an open source TTS model, and Pipecat's Smart-Turn semantic VAD. Dia absolutely blew up Twitter, and for good reason.


🎨 AI Art & Diffusion & 3D: Quick Hits

A slightly quieter week for major art model releases, but still some significant movement:

- OpenAI's GPT Image 1 API: We'll cover this in detail in the Big Companies section below, but it's obviously relevant here too as a major new tool for developers creating AI art and image editing applications.
- Hunyuan 3D 2.5 (Tencent): Tencent released an update to their 3D generation model, now boasting 10 billion parameters (up from 1B!). They're highlighting massive leaps in precision (1024-resolution geometry), high-quality textures with PBR support, and improved skeletal rigging for animation (X Post).

🤖 Agent Development Insights: Building Robust Agents with Dex Horthy

With things slightly calmer, it was the perfect time to talk about AI agents – a space buzzing with activity, frameworks, and maybe even a little bit of drama.

Dex builds SDKs to help create agents that feel more like digital humans, aiming to deploy them where users already are (Slack, email, etc.), moving beyond simple chat interfaces.
His experience led him to identify common patterns and pitfalls when trying to build reliable agents.
Many teams building serious, production-ready agents end up writing large parts from scratch.

🎨 Big Companies & APIs: GPT Image and Grok Get Developer Access

While new _foundation models_ were scarce from the giants this week, they did deliver on the API front, opening up powerful capabilities to developers. This was a big one many developers were waiting for. OpenAI's powerful image generation capabilities, previously locked inside ChatGPT, are now available via API under the official name gpt-image-1 (Docs).

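
To make the gpt-image-1 announcement concrete, here is a minimal sketch of calling it over plain HTTP with only the Python standard library. The endpoint path, the request fields (`model`, `prompt`, `size`, `n`), and the base64 `b64_json` response field reflect my understanding of OpenAI's Images API and should be checked against the official docs; the helper names are made up for this example.

```python
import base64
import json
import urllib.request

# Assumed endpoint for image generation; verify against the official docs.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt: str, size: str = "1024x1024") -> dict:
    """Build the JSON body for a single gpt-image-1 generation."""
    return {"model": "gpt-image-1", "prompt": prompt, "size": size, "n": 1}


def decode_first_image(response_json: dict) -> bytes:
    """Decode the first base64-encoded image in a response into raw bytes."""
    return base64.b64decode(response_json["data"][0]["b64_json"])


def generate_image(prompt: str, api_key: str) -> bytes:
    """Send the request and return the generated image bytes (network call)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_image_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return decode_first_image(json.load(response))
```

With a valid key, `generate_image("a watercolor fox", api_key)` should return bytes you can write straight to a `.png` file. Splitting the payload builder and the decoder out of the network call keeps both easy to test offline.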

🎥 Vision and Video: Sand AI's Surprise Release & More

Just when we thought the week was winding down on model releases, Sand AI surprised us with MAGI-1, a 24B autoregressive diffusion model for long, streaming video.

The demos looked stunning, showcasing impressive long-form video generation with remarkable character consistency, often the Achilles' heel of AI video.
Nisten speculated this could be a major step towards usable AI-generated movies, solving the critical face/character consistency problem.
MAGI-1 achieves this by predicting video in 24-frame chunks with causal attention between them, allowing for real-time streaming generation where compute doesn't scale with length.
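
One way to picture "causal attention between chunks" is as a block-causal attention mask. The sketch below is only an illustration of that idea, not the model's actual implementation: it assumes every frame can see all frames in its own 24-frame chunk plus every earlier chunk, and nothing from later chunks. The function name and the within-chunk behavior are my assumptions.

```python
def chunk_causal_mask(num_frames: int, chunk_size: int = 24) -> list[list[bool]]:
    """Return mask[i][j], True when frame i may attend to frame j.

    Frames attend freely inside their own chunk (an assumption) and to all
    earlier chunks, but never to a later chunk.
    """
    chunk_of = [frame // chunk_size for frame in range(num_frames)]
    return [
        [chunk_of[j] <= chunk_of[i] for j in range(num_frames)]
        for i in range(num_frames)
    ]


# Three chunks of 24 frames each.
mask = chunk_causal_mask(num_frames=72)
print(mask[0][23], mask[0][24], mask[30][0], mask[30][50])
# → True False True False
```

Because no chunk depends on a later one, finished chunks can be streamed out while the next is still being generated, which is the property the episode highlights.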

⚡ This Week's Buzz from W&B / Community

Quick hits on upcoming events and community stuff:

- WeaveHacks Coming to SF! Mark your calendars! We're hosting a hackathon focused on building with W&B Weave at the Weights & Biases office in San Francisco on May 17th-18th [0:06:15]. If you're around, especially if you're coming into town for Google I/O the week after, come hang out, build cool stuff, and say hi!

📰 Wrapping Up the "Chill" Week That Wasn't Quite Chill

Phew! See? Even a "chill" week in AI is overflowing with news when you actually have time to stop and breathe for a second. It felt good to have the space to go a little deeper. It was fantastic having Kwin, Maziyar, and Dex join the regulars (LDJ, Yam, Wolfram, Nisten) to share their expertise and firsthand insights.

TL;DR and Show Notes (April 23rd, 2025)

Hosts and Guests
- Alex Volkov - AI Evangelist at Weights & Biases @altryne
  - Co-hosts - Wolfram Ravenwolf @WolframRvnwlf, Yam Peleg @yampeleg, Nisten Tahiraj @nisten, LDJ @ldjconfirmed
  - Kwindla Kramer @kwindla - Daily Co-Founder // Voice expert
  - Dexter Horthy @dexhorthy - HumanLayer // Agents expert
  - Maziyar Panahi @MaziyarPanahi - OSS maintainer
Open Source AI - LLMs, Vision, Voice & more
- OpenAI OSS Meeting: Insights from Maziyar [0:16:37].
- NVIDIA Describe Anything (DAM-3B): 3B param multimodal LLM for region-based image/video captioning. (X Post, HF model, HF demo)
- Google Gemma QAT: Quantization-Aware Training models (X, Blog)
Big CO LLMs + APIs
- OpenAI GPT Image 1 API: (X Post, Docs, API Reference)
- Grok API & App Updates: Grok 3 and Grok 3 Mini available via API. (API Docs, App Update X Post)
This Week's Buzz - Weights & Biases
- WeaveHacks SF: Hackathon May 17-18 at W&B HQ. lu.ma/weavehacks
- Fully Connected: W&B's 2-day conference, June 18-19 in SF fullyconnected.com
Vision & Video
- Sand AI MAGI-1: 24B autoregressive diffusion model for long, streaming video (X Post, GitHub, PDF Report, HF Repo)
- Character AI AvatarFX: Early access for creating speaking/emoting avatars from images. (Website)
- Framepack: Mentioned for long video generation (120s) on low VRAM (6GB). (Project Page)
Voice & Audio
- Nari Labs Dia: 1.6B param OSS TTS model (X Post Highlight, HF Model, Github, Fal.ai Demo)
- Pipecat Smart-Turn VAD: Open source semantic VAD model (Github, HF Model, Fal.ai Playground, Try It Demo)
AI Art & Diffusion & 3D
- Hunyuan 3D 2.5 (Tencent): 10B param update [0:09:06]. Higher res geometry, PBR textures, improved rigging. (X Post)
Agents, Tools & Links
- 12 Factor Agents: Discussion with Dex Horthy on building robust agents (Github Repo)

0:00

We're going to talk about OpenAI releasing Advanced Voice Mode to everyone, how we've played with it, and how cool this was. Obviously, Advanced Voice Mode was already in kind of beta testing, and we had this on the show before. If you guys remember, we had Chris Gerdina, and then I got access as well, but now everybody has access. And so we have to talk about this. And if you haven't tried voice mode, we will give you maybe some tips and tricks. I definitely know that some folks already jailbroke this to sing. I was not able to get it to sing "Never Gonna Give You Up," but I will definitely send you to the right resource once that's happening. OpenAI also had some turmoil. I want to try to keep us at, let's talk about the actual stuff that happened in AI, the releases, the open source. But when there's big turmoil and an executive at OpenAI moves away, we at least mention this, right? I don't know if we're going to speculate, but we'll at least mention that Mira Murati and a bunch of other, I think two other, senior execs at OpenAI have left OpenAI this week. Apparently, according to them, unrelated to each other, but definitely something that happened. And, uh, OpenAI Dev Day is next, uh, Tuesday. And, ooh, I'm getting live pings. I love the show. I'm getting live pings and DMs about how you can prompt, uh, Advanced Voice Mode to sing. Yeah. Please keep those going in comments as well. What else happened with OpenAI? I think that's mostly... Oh! No, yeah, of course. Of course, OpenAI is about to stop being non-profit and become very pro-profit.
