Episode Summary
Welcome back to ThursdAI! After what felt like ages of non-stop, massive model drops (looking at you, O3 and GPT-4!), we finally got that "chill week" we've been dreaming of since maybe... forever? It seems the big labs are taking a breather, probably gearing up for even bigger things next week (maybe some open source 👀).
In This Episode
- 🔓 Open Source AI Highlights: Community, Vision, and Efficiency
- 🔊 Voice and Audio Innovations: Emotional TTS and Smarter Conversations
- 🎨 AI Art & Diffusion & 3D: Quick Hits
- 🤖 Agent Development Insights: Building Robust Agents with Dex Horthy
- 🎨 Big Companies & APIs: GPT Image and Grok Get Developer Access
- 🎥 Vision and Video: Sand AI's Surprise Release & More
- ⚡ This Week's Buzz from W&B / Community
- 📰 Wrapping Up the "Chill" Week That Wasn't Quite Chill
🔓 Open Source AI Highlights: Community, Vision, and Efficiency
Even with the big players quieter on the model release front, the open source scene was buzzing. It feels like this "chill" period gave everyone a chance to focus on refining tools, releasing datasets, and engaging with the community. Perhaps the biggest non-release news of the week was OpenAI actively engaging with the open source community.
🔊 Voice and Audio Innovations: Emotional TTS and Smarter Conversations
Even in a "chill" week, the audio space delivered some serious excitement. Kwindla Kramer joined us to break down two major developments. The first, Nari Labs' Dia TTS model, absolutely blew up Twitter, and for good reason.
🎨 AI Art & Diffusion & 3D: Quick Hits
A slightly quieter week for major art model releases, but still some significant movement:
- OpenAI's GPT Image 1 API: We'll cover this in detail in the Big Companies section below, but it's obviously relevant here too as a major new tool for developers creating AI art and image editing applications.
- Hunyuan 3D 2.5 (Tencent): Tencent released an update to their 3D generation model, now boasting 10 billion parameters (up from 1B!). They're highlighting massive leaps in precision (1024-resolution geometry), high-quality textures with PBR support, and improved skeletal rigging for animation (X Post).
🤖 Agent Development Insights: Building Robust Agents with Dex Horthy
With things slightly calmer, it was the perfect time to talk about AI agents – a space buzzing with activity, frameworks, and maybe even a little bit of drama.
- Dex builds SDKs to help create agents that feel more like digital humans, aiming to deploy them where users already are (Slack, email, etc.), moving beyond simple chat interfaces.
- His experience led him to identify common patterns and pitfalls when trying to build reliable agents.
- Many teams building serious, production-ready agents end up writing large parts from scratch.
🎨 Big Companies & APIs: GPT Image and Grok Get Developer Access
While new _foundation models_ were scarce from the giants this week, they did deliver on the API front, opening up powerful capabilities to developers. The GPT Image API was a big one many developers were waiting for. OpenAI's powerful image generation capabilities, previously locked inside ChatGPT, are now available via API under the official name gpt-image-1 (Docs).
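For anyone who wants to poke at it, here's a rough sketch of calling the image generation endpoint with nothing but the Python standard library. Only the model name `gpt-image-1` comes from the episode; the endpoint URL, the `size` value, and the `b64_json` response field reflect our understanding of OpenAI's Images API, and the helper names (`build_request`, `decode_first_image`, `generate_image`) are made up for illustration. Check the official docs before relying on any of it.

```python
import base64
import json
import os
import urllib.request

# Assumed endpoint for OpenAI's image generation API.
API_URL = "https://api.openai.com/v1/images/generations"


def build_request(prompt: str, api_key: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) a POST request asking gpt-image-1 for one image."""
    payload = {"model": "gpt-image-1", "prompt": prompt, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_first_image(response_body: dict) -> bytes:
    """Pull the first image out of a response, assuming base64 data in `b64_json`."""
    return base64.b64decode(response_body["data"][0]["b64_json"])


def generate_image(prompt: str, out_path: str = "out.png") -> str:
    """Send the request and write the image to disk (needs OPENAI_API_KEY set)."""
    api_key = os.environ["OPENAI_API_KEY"]
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        body = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(decode_first_image(body))
    return out_path
```

Calling `generate_image("a watercolor fox")` would hit the network and cost credits, so the request building and the decoding are split out to make them easy to inspect without sending anything.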
🎥 Vision and Video: Sand AI's Surprise Release & More
Just when we thought the week was winding down on model releases...
- The demos looked stunning, showcasing impressive long-form video generation with remarkable character consistency – often the Achilles' heel of AI video.
- Nisten speculated this could be a major step towards usable AI-generated movies, solving the critical face/character consistency problem.
- They achieve this by predicting video in 24-frame chunks with causal attention between them, allowing for real-time streaming generation where compute doesn't scale with length.
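To make the "causal attention between chunks" idea a bit more concrete, here's a toy sketch of a block-causal attention mask. It assumes frames inside a chunk can all see each other and can also see every earlier chunk, but never a later one. That's our reading of the description above, not the model's actual code, and the function name `block_causal_mask` is purely illustrative.

```python
def block_causal_mask(num_frames: int, chunk_size: int = 24) -> list[list[bool]]:
    """Return mask[i][j] == True when frame i is allowed to attend to frame j.

    Frames are grouped into consecutive chunks of `chunk_size`. A frame sees
    its own chunk in full plus all earlier chunks, and nothing from the future.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Which chunk each frame belongs to.
    chunk_of = [frame // chunk_size for frame in range(num_frames)]
    return [
        [chunk_of[j] <= chunk_of[i] for j in range(num_frames)]
        for i in range(num_frames)
    ]


# Tiny example: 6 frames in chunks of 2, printed as a grid (# = may attend).
if __name__ == "__main__":
    for row in block_causal_mask(6, chunk_size=2):
        print("".join("#" if allowed else "." for allowed in row))
```

Because earlier chunks never depend on later ones, a finished chunk can be streamed out while the next one is still being generated, which is the property that makes real-time generation plausible.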
⚡ This Week's Buzz from W&B / Community
Quick hits on upcoming events and community stuff:
- WeaveHacks Coming to SF! Mark your calendars! We're hosting a hackathon focused on building with W&B Weave at the Weights & Biases office in San Francisco on May 17th-18th [0:06:15].
- If you're around, especially if you're coming into town for Google I/O the week after, come hang out, build cool stuff, and say hi!
📰 Wrapping Up the "Chill" Week That Wasn't Quite Chill
Phew! See? Even a "chill" week in AI is overflowing with news when you actually have time to stop and breathe for a second.
- It felt good to have the space to go a little deeper.
- It was fantastic having Kwin, Maziyar, and Dex join the regulars (LDJ, Yam, Wolfram, Nisten) to share their expertise and firsthand insights.
Hosts and Guests
Alex Volkov - AI Evangelist at Weights & Biases @altryne
Co-Hosts - Wolfram Ravenwolf @WolframRvnwlf, Yam Peleg @yampeleg, Nisten Tahiraj @nisten, LDJ @ldjconfirmed
Kwindla Kramer @kwindla - Daily Co-Founder // Voice expert
Dexter Horthy @dexhorthy - HumanLayer // Agents expert
Maziyar Panahi @MaziyarPanahi - OSS maintainer
Open Source AI - LLMs, Vision, Voice & more
Big CO LLMs + APIs
OpenAI GPT Image 1 API: (X Post, Docs, API Reference)
Grok API & App Updates: Grok 3 and Grok 3 Mini available via API. (API Docs, App Update X Post)
This Week's Buzz - Weights & Biases
WeaveHacks SF: Hackathon May 17-18 at W&B HQ. lu.ma/weavehacks
Fully Connected: W&B's 2-day conference, June 18-19 in SF fullyconnected.com
Vision & Video
Sand AI MAGI-1: 24B autoregressive diffusion model for long, streaming video (X Post, GitHub, PDF Report, HF Repo)
Character AI AvatarFX: Early access for creating speaking/emoting avatars from images. (Website)
Framepack: Mentioned for long video generation (120s) on low VRAM (6GB). (Project Page)
Voice & Audio
Nari Labs Dia: 1.6B param OSS TTS model (X Post Highlight, HF Model, GitHub, Fal.ai Demo)
Pipecat Smart-Turn VAD: Open source semantic VAD model (GitHub, HF Model, Fal.ai Playground, Try It Demo)
AI Art & Diffusion & 3D
Hunyuan 3D 2.5 (Tencent): 10B param update [0:09:06]. Higher res geometry, PBR textures, improved rigging. (X Post)
Agents, Tools & Links
12 Factor Agents: Discussion with Dex Horthy on building robust agents (GitHub Repo)