Episode Summary

Welcome back to ThursdAI! After what felt like ages of non-stop, massive model drops (looking at you, o3 and GPT-4!), we finally got that "chill week" we've been dreaming of since maybe... forever? It seems the big labs are taking a breather, probably gearing up for even bigger things next week (maybe some open source 👀).

Hosts & Guests

  • Alex Volkov (@altryne): Host · W&B / CoreWeave
  • Kwindla Hultman Kramer (@kwindla): Co-Founder & CEO · Daily.co
  • Maziyar Panahi (@MaziyarPanahi): AI Researcher · Independent / Open Source
  • Dexter Horthy (@dexhorthy): Founder & CEO · HumanLayer
  • LDJ (@ldjconfirmed): Weekly co-host of ThursdAI · Nous Research
  • Yam Peleg (@Yampeleg): Weekly co-host of ThursdAI · AI builder & founder
  • Wolfram Ravenwolf (@WolframRvnwlf): Weekly co-host, AI model evaluator · Independent AI evaluator (r/LocalLLaMA)
  • Nisten Tahiraj (@nisten): Weekly co-host of ThursdAI · AI operator & builder

By The Numbers

From Open Source AI Highlights: Community, Vision, and Efficiency:

  • 70B: Something in the 70B-200B parameter range that could run reasonably on, say, 4 GPUs, leaving room for other models.
  • 3B: NVIDIA dropped something really cool this week: the Describe Anything Model (DAM), specifically DAM-3B, a 3 billion parameter multimodal model for region-based image _and_ video captioning.
  • 27B: The 27B parameter Gemma 3, for example, drops from needing a hefty 54GB to just 14.1GB!
  • 1B: Even the 1B model goes from 2GB to just half a gig.
  • 4B: Wolfram already took the 4B QAT model for a spin using LM Studio.
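The Gemma 3 memory figures quoted here follow roughly from parameter count times bytes per weight. Below is a minimal sketch of that arithmetic, assuming 16-bit weights for the original checkpoints and 4-bit weights for the QAT versions. The function name `weight_memory_gb` is hypothetical, and the estimate covers weights only (no activations, KV cache, or layers kept at higher precision), which is presumably why the reported 14.1GB sits a bit above the 13.5GB computed here.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same bit width.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts are the nominal sizes quoted in the episode.
for name, params in [("Gemma 3 27B", 27e9), ("Gemma 3 1B", 1e9)]:
    full = weight_memory_gb(params, 16)   # 2 bytes per weight
    quant = weight_memory_gb(params, 4)   # 0.5 bytes per weight
    print(f"{name}: ~{full:.1f} GB at 16-bit, ~{quant:.1f} GB at 4-bit")
```

The same back-of-the-envelope rule helps when sizing the 70B-200B range for a multi-GPU box: divide the weight footprint by the number of GPUs and leave headroom for everything the estimate ignores.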

🔓 Open Source AI Highlights: Community, Vision, and Efficiency

Even with the big players quieter on the model release front, the open source scene was buzzing. It feels like this "chill" period gave everyone a chance to focus on refining tools, releasing datasets, and engaging with the community. Perhaps the biggest non-release news of the week was OpenAI actively engaging with the open source community.


🔊 Voice and Audio Innovations: Emotional TTS and Smarter Conversations

Even in a "chill" week, the audio space delivered some serious excitement. Kwindla Hultman Kramer joined us to break down two major developments. The first, Dia, absolutely blew up Twitter, and for good reason.


🎨 AI Art & Diffusion & 3D: Quick Hits

A slightly quieter week for major art model releases, but still some significant movement:

  • OpenAI's GPT Image 1 API: We'll cover this in detail in the Big Companies section below, but it's obviously relevant here too as a major new tool for developers creating AI art and image editing applications.
  • Hunyuan 3D 2.5 (Tencent): Tencent released an update to their 3D generation model, now boasting 10 billion parameters (up from 1B!). They're highlighting massive leaps in precision (1024-resolution geometry), high-quality textures with PBR support, and improved skeletal rigging for animation (X Post).

🤖 Agent Development Insights: Building Robust Agents with Dex Horthy

With things slightly calmer, it was the perfect time to talk about AI agents – a space buzzing with activity, frameworks, and maybe even a little bit of drama.

  • Dex builds SDKs to help create agents that feel more like digital humans, aiming to deploy them where users already are (Slack, email, etc.), moving beyond simple chat interfaces.
  • His experience led him to identify common patterns and pitfalls when trying to build reliable agents.
  • Many teams building serious, production-ready agents end up writing large parts from scratch.

🎨 Big Companies & APIs: GPT Image and Grok Get Developer Access

While new _foundation models_ were scarce from the giants this week, they did deliver on the API front, opening up powerful capabilities to developers. The first was a big one many developers were waiting for: OpenAI's powerful image generation capabilities, previously locked inside ChatGPT, are now available via API under the official name gpt-image-1 (Docs).
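For developers who want to try it, here is a minimal sketch of calling gpt-image-1 through the official `openai` Python SDK. It assumes the package is installed and `OPENAI_API_KEY` is set in the environment. The helper names `save_b64_image` and `generate_image`, the prompt, and the output path are all made up for illustration, and the sketch relies on the response carrying the image as base64 data.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data: str, out_path: str) -> int:
    """Decode a base64 image payload, write it to disk, return the byte count."""
    raw = base64.b64decode(b64_data)
    Path(out_path).write_bytes(raw)
    return len(raw)


def generate_image(prompt: str, out_path: str = "out.png") -> int:
    """Request one image from gpt-image-1 and save it (hypothetical helper)."""
    # Imported lazily so the decoding helper above works without the SDK.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.generate(model="gpt-image-1", prompt=prompt)
    # Assumption: the image comes back base64-encoded in b64_json.
    return save_b64_image(result.data[0].b64_json, out_path)


if __name__ == "__main__":
    size = generate_image("A cozy podcast studio, watercolor style")
    print(f"Wrote {size} bytes to out.png")
```

Keeping the decode-and-save step separate from the network call makes it easy to test without spending API credits.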


🎥 Vision and Video: Sand AI's Surprise Release & More

Just when we thought the week was winding down on model releases...

  • The MAGI-1 demos looked stunning, showcasing impressive long-form video generation with remarkable character consistency – often the Achilles' heel of AI video.
  • Nisten speculated this could be a major step towards usable AI-generated movies, solving the critical face/character consistency problem.
  • They achieve this by predicting video in 24-frame chunks with causal attention between them, allowing for real-time streaming generation where compute doesn't scale with length.
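To make the "24-frame chunks with causal attention between them" idea concrete, here is a minimal sketch of a generic block-causal attention mask. This is an illustration of the general pattern only, not the model's actual implementation: the function name `block_causal_mask` is hypothetical, and it assumes frames attend freely within their own chunk and to all earlier chunks, but never to later ones.

```python
def block_causal_mask(num_frames: int, chunk_size: int = 24) -> list[list[bool]]:
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumption: attention is unrestricted inside a chunk and causal
    across chunks (a frame never sees frames from a later chunk).
    """
    chunk_of = [frame // chunk_size for frame in range(num_frames)]
    return [
        [chunk_of[j] <= chunk_of[i] for j in range(num_frames)]
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    # Tiny example: 6 frames in chunks of 2, printed as a grid.
    for row in block_causal_mask(6, chunk_size=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Because a finished chunk never depends on later ones, it can be streamed out while the next chunk is still being generated, which is what makes the real-time streaming claim plausible.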

⚡ This Week's Buzz from W&B / Community

Quick hits on upcoming events and community stuff: WeaveHacks is coming to SF! Mark your calendars! We're hosting a hackathon focused on building with W&B Weave at the Weights & Biases office in San Francisco on May 17th-18th [0:06:15].

  • If you're around, especially if you're coming into town for Google I/O the week after, come hang out, build cool stuff, and say hi!

📰 Wrapping Up the "Chill" Week That Wasn't Quite Chill

Phew! See? Even a "chill" week in AI is overflowing with news when you actually have time to stop and breathe for a second.

  • It felt good to have the space to go a little deeper.
  • It was fantastic having Kwin, Maziyar, and Dex join the regulars (LDJ, Yam, Wolfram, Nisten) to share their expertise and firsthand insights.

TL;DR and Show Notes (April 23rd, 2025)
  • Hosts and Guests

  • Open Source AI - LLMs, Vision, Voice & more

    • OpenAI OSS Meeting: Insights from Maziyar [0:16:37].

    • NVIDIA Describe Anything (DAM-3B): 3B param multimodal LLM for region-based image/video captioning. (X Post, HF model, HF demo)

    • Google Gemma QAT: Quantization-Aware Training models (X, Blog)

  • Big CO LLMs + APIs

  • This Week's Buzz - Weights & Biases

  • Vision & Video

    • Sand AI MAGI-1: 24B autoregressive diffusion model for long, streaming video (X Post, GitHub, PDF Report, HF Repo)

    • Character AI AvatarFX: Early access for creating speaking/emoting avatars from images. (Website)

    • Framepack: Mentioned for long video generation (120s) on low VRAM (6GB). (Project Page)

  • Voice & Audio

  • AI Art & Diffusion & 3D

    • Hunyuan 3D 2.5 (Tencent): 10B param update [0:09:06]. Higher res geometry, PBR textures, improved rigging. (X Post)

  • Agents, Tools & Links

    • 12 Factor Agents: Discussion with Dex Horthy on building robust agents (GitHub Repo)