Audio & Music

Music generation, sound generation, and general audio models. 20 releases covered on the show.

April 2026

ElevenLabs
Products & Apps

ElevenMusic

ElevenLabs launches ElevenMusic platform with 4,000+ indie artists

ElevenLabs launched ElevenMusic, a full music platform with discovery, remixing, and royalties, debuting with over 4,000 indie artists. Alex closed the show with a slow, dreamy indie rock track generated in ElevenMusic, complete with reversed vocals.

xAI
Major Features & Updates

Grok Imagine

Grok Imagine update: better lip sync, sound, 30s video extensions

xAI shipped a Grok Imagine update with dramatically improved lip sync and sound. It also adds 30-second video extensions.

Google DeepMind
New Models

Gemini 3.1 Flash TTS

Gemini 3.1 Flash TTS tops TTS Arena at 1,211 Elo with 70+ languages

Google released Gemini 3.1 Flash TTS, which leads TTS Arena at 1,211 Elo, supports 70+ languages with inline audio tags, and costs about $0.03 per 60 seconds of audio, roughly 5x cheaper than ElevenLabs. Kwindla noted that it is fully promptable like an LLM rather than limited to fixed tags, but its roughly 3-second time to first token makes it batch-only for now, too slow for live conversational pipelines.

1,211 TTS Arena Elo
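
To put the quoted pricing in perspective, here is a minimal sketch of the arithmetic. It takes the show's approximate figures ($0.03 per 60 seconds, roughly 5x cheaper) at face value. The function names and the implied comparison rate are illustrative assumptions, not published rate cards.

```python
# Back-of-the-envelope TTS cost arithmetic using the figures quoted on the show.
# All names are hypothetical; the prices are approximations from the show,
# not an official rate card.

GEMINI_TTS_USD_PER_MINUTE = 0.03   # "about $0.03 per 60 seconds"
QUOTED_PRICE_RATIO = 5.0           # "roughly 5x cheaper"


def tts_cost_usd(audio_seconds: float, usd_per_minute: float) -> float:
    """Cost of synthesizing `audio_seconds` of speech at a per-minute rate."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 60.0 * usd_per_minute


def implied_comparison_rate(usd_per_minute: float, ratio: float) -> float:
    """Per-minute rate implied for the pricier service by an 'Nx cheaper' claim."""
    return usd_per_minute * ratio


if __name__ == "__main__":
    one_hour = 60 * 60  # seconds of generated audio
    quoted = tts_cost_usd(one_hour, GEMINI_TTS_USD_PER_MINUTE)
    implied_rate = implied_comparison_rate(GEMINI_TTS_USD_PER_MINUTE, QUOTED_PRICE_RATIO)
    implied = tts_cost_usd(one_hour, implied_rate)
    print(f"1 hour at the quoted rate:  ${quoted:.2f}")
    print(f"1 hour at the implied rate: ${implied:.2f}")
```

Under these assumptions, an hour of generated speech costs a couple of dollars at the quoted rate, and the "5x" claim implies a comparison rate near $0.15 per minute. Actual pricing on either service may differ by plan and tier.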

March 2026

Google DeepMind
New Models

Lyria 3 Pro

Google Lyria 3 Pro generates full 3-minute music tracks with structural control

Google DeepMind released Lyria 3 Pro, its most advanced music model, which generates full 3-minute tracks with structural control over intros, verses, choruses, and bridges, and can even compose music from images. The crew generated a drum-and-bass ThursdAI opener live, with spot-on instruction following. Output is SynthID-watermarked and royalty-free, and the model is available to Gemini subscribers and via Producer AI.

February 2026

Google DeepMind
New Models

Lyria 3

Google DeepMind launches Lyria 3 music generation in the Gemini app

Google DeepMind launched Lyria 3, its most advanced AI music generation model, now available in the Gemini app. It generates 32-second high-fidelity music tracks with creative controls and can compose music from uploaded images. Google also published a prompt guide covering vocals, lyrics, and different styles.

Kling AI
New Models

Kling 3.0

Kling 3.0: 15-second multi-shot video with native audio

Kuaishou's Kling 3.0 launched as an all-in-one AI video creation engine with native multimodal generation, 15-second multi-shot sequences, built-in audio, and character consistency across scenes. Alongside Grok Imagine, it marks the week native audio and lip sync became table stakes for video models.

January 2026

December 2025

Google DeepMind
New Models

VEO3

VEO3: native audio video generation crosses the uncanny valley

Google's VEO3 stunned everyone in Q2 2025 with video generation that included native audio, which the crew credits with crossing the uncanny valley for AI video. It was a centerpiece of Google I/O 2025 and of Google's comeback year.

October 2025

Lightricks
New Models | Open weights

LTX-2

LTX-2: native 4K audio+video generation engine from Lightricks

Lightricks announced LTX-2 as breaking news on the show: a video generation engine producing native 4K video (no upscaling) with synchronized audio, positioned as a fast, efficient open alternative to closed models like Sora. Lightricks bills it as open source, with weights promised for later in the fall.

4K native generation resolution, no upscaling

September 2025

Suno
New Models

Suno v5

Suno v5 raises the bar for AI music generation

Suno rolled out v5, its newest flagship music generation model, with cleaner audio and more natural vocals. The live audio demos in the show's closing segment served as proof of how fast AI music quality is climbing.

May 2025

Stability AI
New Models | Open weights

Stable Audio Open Small

Stability AI and Arm release Stable Audio Open Small for on-device audio

Stability AI, together with Arm, released Stable Audio Open Small, a 341M-parameter open text-to-audio model built for real-world on-device deployment. The show framed it as part of a small comeback for Stability, with weights on Hugging Face and an accompanying paper.
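
A rough sense of why a 341M-parameter model suits on-device use comes from estimating the memory its weights occupy. The sketch below is illustrative only: the numeric formats are generic assumptions, since the blurb does not say which precision the released weights use, and the estimate ignores activations and runtime overhead.

```python
# Rough weight-memory estimate for a 341M-parameter model.
# The bytes-per-parameter values are generic numeric formats assumed for
# illustration; they are not taken from the model's release notes.

PARAMS = 341_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_mb(PARAMS, size):.0f} MB")
```

Even at full 32-bit precision the weights stay well under 2 GB by this estimate, which is consistent with the on-device framing.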

April 2025

Google
New Models

DolphinGemma

DolphinGemma: Google's audio model for decoding dolphin communication

Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M-parameter audio model based on the Gemma architecture and using SoundStream audio tokenization. Trained on decades of recorded dolphin clicks, whistles, and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.

March 2025

January 2025

New Models | Open weights

YuE 7B

YuE 7B: open-source Suno-style music generation model

The Multimodal Art Projection (M-A-P) team released YuE, a 7B open-source music generation model dubbed the 'open Suno' on the show, capable of generating full songs with vocals from lyrics. Weights are on Hugging Face with code on GitHub and a hosted demo on fal.ai.

7B Parameters

Riffusion
Products & Apps

Fuzz

Riffusion launches Fuzz music generation, free for now

Riffusion (written as 'Refusion' in the show notes) launched Fuzz, a hosted AI music generation product that is free to use during its initial period. It was highlighted in the voice and audio segment alongside YuE as part of a wave of new AI music tools.