Audio & Music

xAI Apr 30, 2026

Major Features & Updates

Grok Imagine

Grok Imagine update: better lip sync, sound, 30s video extensions

xAI shipped a Grok Imagine update with dramatically improved lip sync and sound. It also adds 30-second video extensions.

Google blog: Gemini 3.1 Flash TTS ↗ Try it in AI Studio ↗ Logan Kilpatrick announcement (X) ↗

Google DeepMind Apr 16, 2026

New Models

Gemini 3.1 Flash TTS

Gemini 3.1 Flash TTS tops TTS Arena at 1,211 Elo with 70+ languages

Google released Gemini 3.1 Flash TTS, which leads TTS Arena at 1,211 Elo, supports 70+ languages with inline audio tags, and costs about $0.03 per 60 seconds, roughly 5x cheaper than ElevenLabs. Kwindla noted that it is fully promptable like an LLM rather than limited to fixed tags, but its ~3-second time-to-first-token makes it batch-only for now rather than usable in live conversational pipelines.

1,211 TTS Arena Elo
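To make the pricing and latency figures in this entry concrete, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above ($0.03 per 60 seconds, the 5x ratio, the ~3-second time-to-first-token). The function names and the 1-second live-conversation budget are hypothetical, chosen for illustration, and are not part of any Google API or published spec.

```python
# Figures quoted in the entry above; the 1-second live budget used in the
# demo at the bottom is a hypothetical threshold chosen only for illustration.
PRICE_PER_MINUTE_USD = 0.03   # "about $0.03 per 60 seconds"
PRICE_RATIO = 5               # "roughly 5x cheaper than ElevenLabs"
TTFT_SECONDS = 3.0            # "~3 second time-to-first-token"


def tts_cost_usd(audio_seconds: float) -> float:
    """Estimated cost of synthesizing `audio_seconds` of speech."""
    return audio_seconds / 60.0 * PRICE_PER_MINUTE_USD


def implied_comparison_price_per_minute() -> float:
    """Per-minute price implied for the comparison service by the 5x claim."""
    return PRICE_PER_MINUTE_USD * PRICE_RATIO


def fits_live_budget(budget_seconds: float) -> bool:
    """True if the quoted time-to-first-token fits within a latency budget."""
    return TTFT_SECONDS <= budget_seconds


if __name__ == "__main__":
    print(f"10 minutes of audio: ${tts_cost_usd(600):.2f}")
    print(f"Implied comparison price: ${implied_comparison_price_per_minute():.2f}/min")
    print(f"Fits a 1 s live budget: {fits_live_budget(1.0)}")
```

Under these assumptions, a 10-minute narration comes to about $0.30, while the 3-second startup delay fails any sub-second budget, which is consistent with the batch-only framing.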

Google announcement (X) ↗ Lyria models page (Google DeepMind) ↗

#voice-ai #audio

March 2026

Google DeepMind Mar 26, 2026

New Models

Lyria 3 Pro

Google Lyria 3 Pro generates full 3-minute music tracks with structural control

Google DeepMind released Lyria 3 Pro, its most advanced music model, generating full 3-minute tracks with structural control over intros, verses, choruses, and bridges, and even composing music from images. The crew generated a drum-and-bass ThursdAI opener live with spot-on instruction following. Output is SynthID-watermarked and royalty-free, and the model is available to Gemini subscribers and via Producer AI.

Release noted on X ↗ OpenAI models docs ↗

February 2026

OpenAI Feb 26, 2026

New Models

gpt-audio-1.5 & gpt-realtime-1.5

OpenAI releases gpt-audio-1.5 and gpt-realtime-1.5

OpenAI shipped gpt-audio-1.5 and gpt-realtime-1.5, updated audio and realtime voice models available through its platform. The release was covered in the week's voice and audio roundup.

Lyria 3 announcement (X) ↗ Lyria on Google DeepMind ↗

#voice-ai #audio #api

Google DeepMind Feb 19, 2026

New Models

Lyria 3

Google DeepMind launches Lyria 3 music generation in the Gemini app

Google DeepMind launched Lyria 3, its most advanced AI music generation model, now available in the Gemini app. It generates 32-second high-fidelity music tracks with creative controls and can compose music from uploaded images. Google also published a prompt guide covering vocals, lyrics, and different styles.

X announcement ↗ GitHub ↗ Hugging Face ↗ Project page ↗

ACE Step Feb 5, 2026

New Models · Open weights

ACE-Step 1.5

ACE-Step 1.5: open-source 'Suno at home' music generation under MIT

ACE-Step 1.5 is an MIT-licensed AI music generator that produces full songs in under 10 seconds on consumer GPUs and runs on a MacBook. The panel demoed it live via Pinokio, generating a ThursdAI song on the spot, and it is available for one-click install.

X announcement ↗ Kling AI ↗

#audio #open-source

Kling AI Feb 5, 2026

New Models

Kling 3.0

Kling 3.0: 15-second multi-shot video with native audio

Kuaishou's Kling 3.0 launched as an all-in-one AI video creation engine with native multimodal generation, 15-second multi-shot sequences, built-in audio, and character consistency across scenes. Alongside Grok Imagine, it marks the week native audio and lip sync became table stakes for video models.

X announcement ↗ Grok ↗ Artificial Analysis leaderboard ↗

xAI Feb 5, 2026

New Models

Grok Imagine 1.0

Grok Imagine 1.0 tops video arena with native audio and lip sync

xAI launched Grok Imagine 1.0 with 10-second 720p video generation, native audio, and lip sync, taking the #1 spot on the Artificial Analysis text-to-video arena. Generation costs roughly $0.42 per 10-second clip and an API is available.
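As a rough illustration of what the quoted price means for longer footage, the sketch below counts how many 10-second clips a target runtime would need and what that would cost. It assumes longer footage is assembled from separately priced 10-second clips at $0.42 each, which the entry does not state. The function names are hypothetical and do not correspond to the xAI API.

```python
import math

# Figures quoted in the entry above.
CLIP_SECONDS = 10       # "10-second 720p video generation"
CLIP_PRICE_USD = 0.42   # "roughly $0.42 per 10-second clip"


def clips_needed(target_seconds: float) -> int:
    """Number of whole clips required to cover the target runtime."""
    return math.ceil(target_seconds / CLIP_SECONDS)


def footage_cost_usd(target_seconds: float) -> float:
    """Estimated cost, assuming every clip is billed at the flat clip price."""
    return round(clips_needed(target_seconds) * CLIP_PRICE_USD, 2)


if __name__ == "__main__":
    for seconds in (10, 25, 60):
        print(f"{seconds:>3} s -> {clips_needed(seconds)} clips, ${footage_cost_usd(seconds):.2f}")
```

Because clips are billed whole under this assumption, a 25-second target rounds up to three clips rather than 2.5.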

LTX-2 on GitHub ↗ LTX-2 Paper ↗ LTX-2 on Replicate ↗

January 2026

Lightricks Jan 8, 2026

New Models · Open weights

LTX-2

Lightricks open-sources LTX-2 synchronized audio-video model

Lightricks open-sourced LTX-2, billed as the first truly open audio-video generation model with synchronized audio and video output, releasing full training code alongside the weights. A distilled version is available to try on Replicate.

#video-gen #open-source #audio

December 2025

Google DeepMind Dec 25, 2025

New Models

VEO3

VEO3: native audio video generation crosses the uncanny valley

Google's VEO3 stunned everyone in Q2 with video generation that included native audio, which the crew credits with crossing the uncanny valley for AI video. It was a centerpiece of Google I/O 2025 and of Google's comeback year.

Meta SAM Audio (GitHub) ↗ SAM Audio (HF) ↗ SAM Audio announcement ↗

Meta AI Dec 18, 2025

New Models · Open weights

SAM Audio

Meta SAM Audio brings promptable source separation to audio

Meta released SAM Audio, an audio source separation model that extends the Segment Anything concept to sound. It supports multimodal prompting via text, visual, and temporal cues to isolate sources from audio, with weights on Hugging Face and code on GitHub.

Kling VIDEO 2.6 announcement on X ↗

#audio #open-source

Kling AI Dec 4, 2025

New Models

Kling VIDEO 2.6

Kling VIDEO 2.6 adds first native audio generation

Kling released VIDEO 2.6, its first video model with native audio generation, producing sound directly alongside generated footage. It was one of two Kling releases this week spanning video and image generation.

October 2025

Lightricks Oct 23, 2025

New Models · Open weights

LTX-2

LTX-2: native 4K audio+video generation engine from Lightricks

Lightricks announced LTX-2 as breaking news on the show: a video generation engine producing native 4K video (no upscaling) with synchronized audio, positioned as a fast, efficient open alternative to closed models like Sora. It is billed as open-source with weights coming this fall.

4K native generation resolution, no upscaling

X ↗ Website ↗ GitHub ↗

#video-gen #open-source #audio

September 2025

Suno Sep 25, 2025

New Models

Suno v5

Suno v5 raises the bar for AI music generation

Suno rolled out v5, its newest flagship music generation model, with cleaner audio quality and more natural vocals. The live audio demos in the show's closing segment served as proof of how fast AI music quality is climbing.

Blog ↗ Paper ↗ Hugging Face ↗ Announcement on X ↗

May 2025

Stability AI May 15, 2025

New Models · Open weights

Stable Audio Open Small

Stability AI and Arm release Stable Audio Open Small for on-device audio

Stability AI, together with Arm, released Stable Audio Open Small, a 341M-parameter open text-to-audio model built for real-world on-device deployment. The show framed it as part of a small comeback for Stability, with weights on Hugging Face and an accompanying paper.

#audio #on-device #open-source

April 2025

Google Apr 17, 2025

New Models

DolphinGemma

DolphinGemma: Google's audio model for decoding dolphin communication

Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M-parameter audio model based on the Gemma architecture that uses SoundStream audio tokenization. Trained on decades of recorded dolphin clicks, whistles, and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.

Blog ↗

GitHub ↗ Demo ↗ Hugging Face ↗

#audio #research

March 2025

ElectricAlexis (research) Mar 6, 2025

New Models · Open weights

NotaGen

NotaGen open symbolic music model generates classical sheet music

NotaGen is an open symbolic music generation model that produces high-quality classical sheet music rather than raw audio. The release includes code on GitHub, weights on Hugging Face, and a browser demo.

Demo (fal.ai) ↗ Hugging Face ↗ GitHub ↗

#audio #open-source

January 2025

M-A-P (Multimodal Art Projection) Jan 30, 2025

New Models · Open weights

YuE 7B

YuE 7B: open-source Suno-style music generation model

The Multimodal Art Projection (M-A-P) team released YuE, a 7B open-source music generation model dubbed the 'open Suno' on the show, capable of generating full songs with vocals from lyrics. Weights are on Hugging Face with code on GitHub and a hosted demo on fal.ai.

7B Parameters

#voice-ai #audio #open-source

Riffusion Jan 30, 2025

Products & Apps

Fuzz

Riffusion launches Fuzz music generation, free for now

Riffusion (written as 'Refusion' in the show notes) launched Fuzz, a hosted AI music generation product that is free to use during its initial period. It was highlighted in the voice and audio segment alongside YuE as part of a wave of new AI music tools.

Fuzz (free for now) ↗