Video Generation

Text- and image-to-video models, video editing, avatars, and animation. 67 releases covered on the show.

June 2026

xAI
New Models

Grok Imagine Video 1.5 Preview

xAI releases Grok Imagine Video 1.5 Preview with synced audio

xAI released a preview of Grok Imagine Video 1.5, an image-to-video model that generates clips with synchronized audio. It adds xAI to the week's crowded race of media-generation model updates.

May 2026

Google DeepMind
New Models

Gemini Omni

Gemini Omni: 'create anything from anything' conversational video editor

Google DeepMind launched Gemini Omni, a multimodal 'create anything from anything' model debuting as Google's first conversational video editor. Unlike pure text-to-video systems, Omni is an iterative multi-turn editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media, in the same way Nano Banana brought Gemini to interactive image editing. It is available in the Gemini app, Google Flow and YouTube, with API support coming soon.

April 2026

HeyGen
Major Features & Updates

HyperFrames + Claude Design integration

HeyGen HyperFrames integrates natively with Claude Design

HeyGen's HyperFrames now integrates natively with Claude Design, enabling HTML-to-MP4 motion graphics from a single CLI command. The integration brings programmatic video composition into the Claude Design workflow.

xAI
Major Features & Updates

Grok Imagine

Grok Imagine update: better lip sync, sound, 30s video extensions

xAI shipped a Grok Imagine update with dramatically improved lip sync and sound. It also adds 30-second video extensions.

New Models

HappyHorse-1.0

HappyHorse-1.0 takes #1 on Artificial Analysis video arena

HappyHorse-1.0, a mysterious 15B-parameter video model from Alibaba's Taotian Group, took the #1 spot on the Artificial Analysis video arena, beating Seedance 2.0, Kling 3.0, and Grok Video. Little is known about the model beyond its size and leaderboard run.

ByteDance
New Models

Seedance 2.0

Seedance 2.0 launches in the US on Replicate

ByteDance's Seedance 2.0 video model became available stateside via Replicate, supporting up to 9 reference images, 3 videos, and 3 audio files per cinematic generation. Peter Gostev confirmed it sits about 80 Elo points above the next video model on Arena, a massive gap on a leaderboard where models usually cluster within 10 points.
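To put the 80-point gap in perspective, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the textbook Elo logistic curve with a 400-point scale; the Arena leaderboard may compute ratings differently, so treat the numbers as illustrative only.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model.

    Assumes the textbook Elo logistic curve (400-point scale); the
    leaderboard's own rating method may differ.
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


for gap in (10, 80):
    rate = elo_expected_score(gap)
    print(f"{gap:>2}-point gap -> expected win rate {rate:.1%}")
# 10-point gap -> expected win rate 51.4%
# 80-point gap -> expected win rate 61.3%
```

Under this assumption a 10-point lead is close to a coin flip, while an 80-point lead means winning roughly three out of five pairwise comparisons.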

Google DeepMind
New Models

Veo 3.1 Lite

Google launches Veo 3.1 Lite at $0.05/sec, cheapest video gen yet

Google released Veo 3.1 Lite, a lighter video generation tier priced at $0.05 per second at 720p, the cheapest video generation offering yet, with further price cuts announced for April 7. The panel framed it as a practical quality-versus-latency tradeoff tier for creator workflows.
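As a rough illustration of what per-second pricing means in practice, the sketch below multiplies the quoted $0.05 per second by clip length and clip count. The 8-second clip length and the batch size of 100 are hypothetical values chosen for the example, not figures from the announcement.

```python
from decimal import Decimal

# Quoted 720p price from the announcement, in USD per generated second.
PRICE_PER_SECOND = Decimal("0.05")


def clip_cost(seconds: int, clips: int = 1) -> Decimal:
    """Cost in USD of generating `clips` clips of `seconds` seconds each."""
    return PRICE_PER_SECOND * seconds * clips


# Hypothetical workload: 8-second clips, one at a time and in a batch of 100.
print(f"one 8s clip:   ${clip_cost(8)}")
print(f"100 8s clips:  ${clip_cost(8, clips=100)}")
# one 8s clip:   $0.40
# 100 8s clips:  $40.00
```

`Decimal` is used instead of floats so that currency amounts stay exact.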

March 2026

February 2026

ByteDance
New Models

Seedance 2.0

ByteDance Seedance 2.0 shatters video generation reality

ByteDance launched Seedance 2.0, a unified multimodal video generation model that accepts up to 9 images, 3 videos, and 3 audio clips as references and produces 15-second multi-shot clips with native stereo audio and strong character consistency (a 45-second internal test mode also exists). The panel compared the quality jump to seeing Sora for the first time. Available on the BytePlus platform.

Kling AI
New Models

Kling 3.0

Kling 3.0: 15-second multi-shot video with native audio

Kuaishou's Kling 3.0 launched as an all-in-one AI video creation engine with native multimodal generation, 15-second multi-shot sequences, built-in audio, and character consistency across scenes. Alongside Grok Imagine, it marks the week native audio and lip sync became table stakes for video models.

January 2026

Google DeepMind
New Models

Genie 3 (Project Genie)

Google DeepMind launches Project Genie 3, real-time 24fps world model

Google DeepMind's Genie 3 generates interactive, controllable 3D worlds in real time at 24 frames per second, demoed live on the show with a spaceship exploration and paint persistence on walls. It ships alongside SIMA 2, a self-improving game-playing agent built on Genie 3, and is available to Gemini Ultra subscribers in the US with a one-minute session limit.

Genie 3 frame rate: 24 fps

Runway
New Models

Runway 4.5

Runway 4.5 launches with image-to-video and audio

Runway launched version 4.5 of its video generation model, adding image-to-video and audio support. It was mentioned in the week's news rundown as part of a busy week for vision and video releases.

December 2025

Google DeepMind
New Models

Veo 3

Veo 3: native audio video generation crosses the uncanny valley

Google's Veo 3 stunned everyone in Q2 with video generation that included native audio, which the crew credits with crossing the uncanny valley for AI video. It was a centerpiece of Google I/O 2025 and of Google's comeback year.

OpenAI
New Models

Sora 2

Sora 2 democratizes video generation and floods the internet with memes

Sora 2 opened Q4 in October by democratizing video generation, complete with a social platform, and spawned a wave of memes still circulating at year's end. The show's TL;DR credits it as part of 2025 crossing the uncanny valley for AI media.

Runway
New Models

Runway Gen-4.5

Runway Gen-4.5 takes #1 on the text-to-video leaderboard

Runway's Gen-4.5 video model climbed to the top of the text-to-video leaderboard with a 1,247 Elo rating. The result continued the weekly theme of video generation quality and multimodal consistency improving fast.

Text-to-video leaderboard Elo: 1,247

November 2025

Tencent (Hunyuan)
New Models · Open weights

HunyuanVideo 1.5

Tencent releases HunyuanVideo 1.5, a lightweight open video model

Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.

October 2025

Decart AI
APIs & Platforms

Real-Time Lip Sync API

Decart ships real-time lip-sync API for live AI avatars

Decart AI released a real-time lip-sync API that modifies an avatar's video frames to match generated speech on the fly. Kwindla Kramer broke down the pipeline on the show: WebRTC audio capture, Whisper transcription, an LLM response, ElevenLabs voice generation, then Decart's model syncing the avatar's lips, all at sub-two-second latency. It is a key step toward interactive, believable AI characters.
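The stage order described above can be sketched as a simple chain with per-stage timing against the two-second budget. Every function below is a hypothetical stand-in that returns dummy data; none of them calls WebRTC, Whisper, an LLM, ElevenLabs, or Decart, and the names are invented for illustration.

```python
import time
from typing import Any, Callable

LATENCY_BUDGET_S = 2.0  # "sub-two-second" target mentioned on the show


# Hypothetical stand-ins for each stage; no real service is called.
def capture_audio(_: Any) -> bytes:
    return b"\x00" * 320  # pretend PCM audio from the browser


def transcribe(audio: bytes) -> str:
    return "hello there"  # pretend speech-to-text result


def generate_reply(text: str) -> str:
    return f"You said: {text}"  # pretend LLM response


def synthesize_speech(reply: str) -> bytes:
    return reply.encode("utf-8")  # pretend text-to-speech audio


def sync_lips(speech: bytes) -> dict:
    return {"frames": 24, "audio_bytes": len(speech)}  # pretend synced frames


STAGES: list[tuple[str, Callable[[Any], Any]]] = [
    ("capture_audio", capture_audio),
    ("transcribe", transcribe),
    ("generate_reply", generate_reply),
    ("synthesize_speech", synthesize_speech),
    ("sync_lips", sync_lips),
]


def run_pipeline(stages, payload=None):
    """Feed each stage's output into the next and record how long each took."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - start
    return payload, timings


result, timings = run_pipeline(STAGES)
total = sum(timings.values())
for name, seconds in timings.items():
    print(f"{name:<18}{seconds * 1000:8.3f} ms")
print(f"total {total * 1000:.3f} ms, within budget: {total < LATENCY_BUDGET_S}")
```

Because the stages run strictly in sequence, the end-to-end latency is the sum of the stage latencies, which is why each hop has to be fast for the whole loop to stay under two seconds.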

<2s end-to-end pipeline latency

Krea AI
New Models · Open weights

Krea Realtime Video

Krea open-sources a 14B real-time video generation model

Krea AI open-sourced a 14-billion-parameter real-time video model, with weights on Hugging Face. It joins the week's clear trend of generative video racing toward live, interactive experiences rather than offline rendering.

14B parameters

Lightricks
New Models · Open weights

LTX-2

LTX-2: native 4K audio+video generation engine from Lightricks

Lightricks announced LTX-2 as breaking news on the show: a video generation engine producing native 4K video (no upscaling) with synchronized audio, positioned as a fast, efficient open alternative to closed models like Sora. It is billed as open-source with weights coming this fall.

4K native generation resolution, no upscaling

Reve
Major Features & Updates

Reve video mode

Reve quietly surfaces an unannounced 1080p video mode with sound

Reve's unannounced video mode was spotted this week, generating 1080p video with sound. It was covered briefly in the show's vision and video roundup with no official announcement or links yet.

Baidu
New Models

MuseStreamer

Baidu's MuseStreamer pushes video generations past 20 seconds

Baidu showed off MuseStreamer, a video generation model producing clips longer than 20 seconds. It adds another Chinese lab to the long-form video generation race alongside Veo and Sora.

Google DeepMind
New Models

Veo 3.1

Veo 3.1: Google's next-gen video model launches with cinematic audio

Google DeepMind shipped Veo 3.1, the next version of its video generation model with improved quality and cinematic audio. Senior PM Jessica Gallegos joined the show to discuss how the model and its product packaging (including Flow) are evolving video generation into a real user experience story.

OpenAI
Major Features & Updates

Sora

Sora extends generations to 15s (25s Pro) and adds storyboards

OpenAI upgraded Sora with longer generations, up to 15 seconds for standard users and 25 seconds for Pro, plus a new storyboard feature for multi-shot control. The update keeps Sora competitive as video models race on length and controllability.

September 2025

Alibaba (Wan)
New Models · Open weights

Wan 2.2 Animate

Wan Animate brings open-weights character animation and replacement

Alibaba's Wan team released Wan 2.2 Animate, an open-weights model that animates a character image from a performance video, replicating motion and expressions, or swaps a character into existing footage. It landed in the episode's closing run of video releases showing multimodal product quality climbing across the board.

Kling AI
New Models

Kling 2.5 Turbo

Kling 2.5 Turbo upgrades AI video generation quality and cost

Kuaishou's Kling AI shipped Kling 2.5 Turbo, an update to its video generation model with better motion, prompt adherence, and cinematic quality at a lower price. Together with Wan Animate it was cited on the show as proof that video model quality is being turbocharged this season.

ByteDance / Tsinghua
New Models · Open weights

HuMo

HuMo: human-centric multimodal video generation from ByteDance/Tsinghua

ByteDance research and Tsinghua released HuMo, a human-centric video generation model that conditions on multimodal inputs (text, image, and audio) to produce videos of people. The weights are available on Hugging Face.

Luma AI
New Models

Ray3

Luma's Ray3: a 'reasoning' video model with native HDR

Luma AI launched Ray3, a video generation model it bills as a 'reasoning' video model, with native HDR output, a fast Draft Mode, and Hi-Fi mastering. It is available in Luma's Dream Machine and feeds the episode's closing theme of a next wave of video models.

July 2025

Dynamics Lab
Products & Apps

Mirage

Mirage debuts as the first AI-native UGC game engine

Dynamics Lab unveiled Mirage, billed as the world's first AI-native user-generated-content game engine, with real-time photorealistic playable demos powered by world-model-style generation. Alex reacted to it live as the most visibly fun demo of the week and a preview of where interactive media is headed.

May 2025

Odyssey
Products & Apps

Odyssey Interactive Video

Odyssey debuts real-time interactive AI video at 30 FPS

Odyssey launched interactive video: real-time AI world exploration rendered at 30 FPS, letting you walk through generated worlds as they are created. It offers a glimpse of world-model-driven media where the video responds to you instead of just playing back.

Tencent (Hunyuan)
New Models

HunyuanPortrait

Tencent's HunyuanPortrait animates portraits from a single photo

Tencent's Hunyuan team published HunyuanPortrait, a model for high-fidelity portrait video generation from a single photo. It animates a still portrait into realistic talking-head video, with an accompanying paper.

Tencent (Hunyuan)
New Models · Open weights

HunyuanVideo-Avatar

Tencent releases HunyuanVideo-Avatar for audio-driven avatars

Tencent Hunyuan released HunyuanVideo-Avatar, an audio-driven full-body avatar animation model. Feed it audio and a reference image and it animates a full-body avatar in sync, pushing AI-generated humans further toward being indistinguishable from real people.

Alibaba
New Models · Open weights

Wan 2.1

Alibaba's Wan 2.1: open-source diffusion-transformer text-to-video suite

Alibaba, the team behind the Qwen LLMs, released Wan 2.1, a full stack of open-source diffusion-transformer text-to-video foundation models. Amid the show's discussion of video-model fatigue, this was called out as a release that cuts through the noise, with weights on Hugging Face and code on GitHub.

Lightricks
New Models

LTX Video (distilled)

LTX distilled model enables near real-time video generation

Lightricks shared a distilled version of its LTX video model that generates video at near real-time speeds. It was highlighted in the vision and video segment as a notable speed milestone for video generation.

Runway
Major Features & Updates

Gen-4 References

Runway References brings character and scene consistency to Gen-4

Runway launched References for Gen-4 on all paid plans, letting creators supply reference images (characters, outfits, locations, even selfies) and use tags in prompts to keep those elements consistent across generations. It tackles AI video's biggest pain point, frame-to-frame identity drift, at no extra credit cost per run.

April 2025

Character.AI
Products & Apps

AvatarFX

Character.AI opens early access to AvatarFX talking avatars

Character.AI announced AvatarFX, now in early access, which turns static images into speaking, emoting video avatars. It targets bringing characters to life for conversational and creative use cases.

Lvmin Zhang (lllyasviel)
New Models · Open weights

FramePack

FramePack generates 120-second videos on just 6GB of VRAM

FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open-source next-frame prediction approach for long video generation that runs on consumer hardware. It can generate videos up to 120 seconds long on as little as 6GB of VRAM by packing input frame context into a fixed length.
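One way to see why a fixed-length context keeps memory flat is a toy geometric schedule: the newest frame keeps its full token count and each older frame gets half as many tokens as the one after it. This is an illustrative assumption, not FramePack's actual compression schedule, and the 1,536-token figure and 30 fps frame count are hypothetical.

```python
def packed_context_length(num_frames: int, tokens_newest: int = 1536, ratio: int = 2) -> int:
    """Total context tokens when each older frame is compressed `ratio`x more.

    Toy schedule for illustration only (not FramePack's real one): the frame
    that is `age` steps old contributes tokens_newest // ratio**age tokens.
    """
    return sum(tokens_newest // (ratio ** age) for age in range(num_frames))


# Hypothetical clip: 120 seconds at 30 fps = 3600 frames of history.
for frames in (1, 4, 3600):
    packed = packed_context_length(frames)
    unpacked = frames * 1536
    print(f"{frames:>4} frames: packed {packed:>5} tokens vs unpacked {unpacked:>8}")
#    1 frames: packed  1536 tokens vs unpacked     1536
#    4 frames: packed  2880 tokens vs unpacked     6144
# 3600 frames: packed  3070 tokens vs unpacked  5529600
```

With this schedule the packed length never reaches twice the newest frame's token count, no matter how long the video gets, while the unpacked context grows linearly.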

Max video length: 120s
Minimum VRAM: 6GB

Sand AI
New Models · Open weights

MAGI-1

Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model

Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.
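The chunked causal attention described above can be pictured as a block-causal mask. The sketch below assumes frames attend freely within their own chunk and to every earlier chunk, but never to later ones; the within-chunk behavior is an assumption for illustration, and this is not MAGI-1's actual implementation.

```python
def block_causal_mask(num_frames: int, chunk_size: int = 24) -> list[list[bool]]:
    """mask[q][k] is True when query frame q may attend to key frame k.

    Assumption for illustration: full attention inside a chunk, causal
    attention between chunks (earlier chunks visible, later ones hidden).
    """
    chunk_of = [frame // chunk_size for frame in range(num_frames)]
    return [
        [chunk_of[k] <= chunk_of[q] for k in range(num_frames)]
        for q in range(num_frames)
    ]


# Small demo with 2-frame chunks so the pattern is easy to read.
for row in block_causal_mask(6, chunk_size=2):
    print("".join("1" if allowed else "0" for allowed in row))
# 110000
# 110000
# 111100
# 111100
# 111111
# 111111
```

Because a chunk never depends on later chunks, finished chunks can be streamed out while the next one is still being generated.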

Parameters: 24B
Frames per autoregressive chunk: 24

Google DeepMind
Major Features & Updates

Veo 2

Veo 2 video generation hits GA in the API and Gemini App

Google made Veo 2 video generation generally available for developers and rolled it out in the Gemini App. The GA release brings Google's flagship text-to-video model out of preview and into production use.

Kling AI
New Models

Kling 2.0

Kling 2.0 Creative Suite launches

Kuaishou's Kling AI launched Kling 2.0 along with a broader Creative Suite, upgrading its video generation model and tooling. The release kept up the rapid pace in the closed-source video generation race during a packed vision and video week.

Papers & Research · Open weights

One-Minute Video Generation with Test-Time Training

Test-Time Training paper one-shots minute-long videos with consistent characters

Researchers published 'One-Minute Video Generation with Test-Time Training', adding TTT layers to a pre-trained transformer to one-shot generate minute-long videos with remarkable character and scene consistency. The Tom & Jerry-style demos showed the most impressive long-form AI video consistency to date.

Single-shot generated video length: 1 min

ByteDance
Products & Apps

OmniHuman (via Dreamina)

ByteDance's OmniHuman image-to-avatar model goes public via Dreamina

ByteDance's impressive OmniHuman model, which turns a single image plus audio into a realistic talking avatar video, became publicly usable through the Dreamina (CapCut) website. The results land squarely in uncanny-valley territory, as Alex demonstrated with his own avatar thread.

Meta AI
Papers & Research

MoCha

Meta's MoCha generates movie-grade talking AI characters from speech and text

Meta GenAI researchers published MoCha, a model that generates stunningly realistic, movie-grade talking characters directly from speech plus text. Co-author Cong Wei joined the show to discuss the work, which points at AI actors entering Hollywood-quality territory.

March 2025

HPC-AI Tech
New Models · Open weights

Open-Sora 2.0

Open-Sora 2.0: 11B open-source video model trained for $200K

Open-Sora 2.0 is an 11B-parameter open-source video generation model that cost only about $200,000 to train. The team claims state-of-the-art results, with performance approaching OpenAI's Sora on some benchmarks, underscoring how fast open-source video generation is improving.

Remade AI
New Models · Open weights

Wan 2.1 14B I2V LoRA video effects

Remade AI releases 8 open LoRA video effects for Wan 2.1

Remade AI published eight LoRA video effects for Alibaba's Wan 2.1 14B image-to-video model, including effects like squish, inflate, deflate, and cakeify. The open release shows video effects becoming trainable and customizable via LoRAs on top of open video models.

February 2025

Google DeepMind
APIs & Platforms

Veo 2 (via FAL API)

Google's Veo 2 video model becomes available via FAL API

Google DeepMind's Veo 2 video generation model became accessible to developers through FAL's inference API. This was the first broadly available API access to Veo 2, letting builders generate high-quality video from text prompts without waiting on Google's own product surfaces.

Hao AI Lab
Dev Tools · Open weights

FastVideo

Hao AI Lab's FastVideo makes HunyuanVideo 3x faster with no extra training

Hao AI Lab released FastVideo, a method that makes HunyuanVideo (HY-Video) three times faster with no additional training, using a technique called Sliding Tile Attention that outperforms even FlashAttention for this workload. Faster inference makes open-source video models far more practical, and FastVideo also supports HY-Video LoRAs for fine-tuned applications.

Microsoft
New Models · Open weights

MUSE (WHAM)

Microsoft MUSE generates playable game worlds from a single second of video

Microsoft's MUSE can generate minutes of playable gameplay from just a single second of video frames and controller actions, preserving screen elements like health bars and percentages. It is based on the World and Human Action Model (WHAM) architecture, trained on a billion gameplay images from Xbox, with the model released on Hugging Face.

StepFun
New Models · Open weights

Step-Video-T2V

StepFun open-sources Step-Video-T2V, a SOTA 30B text-to-video model

StepFun released Step-Video-T2V (plus a T2V Turbo variant), a 30-billion-parameter state-of-the-art text-to-video model under an MIT license. Results impressed especially on text integration, such as rendering 'We will open source' on a scroll as a character unfurls it, marking one of the strongest open-source video drops of the week.

January 2025

Alibaba (Qwen)
New Models

Qwen2.5-Max

Alibaba launches Qwen2.5-Max flagship model with hidden video gen

Alibaba's Qwen team released Qwen2.5-Max, a large MoE flagship model available through the Qwen Chat interface and API, claiming competitive results against DeepSeek V3 and other frontier models. The chat app also quietly shipped a video generation capability powered by Alibaba's Tongyi Wanxiang.