Video Generation

Text- and image-to-video models, video editing, avatars, and animation. 70 releases covered on the show.

July 2026

Meta AI Jul 7, 2026

New Models

Muse Image & Muse Video

Meta Superintelligence Labs ships Muse Image and previews Muse Video

MSL's first media-generation models: Muse Image is live in the Meta AI app, Instagram Stories (US) and WhatsApp, with agentic generation that calls web search and code execution, multi-reference composition, and Instagram social-context conditioning. Muse Video shares the same pretraining base and adds native audio, debuting at #3 on Arena text-to-video while Muse Image lands #2 on image. There is no public API, and public Instagram accounts are opted in to @-mention remixing by default.

#2 Arena text-to-image debut · #3 Arena text-to-video debut · 1280 Arena image score

X announcement ↗ Blog ↗

🎙️ Hear our coverage →

#image-gen #video-gen #consumer-ai

Google DeepMind Jul 2, 2026

New Models

OmniFlash

Google DeepMind debuts OmniFlash, first of the any-to-any Omni family

OmniFlash, the first of Google's any-to-any Omni family, generates videos up to 10 seconds long and supports precise conversational multi-turn editing via the Interactions API: say 'make it daytime' and it redoes the light, sky and shadows. It posts an editing Elo of 1087 at $0.10 per second of output.

1087 editing Elo · $0.10 per second of video, up to 10s

🎙️ Hear our coverage →

#video-gen #multimodal

June 2026

xAI Jun 18, 2026

New Models

Grok Imagine Video 1.5

xAI launches Grok Imagine Video 1.5 with faster generation and native audio

xAI launched Grok Imagine Video 1.5 with nearly 2x faster generation, native audio, and a claimed #1 leaderboard position. The episode grouped it with Gemini Omni as part of the week’s video-generation frontier.

~2x faster generation

xAI announcement on X ↗ Grok Imagine Video 1.5 blog ↗ xAI video generation docs ↗

🎙️ Hear our coverage →

#video-gen #multimodal #consumer-ai

xAI Jun 4, 2026

New Models

Grok Imagine Video 1.5 Preview

xAI releases Grok Imagine Video 1.5 Preview with synced audio

xAI released a preview of Grok Imagine Video 1.5, an image-to-video model that generates clips with synchronized audio. It adds xAI to the week's crowded race of media-generation model updates.

xAI announcement ↗

🎙️ Hear our coverage →

May 2026

Runway May 28, 2026

Products & Apps

Project Luxo

Runway launches Project Luxo for solo-creator short films

Runway launched Project Luxo, claiming AI-generated video has crossed the uncanny valley for solo-creator short films. The pitch is that a single creator can now produce watchable short-form films end to end with Runway's stack.

Runway Project Luxo blog ↗ Runway announcement ↗

🎙️ Hear our coverage →

#video-gen #image-gen

Google DeepMind May 21, 2026

New Models

Gemini Omni

Gemini Omni: 'create anything from anything' conversational video editor

Google DeepMind launched Gemini Omni, a multimodal 'create anything from anything' model debuting as Google's first conversational video editor. Unlike pure text-to-video systems, Omni is an iterative multi-turn editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media, in the same way Nano Banana brought Gemini to interactive image editing. It is available in the Gemini app, Google Flow and YouTube, with API support coming soon.

DeepMind model page ↗ Google DeepMind on X ↗ Logan on availability ↗ Gemini App ↗

🎙️ Hear our coverage (+1 follow-up) →

#video-gen #multimodal #image-gen

Perceptron AI May 14, 2026

New Models

Perceptron Mk1

Perceptron Mk1: frontier video + embodied reasoning at 1/10th the price

Perceptron released Mk1, a frontier video and embodied reasoning model priced at roughly a tenth of comparable models. It scores 88.5 on VSI-Bench and 72.4 on RefSpatialBench (versus 9.0 for GPT-5m on the latter) and is live on OpenRouter.

X announcement ↗ Site ↗

🎙️ Hear our coverage →

#video-gen #robotics #vision

April 2026

HeyGen Apr 30, 2026

Major Features & Updates

HyperFrames + Claude Design integration

HeyGen HyperFrames integrates natively with Claude Design

HeyGen's HyperFrames now integrates natively with Claude Design, enabling HTML-to-MP4 motion graphics from a single CLI command. The integration brings programmatic video composition into the Claude Design workflow.

hyperframes.dev ↗

🎙️ Hear our coverage →

#video-gen #coding

xAI Apr 30, 2026

Major Features & Updates

Grok Imagine

Grok Imagine update: better lip sync, sound, 30s video extensions

xAI shipped a Grok Imagine update with dramatically improved lip sync and sound. It also adds 30-second video extensions.

🎙️ Hear our coverage →

#video-gen #audio

Alibaba (Taotian Group) Apr 9, 2026

New Models

HappyHorse-1.0

HappyHorse-1.0 takes #1 on Artificial Analysis video arena

HappyHorse-1.0, a mysterious 15B-parameter video model from Alibaba's Taotian Group, took the #1 spot on the Artificial Analysis video arena, beating Seedance 2.0, Kling 3.0, and Grok Video. Little is known about the model beyond its size and leaderboard run.

Artificial Analysis on X ↗ venturetwins on X ↗ HappyHorse on X ↗ HappyHorse blog ↗

🎙️ Hear our coverage →

ByteDance Apr 9, 2026

New Models

Seedance 2.0

Seedance 2.0 launches in the US on Replicate

ByteDance's Seedance 2.0 video model became available stateside via Replicate, supporting up to 9 reference images, 3 videos, and 3 audio files per cinematic generation. Peter Gostev confirmed it sits ~80 Elo points above the next video model on Arena, a massive gap in a leaderboard where models usually cluster within 10 points.
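
To put the cited gap in perspective, here is a minimal sketch that converts an Elo difference into an expected head-to-head win rate. It assumes the standard logistic Elo formula with a 400-point scale; the arena's exact rating method is not stated here, so treat the results as approximate. The function name `expected_win_rate` is illustrative.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected win probability for the higher-rated model.

    Uses the standard Elo logistic curve (an assumption; the leaderboard
    may fit ratings with a slightly different model).
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


# Gap sizes taken from the text: ~80 points vs. the usual ~10-point clustering.
for gap in (10, 80):
    print(f"{gap:>3}-point gap -> {expected_win_rate(gap):.1%} expected win rate")
```

Under that assumption, an 80-point lead corresponds to winning roughly 61% of pairwise votes, versus about 51% for a 10-point lead.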

Replicate announcement on X ↗ Seedance announcement ↗

🎙️ Hear our coverage →

Google DeepMind Apr 2, 2026

New Models

Veo 3.1 Lite

Google launches Veo 3.1 Lite at $0.05/sec, cheapest video gen yet

Google released Veo 3.1 Lite, a lighter video generation tier priced at $0.05 per second at 720p, the cheapest video generation offering yet, with further price cuts announced for April 7. The panel framed it as a practical quality-versus-latency tradeoff tier for creator workflows.

Logan Kilpatrick announcement (X) ↗ Gemini API video docs ↗ Pricing ↗

🎙️ Hear our coverage →

March 2026

Lightricks Mar 13, 2026

New Models · Open weights

LTX Video 2.3

Lightricks ships open-source LTX Video 2.3, runs on an RTX 3090

Lightricks released LTX Video 2.3, an open-source video generation model with improved motion, audio, and quality that runs on a single RTX 3090. It is available on GitHub and Hugging Face.

LTX-Video on GitHub ↗ LTX-Video on Hugging Face ↗

🎙️ Hear our coverage →

#video-gen #open-source

February 2026

ByteDance Feb 12, 2026

New Models

Seedance 2.0

ByteDance Seedance 2.0 shatters video generation reality

ByteDance launched Seedance 2.0, a unified multimodal video generation model that accepts up to 9 images, 3 videos, and 3 audio clips as references and produces 15-second multi-shot clips with native stereo audio and strong character consistency (a 45-second internal test mode also exists). The panel compared the quality jump to seeing Sora for the first time. Available on the BytePlus platform.

Alex's demo thread on X ↗ Official launch blog ↗ Seedance 2.0 announcement page ↗ Seedance 2.0 in CapCut on X ↗

🎙️ Hear our coverage (+1 follow-up) →

#video-gen #multimodal #consumer-ai

Ant Group Feb 5, 2026

New Models · Open weights

LingBot-World

LingBot-World: open-source world model challenges Google Genie 3

Ant Group released LingBot-World, an open-source world model that generates 10-minute playable environments at 16fps. It positions open weights as a direct challenger to Google's closed Genie 3 in interactive world generation.

X thread ↗ Hugging Face ↗

🎙️ Hear our coverage →

#world-models #video-gen #open-source

Kling AI Feb 5, 2026

New Models

Kling 3.0

Kling 3.0: 15-second multi-shot video with native audio

Kuaishou's Kling 3.0 launched as an all-in-one AI video creation engine with native multimodal generation, 15-second multi-shot sequences, built-in audio, and character consistency across scenes. Alongside Grok Imagine, it marks the week native audio and lip sync became table stakes for video models.

X announcement ↗ Kling AI ↗

🎙️ Hear our coverage →

#video-gen #audio

xAI Feb 5, 2026

New Models

Grok Imagine 1.0

Grok Imagine 1.0 tops video arena with native audio and lip sync

xAI launched Grok Imagine 1.0 with 10-second 720p video generation, native audio, and lip sync, taking the #1 spot on the Artificial Analysis text-to-video arena. Generation costs roughly $0.42 per 10-second clip and an API is available.
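
Per-clip and per-second prices are easy to mix up when comparing video models, so here is a small sketch that normalizes the figure quoted above. The only input taken from the text is the roughly $0.42 price for a 10-second clip; the helper names are illustrative, and the sketch assumes linear per-second billing, which the source does not confirm.

```python
def per_second(clip_price_usd: float, clip_seconds: float) -> float:
    """Convert a per-clip price into an effective per-second rate."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return clip_price_usd / clip_seconds


def clip_cost(rate_usd_per_second: float, seconds: float) -> float:
    """Cost of a clip of the given length, assuming linear per-second billing."""
    return rate_usd_per_second * seconds


# Figure from the text: roughly $0.42 for a 10-second clip.
rate = per_second(0.42, 10)
print(f"Effective rate: ${rate:.3f} per second")
print(f"One minute of footage at that rate: ${clip_cost(rate, 60):.2f}")
```

Resolution, audio, and feature tiers differ between providers, so an effective per-second rate is only a starting point for comparison.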

X announcement ↗ Grok ↗ Artificial Analysis leaderboard ↗

🎙️ Hear our coverage →

#video-gen #audio

January 2026

Decart Jan 29, 2026

New Models

Lucy 2.0

Lucy 2.0 real-time video generation model

Decart's Lucy 2.0, a real-time video generation model, was discussed in the AI Art segment, with the focus on its real-time generation capabilities.

🎙️ Hear our coverage →

#video-gen #voice-ai

Google DeepMind Jan 29, 2026

New Models

Genie 3 (Project Genie)

Google DeepMind launches Project Genie 3, real-time 24fps world model

Google DeepMind's Genie 3 generates interactive, controllable 3D worlds in real time at 24 frames per second, demoed live on the show with a spaceship exploration and paint persistence on walls. It ships alongside SIMA 2, a self-improving game-playing agent built on Genie 3, and is available to Gemini Ultra subscribers in the US with a one-minute session limit.

24 fps Genie 3 frame rate

Announcement (X) ↗ Project Genie ↗

🎙️ Hear our coverage →

#world-models #video-gen

xAI Jan 29, 2026

APIs & Platforms

Grok Imagine API

xAI launches Grok Imagine API with video generation

xAI released the Grok Imagine API, exposing its image and video generation capabilities to developers through the xAI console. The show subtitle notes Grok Imagine ranking #1 among generation models this week.

Announcement (X) ↗ xAI Console ↗

🎙️ Hear our coverage →

#video-gen #image-gen #api

Overworld Jan 22, 2026

New Models

Waypoint-1

Overworld's Waypoint-1: real-time AI world model at 60fps on consumer GPUs

Overworld released Waypoint-1, a real-time AI world model that runs at 60fps on consumer GPUs. It generates interactive environments live, bringing world-model tech out of research demos and onto hardware people actually own.
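
Running in real time means each frame must be produced within a fixed time budget. The sketch below works out that budget for the frame rates quoted on this page (60 fps for Waypoint-1, 24 fps for Genie 3, 16 fps for LingBot-World, 30 fps for Odyssey). It is plain arithmetic, not a measurement of any of these models, and the function name is illustrative.

```python
def frame_budget_ms(fps: float) -> float:
    """Maximum time per frame, in milliseconds, to sustain the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates as reported on this page.
reported_fps = {
    "Waypoint-1": 60,
    "Odyssey Interactive Video": 30,
    "Genie 3": 24,
    "LingBot-World": 16,
}

for name, fps in reported_fps.items():
    print(f"{name}: {fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

At 60 fps the whole generation step has under 17 ms per frame, which is why running at that rate on consumer GPUs stands out.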

Overworld Waypoint-1 announcement (X) ↗ Overworld official site ↗

🎙️ Hear our coverage →

#world-models #video-gen

Runway Jan 22, 2026

New Models

Runway 4.5

Runway 4.5 launches with image-to-video and audio

Runway launched version 4.5 of its video generation model, adding image-to-video and audio support. It was mentioned in the week's news rundown as part of a busy week for vision and video releases.

Runway 4.5 launch (X) ↗

🎙️ Hear our coverage →

KAIST Jan 8, 2026

Papers & Research

Avatar Forcing

KAIST's Avatar Forcing: real-time interactive talking heads

KAIST published Avatar Forcing, a framework for real-time interactive talking-head avatars with approximately 500ms latency. The paper targets responsive, live avatar interaction rather than offline video generation.

Avatar Forcing Paper (KAIST) ↗

🎙️ Hear our coverage →

#video-gen #voice-ai

Lightricks Jan 8, 2026

New Models · Open weights

LTX-2

Lightricks open-sources LTX-2 synchronized audio-video model

Lightricks open-sourced LTX-2, billed as the first truly open audio-video generation model with synchronized audio and video output, releasing full training code alongside the weights. A distilled version is available to try on Replicate.

LTX-2 on GitHub ↗ LTX-2 Paper ↗ LTX-2 on Replicate ↗

🎙️ Hear our coverage →

#video-gen #open-source #audio

December 2025

Google DeepMind Dec 25, 2025

New Models

Veo 3

Veo 3: native audio video generation crosses the uncanny valley

Google's Veo 3 stunned everyone in Q2 with video generation that included native audio, which the crew credits with crossing the uncanny valley for AI video. It was a centerpiece of Google I/O 2025 and of Google's comeback year.

🎙️ Hear our coverage →

#video-gen #audio

OpenAI Dec 25, 2025

New Models

Sora 2

Sora 2 democratizes video generation and floods the internet with memes

Sora 2 opened Q4 in October by democratizing video generation, complete with a social platform, and spawned a wave of memes still circulating at year's end. The show's TL;DR credits it as part of 2025 crossing the uncanny valley for AI media.

🎙️ Hear our coverage →

Kling AI Dec 4, 2025

New Models

Kling VIDEO 2.6

Kling VIDEO 2.6 adds first native audio generation

Kling released VIDEO 2.6, its first video model with native audio generation, producing sound directly alongside generated footage. It was one of two Kling releases this week spanning video and image generation.

Kling VIDEO 2.6 announcement on X ↗

🎙️ Hear our coverage →

#video-gen #audio

Runway Dec 4, 2025

New Models

Runway Gen-4.5

Runway Gen-4.5 takes #1 on the text-to-video leaderboard

Runway's Gen-4.5 video model climbed to the top of the text-to-video leaderboard with a 1,247 Elo rating. The result continued the weekly theme of video generation quality and multimodal consistency improving fast.

1,247 Text-to-video leaderboard Elo

Runway Gen-4.5 update ↗

🎙️ Hear our coverage →

November 2025

LTX Studio (Lightricks) Nov 27, 2025

Products & Apps

LTX Retake

LTX Studio's Retake brings Photoshop-style object editing to video

LTX Studio launched Retake, an AI video editing tool that enables inpainting-style editing of specific objects within video frames. Wolfram called it 'the image editing moment for video', in effect Photoshop for video. It is available to try on Replicate.

LTX Retake on Replicate ↗ LTX Retake Announcement on X ↗

🎙️ Hear our coverage →

Tencent (Hunyuan) Nov 27, 2025

New Models · Open weights

HunyuanVideo 1.5

Tencent releases HunyuanVideo 1.5, a lightweight open video model

Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.

HunyuanVideo on Hugging Face ↗ HunyuanVideo on GitHub ↗ HunyuanVideo 1.5 Announcement on X ↗

🎙️ Hear our coverage →

#video-gen #open-source #architecture

October 2025

MiniMax (Hailuo) Oct 30, 2025

New Models

Hailuo 2.3

Hailuo 2.3: MiniMax's cinema-grade video generation model

MiniMax's Hailuo team released version 2.3 of its video generation model, pitching cinema-grade output quality. It landed in the same week as MiniMax M2 and Speech 2.6, underlining how broadly MiniMax is shipping across text, voice, and video.

X announcement ↗ Hailuo 2.3 examples ↗ Model details ↗

🎙️ Hear our coverage →

Odyssey ML Oct 30, 2025

Products & Apps

Odyssey V2

Odyssey V2: real-time interactive AI video you can steer as it generates

Odyssey ML launched V2 of its real-time interactive AI video experience, where the video stream is generated live and responds to user input. The panel grouped it with the week's evidence that video is becoming an interactive product surface rather than a render-and-wait demo.

X announcement ↗ Experience it live ↗

🎙️ Hear our coverage →

#video-gen #world-models

OpenAI Oct 30, 2025

Major Features & Updates

Sora (Character Cameos)

Sora drops invite requirement and adds Character Cameos

OpenAI removed the invite requirement for the Sora app and shipped Character Cameos, letting users create reusable characters that can appear across generated videos. The update widens access to Sora as OpenAI pushes it as a consumer video product.

X announcement ↗ Sonia cameo example ↗

🎙️ Hear our coverage →

#video-gen #consumer-ai

Decart AI Oct 23, 2025

APIs & Platforms

Real-Time Lip Sync API

Decart ships real-time lip-sync API for live AI avatars

Decart AI released a real-time lip-sync API that modifies an avatar's video frames to match generated speech on the fly. Kwindla Kramer broke down the pipeline on the show: WebRTC audio capture, Whisper transcription, an LLM response, ElevenLabs voice generation, then Decart's model syncing the avatar's lips, all at sub-two-second latency. It is a key step toward interactive, believable AI characters.
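
The pipeline described above can be sketched as a simple latency budget. The stage names come from the text, but every per-stage number below is a hypothetical placeholder rather than a measured value, and the sketch assumes the stages run strictly one after another (a real system would likely stream and overlap them).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    budget_s: float  # hypothetical placeholder, NOT a measured latency


# Stage order as described on the show; the budgets are illustrative only.
PIPELINE: List[Stage] = [
    Stage("WebRTC audio capture", 0.20),
    Stage("Whisper transcription", 0.40),
    Stage("LLM response", 0.60),
    Stage("ElevenLabs voice generation", 0.40),
    Stage("Decart lip sync", 0.30),
]

TARGET_S = 2.0  # the sub-two-second end-to-end figure quoted in the text


def total_latency(stages: List[Stage]) -> float:
    """Sum of stage budgets, assuming purely sequential execution."""
    return sum(stage.budget_s for stage in stages)


def within_target(stages: List[Stage], target_s: float = TARGET_S) -> bool:
    """True if the sequential total stays under the end-to-end target."""
    return total_latency(stages) < target_s


for stage in PIPELINE:
    print(f"{stage.name:<28} {stage.budget_s:.2f} s")
print(f"Within {TARGET_S:.0f} s target: {within_target(PIPELINE)}")
```

Laying the stages out this way makes it clear that the lip-sync model is only one slice of the budget; any stage that overruns pushes the whole exchange past the two-second mark.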

<2s end-to-end pipeline latency

🎙️ Hear our coverage →

#voice-ai #video-gen

Krea AI Oct 23, 2025

New Models · Open weights

Krea Realtime Video

Krea open-sources a 14B real-time video generation model

Krea AI open-sourced a 14-billion-parameter real-time video model, with weights on Hugging Face. It joins the week's clear trend of generative video racing toward live, interactive experiences rather than offline rendering.

14B parameters

🎙️ Hear our coverage →

#video-gen #voice-ai #open-source

Lightricks Oct 23, 2025

New Models · Open weights

LTX-2

LTX-2: native 4K audio+video generation engine from Lightricks

Lightricks announced LTX-2 as breaking news on the show: a video generation engine producing native 4K video (no upscaling) with synchronized audio, positioned as a fast, efficient open alternative to closed models like Sora. It is billed as open-source with weights coming this fall.

4K native generation resolution, no upscaling

X ↗ Website ↗ GitHub ↗

🎙️ Hear our coverage →

#video-gen #open-source #audio

Reve Oct 23, 2025

Major Features & Updates

Reve video mode

Reve quietly surfaces an unannounced 1080p video mode with sound

Reve's unannounced video mode was spotted this week, generating 1080p video with sound. It was covered briefly in the show's vision and video roundup with no official announcement or links yet.

🎙️ Hear our coverage →

Baidu Oct 16, 2025

New Models

MuseStreamer

Baidu's MuseStreamer pushes video generations past 20 seconds

Baidu showed off MuseStreamer, a video generation model producing clips longer than 20 seconds. It adds another Chinese lab to the long-form video generation race alongside Veo and Sora.

🎙️ Hear our coverage →

Google DeepMind Oct 16, 2025

New Models

Veo 3.1

Veo 3.1: Google's next-gen video model launches with cinematic audio

Google DeepMind shipped Veo 3.1, the next version of its video generation model with improved quality and cinematic audio. Senior PM Jessica Gallegos joined the show to discuss how the model and its product packaging (including Flow) are evolving video generation into a real user experience story.

Google Developers Blog ↗

🎙️ Hear our coverage →

OpenAI Oct 16, 2025

Major Features & Updates

Sora

Sora extends generations to 15s (25s Pro) and adds storyboards

OpenAI upgraded Sora with longer generations, up to 15 seconds for standard users and 25 seconds for Pro, plus a new storyboard feature for multi-shot control. The update keeps Sora competitive as video models race on length and controllability.

🎙️ Hear our coverage →

September 2025

Alibaba (Wan) Sep 25, 2025

New Models · Open weights

Wan 2.2 Animate

Wan Animate brings open-weights character animation and replacement

Alibaba's Wan team released Wan 2.2 Animate, an open-weights model that animates a character image from a performance video, replicating motion and expressions, or swaps a character into existing footage. It landed in the episode's closing run of video releases showing multimodal product quality climbing across the board.

🎙️ Hear our coverage →

#video-gen #open-source

Kling AI Sep 25, 2025

New Models

Kling 2.5 Turbo

Kling 2.5 Turbo upgrades AI video generation quality and cost

Kuaishou's Kling AI shipped Kling 2.5 Turbo, an update to its video generation model with better motion, prompt adherence, and cinematic quality at a lower price. Together with Wan Animate it was cited on the show as proof that video model quality is being turbocharged this season.

🎙️ Hear our coverage →

ByteDance / Tsinghua Sep 18, 2025

New Models · Open weights

HuMo

HuMo: human-centric multimodal video generation from ByteDance/Tsinghua

ByteDance research and Tsinghua released HuMo, a human-centric video generation model that conditions on multimodal inputs (text, image, and audio) to produce videos of people. The weights are available on Hugging Face.

🎙️ Hear our coverage →

#video-gen #open-source

Luma AI Sep 18, 2025

New Models

Ray3

Luma's Ray3: a 'reasoning' video model with native HDR

Luma AI launched Ray3, a video generation model it bills as a 'reasoning' video model, with native HDR output, a fast Draft Mode, and Hi-Fi mastering. It is available in Luma's Dream Machine and feeds the episode's closing theme of a next wave of video models.

X ↗ Try It ↗

🎙️ Hear our coverage →

#video-gen #reasoning

July 2025

Dynamics Lab Jul 3, 2025

Products & Apps

Mirage

Mirage debuts as the first AI-native UGC game engine

Dynamics Lab unveiled Mirage, billed as the world's first AI-native user-generated-content game engine, with real-time photorealistic playable demos powered by world-model-style generation. Alex reacted to it live as the most visibly fun demo of the week and a preview of where interactive media is headed.

Playable demo & blog ↗

🎙️ Hear our coverage →

#world-models #video-gen

May 2025

Odyssey May 29, 2025

Products & Apps

Odyssey Interactive Video

Odyssey debuts real-time interactive AI video at 30 FPS

Odyssey launched interactive video: real-time AI world exploration rendered at 30 FPS, letting you walk through generated worlds as they are created. It offers a glimpse of world-model-driven media where the video responds to you instead of just playing back.

Blog ↗ Try It ↗

🎙️ Hear our coverage →

#video-gen #world-models

Tencent (Hunyuan) May 29, 2025

New Models

HunyuanPortrait

Tencent's HunyuanPortrait animates portraits from a single photo

Tencent's Hunyuan team published HunyuanPortrait, a model for high-fidelity portrait video generation from a single photo. It animates a still portrait into realistic talking-head video, with an accompanying paper.

Site ↗ Paper ↗

🎙️ Hear our coverage →

Tencent (Hunyuan) May 29, 2025

New Models · Open weights

HunyuanVideo-Avatar

Tencent releases HunyuanVideo-Avatar for audio-driven avatars

Tencent Hunyuan released HunyuanVideo-Avatar, an audio-driven full-body avatar animation model. Feed it audio and a reference image and it animates a full-body avatar in sync, pushing AI-generated humans closer to being indistinguishable from real footage.

Site ↗ Tweet ↗

🎙️ Hear our coverage →

Alibaba May 15, 2025

New Models · Open weights

Wan 2.1

Alibaba's Wan 2.1: open-source diffusion-transformer text-to-video suite

Alibaba, the team behind the Qwen LLMs, released Wan 2.1, a full stack of open-source diffusion-transformer text-to-video foundation models. Amid the show's discussion of video-model fatigue, this was called out as a release that cuts through the noise, with weights on Hugging Face and code on GitHub.

Hugging Face ↗ GitHub ↗ Announcement tweet ↗ Try it ↗

🎙️ Hear our coverage →

#video-gen #open-source #architecture

Lightricks May 15, 2025

New Models

LTX Video (distilled)

LTX distilled model enables near real-time video generation

Lightricks shared a distilled version of its LTX video model that generates video at near real-time speeds. It was highlighted in the vision and video segment as a notable speed milestone for video generation.

Announcement on X ↗

🎙️ Hear our coverage →

#video-gen #voice-ai

Runway May 1, 2025

Major Features & Updates

Gen-4 References

Runway References brings character and scene consistency to Gen-4

Runway launched References for Gen-4 on all paid plans, letting creators supply reference images (characters, outfits, locations, even selfies) and use tags in prompts to keep those elements consistent across generations. It tackles AI video's biggest pain point, frame-to-frame identity drift, at no extra credit cost per run.

Runway References examples (X search) ↗

🎙️ Hear our coverage →

#video-gen #image-gen

April 2025

Character.AI Apr 24, 2025

Products & Apps

AvatarFX

Character.AI opens early access to AvatarFX talking avatars

Character.AI announced AvatarFX, now in early access, which turns static images into speaking, emoting video avatars. It targets bringing characters to life for conversational and creative use cases.

🎙️ Hear our coverage →

Lvmin Zhang (lllyasviel) Apr 24, 2025

New Models · Open weights

FramePack

FramePack generates 120-second videos on just 6GB of VRAM

FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open-source next-frame prediction approach for long video generation that runs on consumer hardware. It can generate videos up to 120 seconds long on as little as 6GB of VRAM by packing input frame context into a fixed length.
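
One way to see why a fixed-length context keeps memory flat is to compress older frames more aggressively than recent ones. The sketch below is an illustration under stated assumptions, not FramePack's actual code: it assumes each step back in time halves a frame's token count, and the `tokens_per_frame` value and the 30 fps figure are hypothetical.

```python
def packed_context_length(num_past_frames: int,
                          tokens_per_frame: int = 1536,
                          ratio: int = 2) -> int:
    """Total context tokens when a frame `age` steps back keeps
    tokens_per_frame // ratio**age tokens (illustrative geometric schedule)."""
    total = 0
    for age in range(num_past_frames):
        tokens = tokens_per_frame // (ratio ** age)
        if tokens == 0:
            break  # frames this old contribute nothing further
        total += tokens
    return total


# 120 seconds at an assumed 30 fps is 3600 frames of history.
short_history = packed_context_length(30)
long_history = packed_context_length(3600)
print(short_history, long_history)
```

Because the per-frame token counts form a geometric series, the total stays below `2 * tokens_per_frame` no matter how long the history grows, which is the intuition behind generating long clips within a small, fixed VRAM footprint.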

120s Max video length · 6GB Minimum VRAM

Project Page ↗ GitHub ↗

🎙️ Hear our coverage →

#video-gen #open-source #on-device

Sand AI Apr 24, 2025

New Models · Open weights

MAGI-1

Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model

Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.
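
The chunked streaming idea can be sketched as a generator that emits one 24-frame chunk at a time. This is a toy illustration, not Sand AI's implementation: `denoise_chunk` is a hypothetical stand-in for the model, and the bounded `max_context_chunks` window is an assumption used here to show how per-chunk work can stay constant as the video gets longer.

```python
from collections import deque
from typing import Deque, Iterator, List

CHUNK_FRAMES = 24  # chunk size quoted in the text


def denoise_chunk(context: List[List[int]], chunk_index: int) -> List[int]:
    """Hypothetical stand-in for the model: returns frame indices for one chunk.

    A real model would attend causally to `context` (earlier chunks only).
    """
    start = chunk_index * CHUNK_FRAMES
    return list(range(start, start + CHUNK_FRAMES))


def stream_video(num_chunks: int, max_context_chunks: int = 4) -> Iterator[List[int]]:
    """Yield chunks one by one, conditioning each on a bounded window of past chunks."""
    context: Deque[List[int]] = deque(maxlen=max_context_chunks)
    for chunk_index in range(num_chunks):
        chunk = denoise_chunk(list(context), chunk_index)
        context.append(chunk)  # the oldest chunk is dropped once the window is full
        yield chunk            # frames can be shown before later chunks exist


for chunk in stream_video(3):
    print(f"frames {chunk[0]}..{chunk[-1]} ready")
```

Each chunk becomes available as soon as it is generated, which is what makes playback possible while later chunks are still being produced.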

24B Parameters · 24 Frames per autoregressive chunk

X Post ↗ GitHub ↗ PDF Report ↗ HF Repo ↗

🎙️ Hear our coverage →

#video-gen #open-source #architecture

ByteDance Apr 17, 2025

New Models

Seaweed-7B

ByteDance publishes Seaweed-7B video generation foundation model

ByteDance publicly presented Seaweed-7B, a 7B-parameter video generation foundation model, showing competitive video quality from a comparatively small model. Details and demos were published at seaweed.video.

seaweed.video ↗

🎙️ Hear our coverage →

#video-gen #frontier-models

Google DeepMind Apr 17, 2025

Major Features & Updates

Veo 2

Veo 2 video generation hits GA in the API and Gemini App

Google made Veo 2 video generation generally available for developers and rolled it out in the Gemini App. The GA release brings Google's flagship text-to-video model out of preview and into production use.

Dev Blog ↗ Try It ↗

🎙️ Hear our coverage →

Kling AI Apr 17, 2025

New Models

Kling 2.0

Kling 2.0 Creative Suite launches

Kuaishou's Kling AI launched Kling 2.0 along with a broader Creative Suite, upgrading its video generation model and tooling. The release kept up the rapid pace in the closed-source video generation race during a packed vision and video week.

🎙️ Hear our coverage →

Stanford / NVIDIA / UCSD / UC Berkeley Apr 10, 2025

Papers & Research · Open weights

One-Minute Video Generation with Test-Time Training

Test-Time Training paper one-shots minute-long videos with consistent characters

Researchers published 'One-Minute Video Generation with Test-Time Training', adding TTT layers to a pre-trained transformer to one-shot generate minute-long videos with remarkable character and scene consistency. The Tom & Jerry style demos showed the most impressive long-form AI video consistency to date.

1 min Single-shot generated video length

Project blog ↗ Paper ↗

🎙️ Hear our coverage →

#video-gen #research #training

ByteDance Apr 3, 2025

Products & Apps

OmniHuman (via Dreamina)

ByteDance's OmniHuman image-to-avatar model goes public via Dreamina

ByteDance's impressive OmniHuman model, which turns a single image plus audio into a realistic talking avatar video, became publicly usable through the Dreamina (CapCut) website. The results land squarely in uncanny-valley territory, as Alex demonstrated with his own avatar thread.

OmniHuman on Dreamina ↗ Example thread by Alex ↗

🎙️ Hear our coverage →

Meta AI Apr 3, 2025

Papers & Research

MoCha

Meta's MoCha generates movie-grade talking AI characters from speech and text

Meta GenAI researchers published MoCha, a model that generates stunningly realistic, movie-grade talking characters directly from speech plus text. Co-author Cong Wei joined the show to discuss the work, which points at AI actors entering Hollywood-quality territory.

MoCha project page ↗

🎙️ Hear our coverage →

#video-gen #research

Runway Apr 3, 2025

New Models

Runway Gen-4

Runway Gen-4 announced with major gains in video consistency

Runway announced Gen-4, its next-generation video model focused on character and world consistency across shots. Example videos showed notably coherent characters and scenes, pushing AI video further toward usable filmmaking.

Introducing Runway Gen-4 ↗

🎙️ Hear our coverage →

#video-gen #image-gen

March 2025

StepFun Mar 20, 2025

New Models · Open weights

Step-Video-TI2V

StepFun releases Step-Video-TI2V image-to-video model

Chinese lab StepFun dropped Step-Video-TI2V, an open text/image-to-video generation model. Weights are on Hugging Face with code on GitHub, adding another open-weights option to the fast-moving video generation space.

TI2V Hugging Face Space ↗ TI2V GitHub ↗

🎙️ Hear our coverage →

#video-gen #open-source

HPC-AI Tech Mar 13, 2025

New Models · Open weights

Open-Sora 2.0

Open-Sora 2.0: 11B open-source video model trained for $200K

Open-Sora 2.0 is an 11B-parameter open-source video generation model that claims state-of-the-art results at a training cost of only about $200,000. The team reports performance approaching OpenAI's Sora on some benchmarks, underscoring how fast open-source video generation is improving.

🎙️ Hear our coverage →

#video-gen #open-source

Remade AI Mar 13, 2025

New Models · Open weights

Wan 2.1 14B I2V LoRA video effects

Remade AI releases 8 open LoRA video effects for Wan 2.1

Remade AI published eight LoRA video effects for Alibaba's Wan 2.1 14B image-to-video model, including effects like squish, inflate, deflate, and cakeify. The open release shows video effects becoming trainable and customizable via LoRAs on top of open video models.

Hugging Face collection ↗

🎙️ Hear our coverage →

#video-gen #open-source

Tencent Mar 6, 2025

New Models · Open weights

HunyuanVideo-I2V

Tencent releases HunyuanVideo-I2V open image-to-video model

Tencent finally shipped the long-awaited image-to-video version of HunyuanVideo, with open weights on Hugging Face and a hosted try-it experience. It lets users animate still images using one of the strongest open video generation models.

Announcement (X) ↗ Hugging Face ↗ Try It ↗

🎙️ Hear our coverage →

#video-gen #open-source

February 2025

Google DeepMind Feb 27, 2025

APIs & Platforms

Veo 2 (via FAL API)

Google's Veo 2 video model becomes available via FAL API

Google DeepMind's Veo 2 video generation model became accessible to developers through FAL's inference API. This was the first broadly available API access to Veo 2, letting builders generate high-quality video from text prompts without waiting on Google's own product surfaces.

🎙️ Hear our coverage →

#video-gen #api

Hao AI Lab Feb 20, 2025

Dev Tools · Open weights

FastVideo

Hao AI Lab's FastVideo makes HunyuanVideo 3x faster with no extra training

Hao AI Lab released FastVideo, a method that makes HunyuanVideo (HY-Video) three times faster with no additional training, using a technique called Sliding Tile Attention that outperforms even FlashAttention for this workload. Faster inference makes open-source video models far more practical, and FastVideo also supports HY-Video LoRAs for fine-tuned applications.

🎙️ Hear our coverage →

#video-gen #infrastructure #open-source

Microsoft Feb 20, 2025

New Models · Open weights

MUSE (WHAM)

Microsoft MUSE generates playable game worlds from a single second of video

Microsoft's MUSE can generate minutes of playable gameplay from just a single second of video frames and controller actions, preserving screen elements like health bars and percentages. It is based on the World and Human Action Model (WHAM) architecture, trained on a billion gameplay images from Xbox, with the model released on Hugging Face.

Announcement on X ↗ Hugging Face ↗

🎙️ Hear our coverage →

#world-models #video-gen

StepFun Feb 20, 2025

New Models · Open weights

Step-Video-T2V

StepFun open-sources Step-Video-T2V, a SOTA 30B text-to-video model

StepFun released Step-Video-T2V (plus a T2V Turbo variant), a 30-billion-parameter state-of-the-art text-to-video model under an MIT license. Results impressed especially on text integration, such as rendering 'We will open source' on a scroll as a character unfurls it, making it one of the strongest open-source video drops of the week.

Paper ↗ Hugging Face ↗ GitHub ↗ Try it ↗

🎙️ Hear our coverage →

#video-gen #open-source

January 2025

Alibaba (Qwen) Jan 30, 2025

New Models

Qwen2.5-Max

Alibaba launches Qwen2.5-Max flagship model with hidden video gen

Alibaba's Qwen team released Qwen2.5-Max, a large MoE flagship model available through the Qwen Chat interface and API, claiming competitive results against DeepSeek V3 and other frontier models. The chat app also quietly shipped a video generation capability powered by Alibaba's Tongyi Wanxiang.

X announcement ↗ Try it (Qwen Chat) ↗ Tongyi Wanxiang ↗

🎙️ Hear our coverage →

#frontier-models #video-gen