New Models · Open weights
Smart-Turn VAD
Pipecat releases Smart-Turn, an open source semantic VAD model
The Pipecat team (from Daily) released Smart-Turn, an open source semantic voice activity detection model. It recognizes when a speaker has actually finished their turn, rather than just detecting silence. Kwindla Kramer joined the show to explain how semantic VAD makes voice agent conversations feel far more natural. He also pointed to a community training effort at turn-training.pipecat.ai.
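The sketch below contrasts silence-only endpointing with semantic endpointing as a simple decision rule. It is a hypothetical illustration, not Smart-Turn's actual inference code. `turn_complete_prob` stands in for the semantic model's output, and the thresholds and timeouts are made-up values.

```python
from dataclasses import dataclass

@dataclass
class TurnState:
    silence_ms: int            # trailing silence observed by a plain VAD
    turn_complete_prob: float  # hypothetical semantic model score in [0, 1]

def silence_only_end_of_turn(state: TurnState, timeout_ms: int = 500) -> bool:
    # Classic VAD endpointing: any pause longer than the timeout ends the turn.
    return state.silence_ms >= timeout_ms

def semantic_end_of_turn(state: TurnState,
                         min_silence_ms: int = 200,
                         max_silence_ms: int = 3000,
                         threshold: float = 0.5) -> bool:
    # Require a short pause before consulting the semantic score at all.
    if state.silence_ms < min_silence_ms:
        return False
    # Fallback: a very long pause ends the turn regardless of the score.
    if state.silence_ms >= max_silence_ms:
        return True
    # Otherwise the (hypothetical) semantic score decides.
    return state.turn_complete_prob >= threshold

# "I'd like to book a flight to..." followed by a thinking pause
mid_sentence = TurnState(silence_ms=800, turn_complete_prob=0.1)
print(silence_only_end_of_turn(mid_sentence))  # True: cuts the speaker off
print(semantic_end_of_turn(mid_sentence))      # False: keeps listening
```

The silence fallback is a design choice in this sketch. It keeps the agent from waiting forever if the semantic score is miscalibrated.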
New Models · Open weights
Gemma 3 QAT
Google ships Quantization-Aware Trained Gemma 3 models for consumer GPUs
Google released Quantization-Aware Training (QAT) versions of the Gemma 3 family, dramatically cutting memory requirements while preserving quality. The 27B model drops from a hefty 54GB to just 14.1GB, and even the 1B model goes from 2GB to about half a gig, making state-of-the-art open models runnable on consumer GPUs. Wolfram took the 4B QAT model for a spin in LM Studio on the show.
27B Gemma 3 27B QAT: 54GB down to 14.1GB
1B Gemma 3 1B QAT: 2GB down to ~0.5GB
4B 4B QAT model tested in LM Studio
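The quoted sizes roughly follow from bytes per parameter. 16-bit weights take 2 bytes each and 4-bit weights take half a byte. This is a naive weight-only estimate that ignores the KV cache and activations. The gap to the published 14.1GB is plausibly overhead such as quantization scales or tensors kept at higher precision, but that breakdown is an assumption, not Google's figure. QAT's contribution is preserving quality at 4 bits, which this arithmetic does not capture.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Naive weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

for name, params in [("Gemma 3 27B", 27e9), ("Gemma 3 1B", 1e9)]:
    bf16 = weight_memory_gb(params, 16)  # unquantized 16-bit checkpoint
    int4 = weight_memory_gb(params, 4)   # 4-bit quantized weights
    print(f"{name}: {bf16:.1f} GB at 16-bit -> {int4:.1f} GB at 4-bit")
```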
New Models · Open weights
FramePack
FramePack generates 120-second videos on just 6GB of VRAM
FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open source next-frame prediction approach for long video generation that runs on consumer hardware. It compresses the context of past input frames into a fixed length, so the work per generated frame does not grow as the video gets longer. This lets it generate videos up to 120 seconds long on as little as 6GB of VRAM.
120s Max video length
6GB Minimum VRAM
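The sketch below shows why compressing older frames more aggressively keeps the context bounded. It assumes a hypothetical geometric schedule in which each step back in time halves a frame's token budget, starting from a made-up 1536 tokens for the newest frame. FramePack's actual compression kernels and schedules differ. This only illustrates the bounded-sum idea.

```python
def packed_context_length(num_past_frames: int, tokens_newest: int = 1536,
                          min_tokens: int = 1) -> int:
    """Total context tokens if frame i (0 = newest) gets tokens_newest // 2**i.

    Frames whose budget falls below min_tokens are dropped entirely, so the
    total never exceeds 2 * tokens_newest however long the video is.
    """
    total = 0
    for age in range(num_past_frames):
        budget = tokens_newest // (2 ** age)
        if budget < min_tokens:
            break  # older frames contribute nothing more
        total += budget
    return total

def naive_context_length(num_past_frames: int, tokens_per_frame: int = 1536) -> int:
    # Without packing, context grows linearly with the number of frames.
    return num_past_frames * tokens_per_frame

for n in (10, 100, 1000):
    print(n, naive_context_length(n), packed_context_length(n))
```

The naive context grows without limit. The packed context plateaus just under 3072 tokens after about a dozen frames.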
New Models · Open weights
Dia-1.6B
Nari Labs' Dia: a wild 1.6B open source TTS model that blew up Twitter
Nari Labs released Dia, a 1.6B parameter open-weights text-to-speech model that absolutely blew up Twitter with its expressive, emotional dialogue generation, including laughs, coughs, and multi-speaker conversations. Built by a tiny team, it punches far above its weight against commercial TTS systems and supports voice cloning, with demos available on Fal.ai.
1.6B Parameters
New Models · Open weights
Describe Anything (DAM-3B)
NVIDIA releases DAM-3B for region-based image and video captioning
NVIDIA dropped the Describe Anything Model (DAM-3B), a 3 billion parameter multimodal model for region-based image and video captioning. You can point it at a specific region of an image or video and it generates a detailed description of just that area. NVIDIA also published an accompanying DescribeAnything dataset and a Hugging Face demo.
3B Parameters
New Models · Open weights
MAGI-1
Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model
Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them. This enables real-time streaming generation in which peak compute does not grow with video length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.
24B Parameters
24 Frames per autoregressive chunk
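"Causal attention between chunks" can be illustrated as a block-causal mask at frame granularity. This is a simplification. The real model works on latent tokens and denoises each chunk with diffusion. The sketch only shows which frames can see which. Frames attend freely within their own chunk and only backward across chunks. Earlier chunks therefore never depend on later ones, which is what lets generation stream.

```python
def block_causal_mask(num_frames: int, chunk_size: int = 24) -> list[list[bool]]:
    """mask[q][k] is True if frame q may attend to frame k.

    Frames attend bidirectionally within their own chunk and causally to
    all earlier chunks, never to later chunks.
    """
    chunk_of = [f // chunk_size for f in range(num_frames)]
    return [[chunk_of[k] <= chunk_of[q] for k in range(num_frames)]
            for q in range(num_frames)]

mask = block_causal_mask(72)  # three 24-frame chunks
print(mask[0][23], mask[0][24], mask[30][5], mask[30][50])
```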
New Models
Hunyuan 3D 2.5
Tencent's Hunyuan 3D 2.5 jumps to 10B params with PBR textures and rigging
Tencent updated its 3D generation model to Hunyuan 3D 2.5, now boasting 10 billion parameters, up from 1B. They highlight massive leaps in precision with 1024-resolution geometry, high-quality textures with PBR support, and improved skeletal rigging for animation.
10B Parameters (up from 1B)
1024 Geometry resolution
New Models
Seaweed-7B
ByteDance publishes Seaweed-7B video generation foundation model
ByteDance publicly presented Seaweed-7B, a 7B parameter video generation foundation model, showing competitive video quality from a comparatively small model. Details and demos were published at seaweed.video.
New Models
Seedream 3.0
ByteDance Seedream 3.0: bilingual 2K text-to-image model
ByteDance's Seed team announced Seedream 3.0, a powerful bilingual (Chinese/English) text-to-image model that generates native 2048x2048 images with fast inference of around 3 seconds for a 1K image on an A100. It challenges the top closed image generation models.
New Models
Embed 4
Cohere Embed 4: multimodal embeddings for enterprise search
Cohere released Embed 4, a multimodal embedding model aimed at enterprise search and retrieval over mixed text and image documents. It is available through Cohere's API.
New Models
DolphinGemma
DolphinGemma: Google's audio model for decoding dolphin communication
Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M parameter audio model based on the Gemma architecture using SoundStream audio tokenization. Trained on decades of recorded dolphin clicks, whistles and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.
New Models
Gemini 2.5 Flash
Google launches Gemini 2.5 Flash with controllable thinking budgets
Google answered OpenAI's launch week with Gemini 2.5 Flash, a fast reasoning model that introduces controllable thinking budgets so developers can dial how much the model reasons per request. It is available through the Gemini API and developer platform.
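The sketch below shows how a per-request thinking budget might look when calling the Gemini REST API. It only builds the JSON request body and does not send it. The endpoint URL and model ID are omitted. The `generationConfig.thinkingConfig.thinkingBudget` field names reflect our understanding of the REST API at launch and should be checked against the current Gemini docs.

```python
import json

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Request body for a Gemini API generateContent call (not sent here)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Upper bound on tokens the model may spend reasoning.
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

cheap = build_request("What is 2 + 2?", thinking_budget=0)
hard = build_request("Plan a 3-city rail itinerary under $500.", thinking_budget=4096)
print(json.dumps(hard, indent=2))
```

The design point is that cost and latency become a per-request dial. Trivial prompts can skip reasoning, while hard ones get a larger budget.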
New Models
Kling 2.0
Kling 2.0 Creative Suite launches
Kuaishou's Kling AI launched Kling 2.0 along with a broader Creative Suite, upgrading its video generation model and tooling. The release kept up the rapid pace in the closed-source video generation race during a packed vision and video week.
New Models · Open weights
BitNet b1.58
Microsoft releases BitNet 1.58-bit model weights on Hugging Face
Microsoft published BitNet (listed in the show notes as BitNet v1.5), its native 1.58-bit quantized LLM, as open weights on Hugging Face. The ternary-weight approach targets extremely efficient CPU inference at a fraction of the memory of standard models.
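"1.58-bit" comes from log2(3) ≈ 1.58, the information content of a weight restricted to {-1, 0, +1}. The sketch below follows the absmean quantization described in the BitNet b1.58 paper: scale by the mean absolute weight, then round and clip to the ternary set. It is plain Python over a flattened weight list. It omits activation quantization and the training loop, where the real model keeps full-precision latent weights.

```python
import math

def absmean_ternary(weights: list[float], eps: float = 1e-6) -> tuple[list[int], float]:
    """Quantize weights to {-1, 0, +1} with a single absmean scale."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    scale = gamma + eps
    # RoundClip(w / scale, -1, 1)
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    # Approximate reconstruction used to sanity-check the quantization.
    return [v * scale for v in q]

w = [0.42, -0.05, -0.9, 0.11, 0.0, -0.3]
q, s = absmean_ternary(w)
print(q, dequantize(q, s))
print(f"{math.log2(3):.2f} bits per ternary weight")
```

Multiplying by ternary weights reduces to additions, subtractions and skips. That is why the approach targets fast CPU inference.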
New Models
GPT-4.1, 4.1-mini, 4.1-nano
OpenAI launches GPT-4.1 family (4.1, mini, nano) in the API
OpenAI released the GPT-4.1 family of models, available via API only, in three sizes: 4.1, 4.1-mini and 4.1-nano. The family features a 1M token context window, in contrast to o3's 200k, and is aimed at developers building on long-context and coding workloads.
New Models
o3 & o4-mini
OpenAI launches o3 and o4-mini, SOTA reasoning models with tool use
OpenAI shipped o3 and o4-mini in ChatGPT and the API, with o3 setting new SOTA records on Codeforces, SWE-bench, MMMU and more. For the first time the models can use tools (web search, Python, image generation) during the reasoning process, and they can think visually by cropping, zooming and rotating images. o3 scored $65k on the Freelancer eval versus o1's $28k, and o4-mini hits 99.5% on AIME with a Python interpreter.
$65k o3 score on the Freelancer eval (vs o1's $28k)
99.5% o4-mini on AIME with Python interpreter
200k Context window (tokens)
New Models · Open weights
INTELLECT-2
Prime Intellect launches INTELLECT-2, a 32B globally-distributed RL run
Prime Intellect released INTELLECT-2, a 32B reasoning model trained with globally decentralized reinforcement learning, a follow-up to the INTELLECT-1 decentralized pretraining run covered on the show in December. The release includes open weights on Hugging Face, a tech report, and the PRIME-RL training code.
New Models · Open weights
GLM-4-0414
Z.ai (formerly chatGLM) releases the GLM-4-0414 open-source family
Z.ai, the rebranded Zhipu AI / chatGLM team, released the GLM-4-0414 family of open-source models. The drop includes base, reasoning and rumination variants published on Hugging Face and GitHub.
New Models
Nova Sonic
Amazon unveils Nova Sonic, a speech-to-speech foundation model
Amazon announced Nova Sonic, a foundational speech-to-speech model that unifies speech understanding and generation for real-time, natural-sounding voice conversations. It is available through Amazon Bedrock as part of the Nova family.
New Models · Open weights
Cogito v1 Preview (3B-70B)
Deep Cogito debuts Cogito v1 Preview models from 3B to 70B, beating DeepSeek 70B
New lab Deep Cogito released the Cogito v1 Preview family of open models ranging from 3B to 70B parameters, claiming SOTA results at each size and beating DeepSeek's 70B distill. The models are available on Hugging Face, giving local AI enthusiasts the small-to-mid sizes Llama 4 skipped.
3B-70B Model size range
New Models · Open weights
HiDream-I1-Dev
HiDream-I1-Dev: 17B MIT-licensed image model surpasses Flux 1.1 [pro]
HiDream released HiDream-I1-Dev, a 17B parameter open-weights image generation model under an MIT license. It became the new leading open-weights image generator, surpassing Flux 1.1 [pro] on quality benchmarks.
17B Parameters, MIT license
New Models · Open weights
Jina Reranker M0
Jina Reranker M0: SOTA multilingual, multimodal document reranker
Jina AI released Jina Reranker M0, a state-of-the-art multimodal and multilingual document reranker model. It reranks documents that include both text and images, targeting retrieval and RAG pipelines, with weights available on Hugging Face.
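The sketch below shows where a reranker sits in a retrieval or RAG pipeline. A cheap vector search first narrows the corpus, then a reranker scores each query-document pair more carefully. The vectors and scores are made up. `pair_scores` is a hypothetical stand-in for calls to a reranker such as Jina Reranker M0 and does not reflect its actual API. A multimodal reranker would also score pages containing images, which this toy ignores.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, k=3, top_n=1):
    # Stage 1: cheap embedding similarity keeps only the top-k candidates.
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(query_vec, doc_vecs[i]),
                        reverse=True)[:k]
    # Stage 2: the (more expensive) reranker reorders just those candidates.
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return reranked[:top_n]

docs = [[1, 0], [0.9, 0.1], [0, 1], [0.7, 0.7]]
query = [1, 0.2]
pair_scores = {0: 0.2, 1: 0.4, 2: 0.99, 3: 0.9}  # hypothetical reranker output
print(retrieve_then_rerank(query, docs, pair_scores.get))
```

Document 2 never reaches the reranker despite its high score, because stage 1 filtered it out. Recall is capped by the first stage.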
New Models · Open weights
Llama 4 (Scout & Maverick)
Meta drops Llama 4 Scout (109B) and Maverick (400B) open-weights MoE models
Meta released the long-awaited Llama 4 family in a chaotic Saturday drop: Scout (17B active / ~109B total, 16 experts) and Maverick (17B active / ~400B total, 128 experts), with a 2T-parameter Behemoth still in training. The models are multimodal, multilingual MoE architectures trained on ~30T tokens with FP8 and interleaved attention (iRoPE), claiming 10M context for Scout and 1M for Maverick. The release was marred by drama: the LMArena version differed from the released model, and the community criticized the lack of small local-friendly sizes.
10M Stated context window for Llama 4 Scout
288B Active parameters of unreleased Behemoth (2T total)
17B Active parameters for both Scout and Maverick
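The gap between "17B active" and "~109B/~400B total" comes from mixture-of-experts routing: each token runs through only a few experts. The sketch below is a generic MoE layer with top-1 routing plus an always-on shared expert. Treat several details as assumptions rather than Meta's implementation:
- the shared-expert pairing;
- softmax gating;
- router logits passed in directly instead of computed from the token by a learned layer;
- toy experts that are plain functions.

```python
import math
from typing import Callable

Vector = list[float]
Expert = Callable[[Vector], Vector]

def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x: Vector, router_logits: Vector, experts: list[Expert],
              shared_expert: Expert) -> tuple[Vector, int]:
    """Top-1 routed expert plus an always-active shared expert."""
    probs = softmax(router_logits)
    chosen = max(range(len(experts)), key=lambda i: probs[i])
    routed = experts[chosen](x)
    shared = shared_expert(x)
    # Only the chosen expert and the shared expert run for this token;
    # the other experts' parameters are stored but stay inactive.
    out = [probs[chosen] * r + s for r, s in zip(routed, shared)]
    return out, chosen

# 16 toy experts (like Scout's expert count); expert k scales its input by k+1.
experts = [lambda v, k=k: [k * t for t in v] for k in range(1, 17)]
logits = [0.0] * 16
logits[3] = 5.0
out, chosen = moe_layer([1.0, 2.0], logits, experts, lambda v: v)
print(chosen, out)
```

Memory must hold every expert, which is why total parameter count still governs hardware requirements. Per-token compute only scales with the active subset.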
New Models · Open weights
Kimi-VL & Kimi-VL-Thinking
Moonshot drops Kimi-VL and Kimi-VL-Thinking, tiny A3B open vision models
Moonshot AI released Kimi-VL and Kimi-VL-Thinking, compact vision-language models with only ~3B active parameters (A3B MoE). The thinking variant adds reasoning to a tiny VLM, and both are available openly on Hugging Face.
A3B ~3B active parameters (MoE)
New Models · Open weights
Llama-3.1-Nemotron-Ultra-253B
NVIDIA ships Nemotron Ultra, a 253B pruned and distilled Llama 3.1-405B
NVIDIA released Nemotron Ultra, a pruned and distilled finetune of Llama 3.1-405B at roughly half the parameters (253B). Its benchmarks even included Llama 4 comparisons, showing the older finetuned Llama beating the new models on AIME, GPQA and more. It supports 128K context and fits on a single 8xH100 node for inference.
253B Parameters (pruned from Llama 3.1-405B)
128K Context window
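Here is a rough check of the single-node claim. It assumes 80GB H100s and counts weights only. Which precision NVIDIA recommends for serving is not stated here, so both BF16 and FP8 are shown. The last line shows why pruning matters: the unpruned 405B parent in BF16 would not fit on the same node.

```python
H100_MEMORY_GB = 80  # assumes the 80GB H100 variant
NODE_GPUS = 8

def weights_gb(params: float, bytes_per_param: float) -> float:
    """Weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

node_gb = H100_MEMORY_GB * NODE_GPUS  # 640 GB across the node
for label, bpp in [("BF16", 2), ("FP8", 1)]:
    w = weights_gb(253e9, bpp)
    print(f"{label}: {w:.0f} GB of weights, {node_gb - w:.0f} GB left for KV cache/activations")

# The unpruned 405B parent in BF16 would need ~810 GB:
print(weights_gb(405e9, 2) > node_gb)
```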
New Models · Open weights
DeepCoder-14B-Preview
DeepCoder-14B: open RL-finetuned coder beats DeepSeek R1 and o3-mini on coding
Together AI and Agentica (UC Berkeley Sky Computing Lab) released DeepCoder-14B-Preview, a reasoning model finetuned with RL that beats DeepSeek R1 and even o3-mini on several coding benchmarks. The project aims to democratize RL: the team open-sourced the model, the training dataset, the Weights & Biases logs, and the eval logs. Guest Michael Luo from Agentica joined the show to discuss the release.
14B Model parameters
New Models · Open weights
OpenHands LM 32B
OpenHands LM 32B: MIT-licensed coding agent model hits 37.2% SWE-Bench
All Hands AI (formerly OpenDevin) released OpenHands LM 32B, an MIT-licensed Qwen finetune that scores 37.2% on SWE-Bench Verified, competing with much larger models on real-world repo tasks. The OpenHands agent also took the #2 spot on the new Live SWE-Bench leaderboard, and the 32B model runs locally on a single RTX 3090. A hosted OpenHands Cloud version is also available; guest Xingyao Wang joined the show to discuss it.
37.2% SWE-Bench Verified score
#2 Live SWE-Bench leaderboard (OpenHands agent)
New Models
Solaria STT
Gladia launches Solaria speech-to-text model
Gladia launched Solaria, a new speech-to-text model offered through its transcription platform. It arrived in a busy week for voice AI alongside Hailuo's Speech-02 TTS.
New Models
Dream 7B
Dream 7B: a diffusion language model challenger unveiled
Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.
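Diffusion language models differ from autoregression in decoding order. Generation starts from a fully masked sequence, and any position can be filled at any step, several at once. The sketch below is generic masked-diffusion decoding that commits the most confident positions first. Dream 7B's actual sampler and schedule are not described here. `predict` is a hypothetical stand-in for the denoising model, and the confidences are made up.

```python
MASK = None

def diffusion_decode(length, predict, tokens_per_step=2):
    """Iteratively unmask a sequence, most confident positions first.

    predict(seq) returns {position: (token, confidence)} for every masked
    position; it is a hypothetical stand-in for the denoising model.
    """
    seq = [MASK] * length
    order = []
    while MASK in seq:
        proposals = predict(seq)
        best = sorted(proposals, key=lambda p: proposals[p][1], reverse=True)
        for pos in best[:tokens_per_step]:
            seq[pos] = proposals[pos][0]
            order.append(pos)
    return seq, order

target = ["the", "cat", "sat", "on", "the", "mat"]
conf = [0.3, 0.9, 0.5, 0.2, 0.4, 0.95]  # made-up per-position confidences

def toy_predict(seq):
    return {i: (target[i], conf[i]) for i, t in enumerate(seq) if t is MASK}

seq, order = diffusion_decode(len(target), toy_predict)
print(seq, order)
```

Here the last word is committed first. That freedom to settle globally constrained positions early is the property the Sudoku results may reflect, though the source only offers this as a possible explanation.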
New Models · Open weights
Nomic Embed Multimodal
Nomic Embed Multimodal: SOTA embeddings for visual documents
Nomic AI released Nomic Embed Multimodal, new 3B and 7B parameter embedding models built on Alibaba's Qwen2.5-VL. They achieve SOTA on visual document retrieval by embedding interleaved text-image sequences, ideal for PDFs and complex webpages. The 7B model ships under Apache 2.0 with open weights, code, and data; guest Zach Nussbaum discussed the release on the show.
3B parameters (smaller model)
7B parameters (Apache 2.0 model)
New Models
Runway Gen-4
Runway Gen-4 announced with major gains in video consistency
Runway announced Gen-4, its next-generation video model focused on character and world consistency across shots. Example videos showed notably coherent characters and scenes, pushing AI video further toward usable filmmaking.