Everything AI Released in April 2025

64 releases covered live on the show: every model, product, paper and tool that mattered, with links and our analysis.

🧠 New Models 31

Daily (Pipecat)
New Models · Open weights

Smart-Turn VAD

Pipecat releases Smart-Turn, an open source semantic VAD model

The Pipecat team (from Daily) released Smart-Turn, an open source semantic voice activity detection (VAD) model that predicts when a speaker has actually finished their turn, rather than just detecting silence. Kwindla Kramer joined the show to explain how semantic VAD makes voice-agent conversations feel far more natural. A community training effort is running at turn-training.pipecat.ai.
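
This minimal sketch contrasts the two endpointing strategies. It assumes a hypothetical `turn_complete_prob` callable that stands in for a semantic model such as Smart-Turn; the real model's interface is not described here. The toy model reads a transcript only to keep the example readable. A silence-only endpointer ends the turn after a fixed pause. The semantic version also requires the model to judge the utterance complete, and it keeps a long fallback timeout so the agent never waits forever.

```python
from typing import Callable

def silence_only_endpoint(silence_ms: int, threshold_ms: int = 500) -> bool:
    """Classic VAD endpointing: end the turn once silence exceeds a fixed threshold."""
    return silence_ms >= threshold_ms

def semantic_endpoint(
    transcript: str,
    silence_ms: int,
    turn_complete_prob: Callable[[str], float],  # hypothetical stand-in for a semantic turn model
    min_silence_ms: int = 200,
    max_silence_ms: int = 3000,
    prob_threshold: float = 0.5,
) -> bool:
    """End the turn only when the speaker has paused AND the model thinks they are done."""
    if silence_ms < min_silence_ms:
        return False  # still talking, or only a tiny pause: never interrupt
    if silence_ms >= max_silence_ms:
        return True  # safety net so the agent always eventually responds
    return turn_complete_prob(transcript) >= prob_threshold

# Toy heuristic "model" for illustration only: trailing filler words mean "not finished".
def toy_model(text: str) -> float:
    return 0.1 if text.rstrip().endswith(("um", "and", "so")) else 0.9

print(silence_only_endpoint(600))                                        # True: cuts in mid-thought
print(semantic_endpoint("I want to book a flight and", 600, toy_model))  # False: keeps listening
print(semantic_endpoint("I want to book a flight.", 600, toy_model))     # True
```

Both functions see the same 600ms pause. Only the semantic one declines to interrupt a sentence that is obviously unfinished.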

Google DeepMind
New Models · Open weights

Gemma 3 QAT

Google ships Quantization-Aware Trained Gemma 3 models for consumer GPUs

Google released Quantization-Aware Training (QAT) versions of the Gemma 3 family, dramatically cutting memory requirements while preserving quality. The 27B model drops from a hefty 54GB to just 14.1GB, and even the 1B model goes from 2GB to about half a gig, making state-of-the-art open models runnable on consumer GPUs. Wolfram took the 4B QAT model for a spin in LM Studio on the show.

27B Gemma 3 27B QAT: 54GB down to 14.1GB
1B Gemma 3 1B QAT: 2GB down to ~0.5GB
4B 4B QAT model tested in LM Studio
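
The headline numbers follow from bytes-per-parameter arithmetic. BF16 stores each weight in 2 bytes, while the QAT checkpoints target 4-bit (int4) weights at half a byte each. This back-of-the-envelope check assumes the weights dominate the footprint. The small gap to the published 14.1GB is overhead that the sketch ignores.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); ignores KV cache and activations."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for size in (27, 1):
    bf16 = weight_memory_gb(size, 16)  # BF16: 16 bits = 2 bytes per weight
    int4 = weight_memory_gb(size, 4)   # int4: 4 bits = 0.5 bytes per weight
    print(f"{size}B: BF16 ~{bf16:.1f} GB -> int4 ~{int4:.2f} GB")
# 27B: BF16 ~54.0 GB -> int4 ~13.50 GB
# 1B: BF16 ~2.0 GB -> int4 ~0.50 GB
```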
Lvmin Zhang (lllyasviel)
New Models · Open weights

FramePack

FramePack generates 120-second videos on just 6GB of VRAM

FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open source next-frame prediction approach for long video generation that runs on consumer hardware. It compresses the input frame context into a fixed-length budget, so the context does not grow as the video gets longer. This lets it generate videos up to 120 seconds long on as little as 6GB of VRAM.

120s Max video length
6GB Minimum VRAM
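
The core idea is that older frames get fewer context tokens. This simplified sketch assumes each step back in time halves a frame's token budget, and the starting budget of 1536 tokens is an arbitrary illustrative number. FramePack's actual compression schedules and patchify kernels differ in detail. Because the per-frame budgets form a geometric series, the total context stays below twice the newest frame's budget, however long the history grows.

```python
def packed_token_counts(num_frames: int, newest_tokens: int = 1536, ratio: int = 2) -> list[int]:
    """Token budget per history frame, newest first, shrinking geometrically with age.

    Frames whose budget rounds down to zero are simply dropped here; that is a
    simplification, not necessarily how FramePack treats the oldest frames.
    """
    counts = []
    for age in range(num_frames):
        tokens = newest_tokens // ratio ** age
        if tokens == 0:
            break
        counts.append(tokens)
    return counts

for history in (4, 16, 1000):
    print(f"{history} frames -> {sum(packed_token_counts(history))} context tokens")
# 4 frames -> 2880 context tokens
# 16 frames -> 3070 context tokens
# 1000 frames -> 3070 context tokens
```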
Nari Labs
New Models · Open weights

Dia-1.6B

Nari Labs' Dia: a wild 1.6B open source TTS model that blew up Twitter

Nari Labs released Dia, a 1.6B parameter open-weights text-to-speech model that absolutely blew up Twitter with its expressive, emotional dialogue generation, including laughs, coughs, and multi-speaker conversations. Built by a tiny team, it punches far above its weight against commercial TTS systems and supports voice cloning, with demos available on Fal.ai.

1.6B Parameters
NVIDIA
New Models · Open weights

Describe Anything (DAM-3B)

NVIDIA releases DAM-3B for region-based image and video captioning

NVIDIA dropped the Describe Anything Model (DAM-3B), a 3 billion parameter multimodal model for region-based image and video captioning. You can point it at a specific region of an image or video and it generates a detailed description of just that area. NVIDIA also published an accompanying DescribeAnything dataset and a Hugging Face demo.

3B Parameters
Sand AI
New Models · Open weights

MAGI-1

Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model

Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.

24B Parameters
24 Frames per autoregressive chunk
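
One common way to picture "causal attention between chunks" is a block-causal mask. Frames attend freely within their own chunk and to every earlier chunk, but never to later ones, so generation can proceed chunk by chunk. This is a minimal sketch with tiny sizes. It assumes bidirectional attention inside each chunk, and the real model works on 24-frame chunks of latent tokens, which the sketch ignores.

```python
def block_causal_mask(num_frames: int, chunk_size: int) -> list[list[bool]]:
    """mask[q][k] is True when frame q may attend to frame k.

    Frames see everything in their own chunk and all earlier chunks,
    but nothing in later chunks.
    """
    chunk_of = [i // chunk_size for i in range(num_frames)]
    return [[chunk_of[k] <= chunk_of[q] for k in range(num_frames)] for q in range(num_frames)]

for row in block_causal_mask(num_frames=6, chunk_size=2):
    print("".join("x" if allowed else "." for allowed in row))
# xx....
# xx....
# xxxx..
# xxxx..
# xxxxxx
# xxxxxx
```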
Tencent
New Models

Hunyuan 3D 2.5

Tencent's Hunyuan 3D 2.5 jumps to 10B params with PBR textures and rigging

Tencent updated its 3D generation model to Hunyuan 3D 2.5, now boasting 10 billion parameters, up from 1B. They highlight massive leaps in precision with 1024-resolution geometry, high-quality textures with PBR support, and improved skeletal rigging for animation.

10B Parameters (up from 1B)
1024 Geometry resolution
Google
New Models

DolphinGemma

DolphinGemma: Google's audio model for decoding dolphin communication

Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M parameter audio model based on the Gemma architecture using SoundStream audio tokenization. Trained on decades of recorded dolphin clicks, whistles and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.

Kling AI
New Models

Kling 2.0

Kling 2.0 Creative Suite launches

Kuaishou's Kling AI launched Kling 2.0 along with a broader Creative Suite, upgrading its video generation model and tooling. The release kept up the rapid pace in the closed-source video generation race during a packed vision and video week.

Microsoft
New Models · Open weights

BitNet b1.58

Microsoft releases BitNet 1.58-bit model weights on Hugging Face

Microsoft published BitNet (listed in the show notes as BitNet v1.5), its native 1.58-bit quantized LLM, as open weights on Hugging Face. The ternary-weight approach targets extremely efficient CPU inference at a fraction of the memory of standard models.
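
The name comes from the weight alphabet: each weight is one of {-1, 0, +1}, and log2(3) ≈ 1.58 bits. Below is a minimal plain-Python sketch of the "absmean" ternary quantizer described in the BitNet b1.58 paper: scale by the mean absolute weight, then round and clip. It handles a single toy weight matrix. Real kernels also pack the ternary values and run specialized matmuls, which this sketch omits.

```python
import math

def absmean_ternary(weights: list[list[float]], eps: float = 1e-6) -> tuple[list[list[int]], float]:
    """Quantize a weight matrix to {-1, 0, +1} plus a single floating-point scale."""
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + eps  # mean absolute weight ("absmean"); eps avoids divide-by-zero
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]  # RoundClip to [-1, 1]
    return q, scale

W = [[0.9, -0.05, -1.4], [0.2, 0.0, -0.6]]
q, scale = absmean_ternary(W)
print(q)                       # [[1, 0, -1], [0, 0, -1]]
print(round(math.log2(3), 2))  # 1.58: bits of information in one ternary weight
```

Each weight is then approximated as `q * scale`. A matrix multiply therefore reduces to additions and subtractions plus one multiplication by the scale, which is why the approach suits CPU inference.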

OpenAI
New Models

o3 & o4-mini

OpenAI launches o3 and o4-mini, SOTA reasoning models with tool use

OpenAI shipped o3 and o4-mini in ChatGPT and the API, with o3 setting new SOTA records on Codeforces, SWE-bench, MMMU and more. For the first time the models can use tools (web search, Python, image generation) during the reasoning process, and they can think visually by cropping, zooming and rotating images. o3 scored $65k on the Freelancer eval versus o1's $28k, and o4-mini hits 99.5% on AIME with a Python interpreter.

$65k o3 score on the Freelancer eval (vs o1's $28k)
99.5% o4-mini on AIME with Python interpreter
200k Context window (tokens)
Prime Intellect
New Models · Open weights

INTELLECT-2

Prime Intellect launches INTELLECT-2, a 32B globally-distributed RL run

Prime Intellect released INTELLECT-2, a 32B reasoning model trained with globally decentralized reinforcement learning, a follow-up to the INTELLECT-1 decentralized pretraining run covered on the show in December. The release includes open weights on Hugging Face, a tech report, and the PRIME-RL training code.

Amazon
New Models

Nova Sonic

Amazon unveils Nova Sonic, a speech-to-speech foundation model

Amazon announced Nova Sonic, a foundational speech-to-speech model that unifies speech understanding and generation for real-time, natural-sounding voice conversations. It is available through Amazon Bedrock as part of the Nova family.

Deep Cogito
New Models · Open weights

Cogito v1 Preview (3B-70B)

Deep Cogito debuts Cogito v1 Preview models from 3B to 70B, beating DeepSeek 70B

New lab Deep Cogito released the Cogito v1 Preview family of open models ranging from 3B to 70B parameters, claiming SOTA results at each size and beating DeepSeek's 70B distill. The models are available on Hugging Face, giving local AI enthusiasts the small-to-mid sizes Llama 4 skipped.

3B-70B Model size range
Meta AI
New Models · Open weights

Llama 4 (Scout & Maverick)

Meta drops Llama 4 Scout (109B) and Maverick (400B) open-weights MoE models

Meta released the long-awaited Llama 4 family in a chaotic Saturday drop: Scout (17B active / ~109B total, 16 experts) and Maverick (17B active / ~400B total, 128 experts), with a 2T-parameter Behemoth still in training. The models are multimodal, multilingual MoE architectures trained on ~30T tokens with FP8 and interleaved attention (iRoPE), claiming 10M context for Scout and 1M for Maverick. The release was marred by drama: the LMArena version differed from the released model, and the community criticized the lack of small local-friendly sizes.

10M Stated context window for Llama 4 Scout
288B Active parameters of unreleased Behemoth (2T total)
17B Active parameters for both Scout and Maverick
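
The "17B active / ~109B total" split comes from mixture-of-experts routing. A small router scores every expert for each token, and only the top-scoring experts actually run. This plain-Python sketch shows generic top-k routing. It illustrates MoE in general, not Meta's exact Llama 4 layer, whose routing details this page does not cover. The experts are toy functions.

```python
import math
from typing import Callable, Sequence

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits: list[float], k: int = 1) -> list[tuple[int, float]]:
    """Pick the top-k experts for one token and renormalize their gate weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                router_logits: list[float], k: int = 1) -> float:
    """Only the selected experts run; every other expert's parameters sit idle for this token."""
    return sum(gate * experts[i](x) for i, gate in route_token(router_logits, k))

# 16 toy "experts" (Scout's expert count); expert s just multiplies its input by s.
experts = [lambda x, s=s: s * x for s in range(16)]
logits = [0.0] * 16
logits[3] = 5.0  # the router strongly prefers expert 3 for this token
print(route_token(logits, k=1))           # [(3, 1.0)]
print(moe_forward(2.0, experts, logits))  # 6.0
```

For Scout, 17B active parameters out of ~109B total means only about 16% of the weights do work for any given token. All of them still have to be held in memory.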
Moonshot AI (Kimi)
New Models · Open weights

Kimi-VL & Kimi-VL-Thinking

Moonshot drops Kimi-VL and Kimi-VL-Thinking, tiny A3B open vision models

Moonshot AI released Kimi-VL and Kimi-VL-Thinking, compact vision-language models with only ~3B active parameters (A3B MoE). The thinking variant adds reasoning to a tiny VLM, and both are available openly on Hugging Face.

A3B ~3B active parameters (MoE)
NVIDIA
New Models · Open weights

Llama-3.1-Nemotron-Ultra-253B

NVIDIA ships Nemotron Ultra, a 253B pruned and distilled Llama 3.1-405B

NVIDIA released Nemotron Ultra, a pruned and distilled finetune of Llama 3.1-405B that cuts the parameter count to 253B (about 62% of the original). Its benchmarks even included Llama 4 comparisons, showing the older finetuned Llama beating the new models on AIME, GPQA and more. It supports 128K context and fits on a single 8xH100 node for inference.

253B Parameters (pruned from Llama 3.1-405B)
128K Context window
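
This is a rough check of the "single 8xH100 node" claim. It assumes BF16 weights (2 bytes per parameter) and 80GB H100s. The headroom is what remains for KV cache and activations, which the sketch does not model.

```python
def node_headroom_gb(params_billion: float, bytes_per_param: float,
                     gpus: int, gb_per_gpu: float) -> tuple[float, float]:
    """Return (weight memory, memory left over) in GB for a model sharded across one node."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params x bytes/param = GB
    return weights_gb, gpus * gb_per_gpu - weights_gb

for size in (253, 405):
    weights, headroom = node_headroom_gb(size, 2, 8, 80)  # BF16 on 8x H100 80GB
    print(f"{size}B: weights ~{weights:.0f} GB, headroom ~{headroom:.0f} GB")
# 253B: weights ~506 GB, headroom ~134 GB
# 405B: weights ~810 GB, headroom ~-170 GB
```

Under these assumptions the original 405B model would not fit on one such node in BF16. That is one plausible reading of why the pruning matters for single-node deployment.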
New Models · Open weights

DeepCoder-14B-Preview

DeepCoder-14B: open RL-finetuned coder beats DeepSeek R1 and o3-mini on coding

Together AI and Agentica (UC Berkeley Sky Computing Lab) released DeepCoder-14B-Preview, a reasoning model finetuned with RL that beats DeepSeek R1 and even o3-mini on several coding benchmarks. The project aims to democratize RL: the team open-sourced the model, the training dataset, the Weights & Biases logs, and the eval logs. Guest Michael Luo from Agentica joined the show to discuss the release.

14B Model parameters
All Hands AI
New Models · Open weights

OpenHands LM 32B

OpenHands LM 32B: MIT-licensed coding agent model hits 37.2% SWE-Bench

All Hands AI, the team behind OpenHands (formerly OpenDevin), released OpenHands LM 32B, an MIT-licensed Qwen finetune that scores 37.2% on SWE-Bench Verified, competing with much larger models on real-world repository tasks. The OpenHands agent also took the #2 spot on the new Live SWE-Bench leaderboard, and the 32B model runs locally on a single RTX 3090. A hosted OpenHands Cloud version is also available; guest Xingyao Wang joined the show to discuss the release.

37.2% SWE-Bench Verified score
#2 Live SWE-Bench leaderboard (OpenHands agent)
Gladia
New Models

Solaria STT

Gladia launches Solaria speech-to-text model

Gladia launched Solaria, a new speech-to-text model offered through its transcription platform. It arrived in a busy week for voice AI alongside Hailuo's Speech-02 TTS.

New Models

Dream 7B

Dream 7B: a diffusion language model challenger unveiled

Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.
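
A masked diffusion LM does not decode left to right. It starts from a fully masked sequence and fills in positions over several steps, committing the ones it is most confident about first. This toy sketch shows that decoding loop. It assumes a hypothetical `predict` function that returns a (token, confidence) guess for every masked position; Dream 7B's actual noise schedule and remasking strategy are not described here. Because later positions can be fixed before earlier ones, this style of decoding may handle global constraints better.

```python
from typing import Callable

MASK = "<mask>"

def diffusion_decode(
    length: int,
    predict: Callable[[list[str]], dict[int, tuple[str, float]]],  # hypothetical model stand-in
    tokens_per_step: int = 2,
) -> list[list[str]]:
    """Start fully masked; each step, commit the most confident guesses. Returns every intermediate state."""
    seq = [MASK] * length
    history = []
    while MASK in seq:
        guesses = predict(seq)  # {position: (token, confidence)} for each masked position
        if not guesses:
            raise ValueError("model returned no predictions for masked positions")
        best = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)[:tokens_per_step]
        for pos, (tok, _conf) in best:
            seq[pos] = tok
        history.append(list(seq))
    return history

# Toy "model": always guesses the target token, most confident at the ends of the sequence.
TARGET = ["the", "cat", "sat", "on", "mat"]

def toy_predict(seq: list[str]) -> dict[int, tuple[str, float]]:
    center = (len(seq) - 1) / 2
    return {i: (TARGET[i], abs(i - center)) for i, t in enumerate(seq) if t == MASK}

for state in diffusion_decode(len(TARGET), toy_predict):
    print(state)
# ['the', '<mask>', '<mask>', '<mask>', 'mat']
# ['the', 'cat', '<mask>', 'on', 'mat']
# ['the', 'cat', 'sat', 'on', 'mat']
```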

Nomic AI
New Models · Open weights

Nomic Embed Multimodal

Nomic Embed Multimodal: SOTA embeddings for visual documents

Nomic AI released Nomic Embed Multimodal, new 3B and 7B parameter embedding models built on Alibaba's Qwen2.5-VL. They achieve SOTA on visual document retrieval by embedding interleaved text-image sequences, ideal for PDFs and complex webpages. The 7B model ships under Apache 2.0 with open weights, code, and data; guest Zach Nussbaum discussed the release on the show.

3B parameters (smaller model)
7B parameters (Apache 2.0 model)
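
Whatever the embedding model, retrieval comes down to ranking documents by cosine similarity to the query embedding. This minimal sketch uses made-up 3-dimensional vectors in place of real model outputs, which have far more dimensions. The page IDs are hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank(query_vec: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings of individual PDF pages (in practice, produced by the embedding model).
pages = {
    "report.pdf#p3-revenue-chart": [0.9, 0.1, 0.2],
    "report.pdf#p7-legal-terms": [0.1, 0.8, 0.3],
    "slides.pdf#p2-roadmap": [0.4, 0.4, 0.7],
}
query = [0.85, 0.15, 0.25]  # pretend embedding of "Q3 revenue growth chart"
for doc_id, score in rank(query, pages):
    print(f"{score:.3f}  {doc_id}")
```

With these toy vectors the chart page ranks first. A multimodal embedder aims to produce vectors that make this work for page images as well as text.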

🚀 Products & Apps 7

Character.AI
Products & Apps

AvatarFX

Character.AI opens early access to AvatarFX talking avatars

Character.AI announced AvatarFX, now in early access, which turns static images into speaking, emoting video avatars. It targets bringing characters to life for conversational and creative use cases.

Mistral AI
Products & Apps

Classifiers Factory

Mistral releases Classifiers Factory

Mistral announced Classifiers Factory, a service for building and training custom text classifiers on its platform. It was covered as a quick item in the Big CO LLMs + APIs section of the show.

Anthropic
Products & Apps

Claude Max plan

Anthropic launches Max plan at $200/mo with higher usage quotas

Anthropic introduced a new Max subscription tier priced at $200 per month, offering significantly more usage quota than the standard Pro plan. It mirrors OpenAI's Pro-tier pricing strategy for power users.

$200/mo Max plan price
Amazon
Products & Apps

Nova Act

Amazon announces Nova Act browser agent SDK

Amazon entered the agent race with Nova Act, an agent designed to take actions in web browsers, possibly built with talent from the Adept acquisition. Amazon claims it beats Claude 3.5 and OpenAI's computer-use model on some benchmarks, but it is only available via an SDK behind a request form, so claims could not be verified hands-on.

ByteDance
Products & Apps

OmniHuman (via Dreamina)

ByteDance's OmniHuman image-to-avatar model goes public via Dreamina

ByteDance's impressive OmniHuman model, which turns a single image plus audio into a realistic talking avatar video, became publicly usable through the Dreamina (CapCut) website. The results land squarely in uncanny-valley territory, as Alex demonstrated with his own avatar thread.

Cognition Labs
Products & Apps

Devin 2.0

Devin 2.0 launches with new IDE experience and $20/month entry price

Breaking during the show: Cognition Labs launched Devin 2.0, the second version of its AI software engineer, with a new IDE experience. Crucially, pricing now starts at $20/month, down from the original $500/month tier, making the agent far more accessible.

$20/mo new starting price

✨ Major Features & Updates 8

Anthropic
Major Features & Updates

Claude Research

Claude gains Research mode and Google Workspace integration

Anthropic shipped a Research capability for Claude, letting it conduct multi-step research across the web, alongside a Google Workspace integration that connects Claude to email, calendar and docs context.

Google DeepMind
Major Features & Updates

Veo 2

Veo 2 video generation hits GA in the API and Gemini App

Google made Veo 2 video generation generally available for developers and rolled it out in the Gemini App. The GA release brings Google's flagship text-to-video model out of preview and into production use.

Weights & Biases
Major Features & Updates

W&B Weave Playground

W&B Weave Playground adds GPT-4.1 family and o3/o4-mini support

The Weights & Biases Weave Playground shipped full support for the new GPT-4.1 family and the o3/o4-mini models, letting developers evaluate and compare the week's new models for their own applications.

Google DeepMind
Major Features & Updates

Official MCP support

Google announces official support for the Model Context Protocol (MCP)

Demis Hassabis announced that Google will officially support Anthropic's Model Context Protocol (MCP) in its models and SDKs. This was a major signal of MCP becoming the industry standard for connecting AI models to tools and data.
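
MCP messages are JSON-RPC 2.0, so a client asking a server to run a tool sends a `tools/call` request that names the tool and its arguments. This minimal sketch only builds and parses messages; it has no transport layer. The `get_weather` tool and the reply payload are hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per connection

def make_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

def extract_text(response_json: str) -> str:
    """Join the text parts of a tool result's `content` list, raising on JSON-RPC errors."""
    msg = json.loads(response_json)
    if "error" in msg:
        raise RuntimeError(msg["error"]["message"])
    return "".join(part["text"] for part in msg["result"]["content"] if part["type"] == "text")

request = make_tool_call("get_weather", {"city": "Denver"})  # hypothetical tool
print(request)

# A server's reply might look like this (hypothetical payload):
reply = '{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "Sunny, 21C"}], "isError": false}}'
print(extract_text(reply))  # Sunny, 21C
```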

OpenAI
Major Features & Updates

ChatGPT enhanced memory

OpenAI gives ChatGPT enhanced memory that can recall all your past chats

OpenAI rolled out enhanced memory for ChatGPT, allowing it to reference and recall all of a user's previous conversations rather than just saved memories. This makes ChatGPT significantly more personalized across sessions.

Windsurf
Major Features & Updates

Windsurf Netlify deployments

Windsurf adds one-click deployments to Netlify

Windsurf shipped a deployments feature that lets users push apps straight to Netlify from the editor. A small but practical step toward end-to-end app building inside AI coding tools.

🔌 APIs & Platforms 3

OpenAI
APIs & Platforms

gpt-image-1

OpenAI's GPT Image generation lands in the API as gpt-image-1

OpenAI's powerful image generation capabilities, previously locked inside ChatGPT, are now available to developers via API under the official name gpt-image-1. This was the big one many developers were waiting for, opening up the viral image generation and editing capabilities for building AI art and image editing applications.

🛠️ Dev Tools 4

HumanLayer
Dev Tools · Open weights

12-Factor Agents

Dex Horthy publishes 12-Factor Agents, a guide to production-ready agents

HumanLayer founder Dex Horthy published 12-Factor Agents, an open GitHub repo and essay distilling common patterns and pitfalls for building reliable, production-ready AI agents. Drawing on his experience building agent SDKs, it argues that serious teams end up writing large parts from scratch and lays out principles for robust agent design, discussed in depth on the show.

OpenAI
Dev Tools · Open weights

Codex CLI

OpenAI debuts Codex CLI, an open source terminal coding agent

OpenAI released Codex CLI, an open source coding agent for the terminal. It ships with sandboxing: on macOS it uses Apple's Seatbelt to confine the commands it runs so they can only write to the current directory and temporary files.
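
The write policy itself is easy to state in code: a write is allowed only if the target path resolves inside the working directory or the system temp directory. This minimal Python illustration covers that rule only. Codex CLI enforces the restriction at the OS level through Seatbelt, not with a check like this. The project path is hypothetical, and symlink races are out of scope.

```python
import tempfile
from pathlib import Path

def write_allowed(target: str, workdir: str) -> bool:
    """True if `target` resolves inside `workdir` or the system temp directory.

    Comparing resolved Path parents (not string prefixes) collapses "../" tricks
    and keeps a sibling like "/home/user/project2" from matching "project".
    """
    resolved = Path(target).resolve()
    roots = [Path(workdir).resolve(), Path(tempfile.gettempdir()).resolve()]
    return any(resolved == root or root in resolved.parents for root in roots)

project = "/home/user/project"  # hypothetical working directory
print(write_allowed(f"{project}/src/main.py", project))          # True
print(write_allowed(f"{project}/../../../etc/passwd", project))  # False: escapes the project
print(write_allowed(f"{project}2/notes.txt", project))           # False: sibling directory
print(write_allowed(str(Path(tempfile.gettempdir()) / "scratch.txt"), project))  # True
```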

Cloudflare
Dev Tools · Open weights

Agents SDK

Cloudflare releases a new Agents SDK for building stateful AI agents

Cloudflare shipped a new Agents SDK for building and deploying AI agents on its edge platform. It joins the week's wave of agent infrastructure announcements alongside Google's A2A and broad MCP adoption.

📄 Papers & Research 3

Papers & Research · Open weights

One-Minute Video Generation with Test-Time Training

Test-Time Training paper one-shots minute-long videos with consistent characters

Researchers published 'One-Minute Video Generation with Test-Time Training', adding TTT layers to a pre-trained transformer to one-shot generate minute-long videos with remarkable character and scene consistency. The Tom & Jerry style demos showed the most impressive long-form AI video consistency to date.

1 min Single-shot generated video length
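
In a TTT layer, the hidden state is itself a small model. Its weights take a gradient step on a self-supervised loss as each token arrives, and the updated weights produce the output. This toy TTT-Linear-style update in plain Python assumes fixed key/value/query projections (key = x, value = 2x, query = x) and a scalar learning rate. The paper learns its projections and processes tokens in mini-batches, which this sketch omits.

```python
def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def ttt_linear(tokens: list[list[float]], lr: float = 0.1) -> tuple[list[list[float]], list[list[float]]]:
    """Process tokens in order; the hidden state W is trained online, one SGD step per token."""
    d = len(tokens[0])
    W = [[0.0] * d for _ in range(d)]  # hidden state: weights of a tiny linear model
    outputs = []
    for x in tokens:
        # Toy fixed projections (assumption): key = x, value = 2x, query = x.
        k, v, q = x, [2.0 * xi for xi in x], x
        err = [p - t for p, t in zip(matvec(W, k), v)]  # residual W k - v
        # Gradient of the self-supervised loss ||W k - v||^2 w.r.t. W is 2 * err * k^T.
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * 2.0 * err[i] * k[j]
        outputs.append(matvec(W, q))  # read out with the freshly updated state
    return outputs, W

outs, W = ttt_linear([[1.0, 0.0], [0.0, 1.0]] * 20)
print([round(v, 3) for v in outs[-1]])  # [0.0, 1.977]: W has learned to roughly double its input
```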
Meta AI
Papers & Research

MoCha

Meta's MoCha generates movie-grade talking AI characters from speech and text

Meta GenAI researchers published MoCha, a model that generates stunningly realistic, movie-grade talking characters directly from speech plus text. Co-author Cong Wei joined the show to discuss the work, which points at AI actors entering Hollywood-quality territory.

📊 Benchmarks & Evals 4

CoreWeave
Benchmarks & Evals

CoreWeave GB200 inference benchmark

CoreWeave hits 800 tok/s on Llama 405B with NVIDIA GB200 Blackwell

CoreWeave announced record-breaking AI inference benchmarks using NVIDIA's new GB200 Grace Blackwell superchips: 800 tokens/sec on Llama 3.1 405B, plus 33,000 tokens/sec on Llama 2 70B with H200s. It is a marker of how fast inference hardware is accelerating.

800 tok/s Llama 3.1 405B on GB200
33,000 tok/s Llama 2 70B on H200
Google DeepMind
Benchmarks & Evals

Gemini 2.5 Pro USAMO results

Gemini 2.5 Pro scores 24.4% on USAMO olympiad math, crushing the field

New evaluation results published this week showed Gemini 2.5 Pro scoring 24.4% on the USA Math Olympiad (USAMO), problems so hard that most top models score under 5%. The result showcases a step change in frontier reasoning ability on competition mathematics.

24.4% Gemini 2.5 Pro USAMO score
<5% typical score for other top models
OpenAI
Benchmarks & Evals · Open weights

PaperBench

OpenAI releases PaperBench eval and open-sources Nano-Eval framework

OpenAI published PaperBench, a tough new evaluation that tests whether AI agents can replicate cutting-edge AI research papers, with more than 8,300 graded tasks and meta-evaluation of the LLM judge. The best model managed only a 21.0% replication score versus 41.4% for human PhDs. The code and the Nano-Eval framework were open sourced on GitHub alongside the paper.

8,300+ graded tasks in the benchmark
21.0% best model replication score
41.4% human PhD baseline score
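
PaperBench grades each paper against a tree-structured rubric. Leaf requirements are judged pass/fail, and each parent's score is the weighted average of its children, rolling up to a single replication score. This minimal sketch of that roll-up uses a hypothetical rubric whose names, weights and verdicts are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    weight: float = 1.0
    passed: Optional[bool] = None  # judge's pass/fail verdict, set on leaves only
    children: list["Node"] = field(default_factory=list)

def score(node: Node) -> float:
    """Leaves score 1.0 or 0.0; a parent scores the weighted average of its children."""
    if not node.children:
        return 1.0 if node.passed else 0.0
    total_weight = sum(child.weight for child in node.children)
    return sum(child.weight * score(child) for child in node.children) / total_weight

# Hypothetical rubric: names, weights and verdicts are invented for illustration.
rubric = Node("replicate paper", children=[
    Node("code development", weight=2, children=[
        Node("implements proposed model", passed=True),
        Node("implements baseline", passed=False),
    ]),
    Node("execution", weight=1, children=[Node("training script runs", passed=True)]),
    Node("results match", weight=1, children=[Node("main table within tolerance", passed=False)]),
])
print(f"replication score: {score(rubric):.1%}")  # replication score: 50.0%
```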

💰 Funding 1

OpenAI
Funding

OpenAI $40B funding round

OpenAI raises $40B at a $300B valuation

OpenAI closed a $40 billion funding round at a $300 billion valuation, one of the largest private raises ever. The show noted the raise rode the wave of native image generation in ChatGPT, with especially strong growth in India.

$40B capital raised
$300B post-money valuation

🌀 Also Released 3

Google
Also Released · Open weights

Agent2Agent (A2A) protocol

Google announces A2A, an open agent-to-agent communication protocol

Google announced the Agent2Agent (A2A) protocol at Cloud Next, an open spec for agents from different vendors to discover and communicate with each other. The spec was published on GitHub with a long list of launch partners, including Weights & Biases.

Weights & Biases
Also Released

observable.tools & MCP RFC-269

W&B launches observable.tools initiative and MCP observability RFC

Weights & Biases launched the observable.tools initiative and published an RFC (RFC-269) proposing observability standards for the Model Context Protocol, inviting community comment. W&B also announced it is a launch partner for Google's A2A protocol.

Weights & Biases
Also Released · Open weights

Observable Tools

W&B launches Observable.tools initiative to add observability to MCP

Alex and Weights & Biases launched the Observable Tools initiative to bring observability to the Model Context Protocol (MCP) ecosystem. Tool calls made through external MCP servers are currently invisible to the caller's tracing, which hampers debugging and security review. A concrete proposal using OpenTelemetry was posted to the MCP specification GitHub discussions for community feedback.
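
The gap is easy to see in code. If every tool call is wrapped in a span that records its name, arguments, duration and outcome, a developer can reconstruct what an agent did. This stdlib-only sketch uses a hypothetical `traced_tool` decorator that writes span-like records to a list. It is neither the OpenTelemetry API nor the actual proposal posted to the MCP discussions.

```python
import functools
import time
from typing import Any, Callable

SPANS: list[dict[str, Any]] = []  # stand-in for a tracing backend / exporter

def traced_tool(fn: Callable) -> Callable:
    """Record a span-like dict (tool, args, status, duration) for every call of a tool function."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"tool": fn.__name__, "args": args, "kwargs": kwargs, "status": "ok"}
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            span["status"] = f"error: {exc}"
            raise  # observability must not swallow the failure
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            SPANS.append(span)
    return wrapper

@traced_tool
def search_docs(query: str) -> list[str]:  # hypothetical tool
    return [f"result for {query!r}"]

search_docs("MCP observability")
print(SPANS[0]["tool"], SPANS[0]["status"])  # search_docs ok
```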