Everything AI Released in April 2025

64 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

← March 2025 All months May 2025 →

🧠 New Models 31

Daily (Pipecat) Apr 24, 2025

New Models · Open weights

Smart-Turn VAD

Pipecat releases Smart-Turn, an open source semantic VAD model

The Pipecat team (from Daily) released Smart-Turn, an open source semantic voice activity detection model that understands when a speaker has actually finished their turn rather than just detecting silence. Kwindla Kramer joined the show to break down how semantic VAD makes voice agent conversations feel far more natural, with a community training effort at turn-training.pipecat.ai.

GitHub ↗ · HF Model ↗ · Fal.ai Playground ↗ · Try It Demo ↗

🎙️ Hear our coverage →

#voice-ai #open-source #agents

Google DeepMind Apr 24, 2025

New Models · Open weights

Gemma 3 QAT

Google ships Quantization-Aware Trained Gemma 3 models for consumer GPUs

Google released Quantization-Aware Training (QAT) versions of the Gemma 3 family, dramatically cutting memory requirements while preserving quality. The 27B model drops from a hefty 54GB to just 14.1GB, and even the 1B model goes from 2GB to about half a gig, making state-of-the-art open models runnable on consumer GPUs. Wolfram took the 4B QAT model for a spin in LM Studio on the show.

27B Gemma 3 27B QAT: 54GB down to 14.1GB · 1B Gemma 3 1B QAT: 2GB down to ~0.5GB · 4B QAT model tested in LM Studio
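As a rough sanity check on those numbers, weight memory is approximately parameter count times bytes per weight. The sketch below assumes the 54GB baseline corresponds to 16-bit weights and the QAT builds to roughly 4 bits per weight; both bit widths are assumptions (the item does not state them), and `weight_memory_gb` is a hypothetical helper that ignores activations, KV cache, and file-format overhead, which may account for the gap between 13.5GB and the quoted 14.1GB.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and any per-block quantization metadata.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed bit widths: 16-bit baseline, ~4-bit QAT weights.
for size_b in (27, 1):
    baseline = weight_memory_gb(size_b, 16)
    quantized = weight_memory_gb(size_b, 4)
    print(f"{size_b}B: {baseline:.1f} GB at 16-bit, {quantized:.1f} GB at 4-bit")
```

Under these assumptions the 27B and 1B estimates land close to the figures quoted in the announcement.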

X Post ↗ · Blog ↗ · Reddit thread ↗

🎙️ Hear our coverage →

#open-source #infrastructure #on-device

Lvmin Zhang (lllyasviel) Apr 24, 2025

New Models · Open weights

FramePack

FramePack generates 120-second videos on just 6GB of VRAM

FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open source next-frame prediction approach for long video generation that runs on consumer hardware. It can generate videos up to 120 seconds long on as little as 6GB of VRAM by packing input frame context into a fixed length.

120s Max video length · 6GB Minimum VRAM

Project Page ↗ · GitHub ↗

🎙️ Hear our coverage →

#video-gen #open-source #on-device

Nari Labs Apr 24, 2025

New Models · Open weights

Dia-1.6B

Nari Labs' Dia: a wild 1.6B open source TTS model that blew up Twitter

Nari Labs released Dia, a 1.6B parameter open-weights text-to-speech model that absolutely blew up Twitter with its expressive, emotional dialogue generation, including laughs, coughs, and multi-speaker conversations. Built by a tiny team, it punches far above its weight against commercial TTS systems and supports voice cloning, with demos available on Fal.ai.

1.6B Parameters

X Post Highlight ↗ · HF Model ↗ · GitHub ↗ · Fal.ai Voice Clone Demo ↗

🎙️ Hear our coverage →

#voice-ai #open-source

NVIDIA Apr 24, 2025

New Models · Open weights

Describe Anything (DAM-3B)

NVIDIA releases DAM-3B for region-based image and video captioning

NVIDIA dropped the Describe Anything Model (DAM-3B), a 3 billion parameter multimodal model for region-based image and video captioning. You can point it at a specific region of an image or video and it generates a detailed description of just that area. NVIDIA also published an accompanying DescribeAnything dataset and a Hugging Face demo.

3B Parameters

X Post ↗ · HF Model ↗ · HF Demo ↗ · HF Dataset ↗

🎙️ Hear our coverage →

#vision #multimodal #open-source

Sand AI Apr 24, 2025

New Models · Open weights

MAGI-1

Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model

Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.

24B Parameters · 24 Frames per autoregressive chunk
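To make "causal attention between chunks" concrete, here is a minimal sketch of a block-causal attention mask over frames. It is an illustration of the general idea only, not Sand AI's implementation: the assumption that frames attend freely within their own chunk, and the helper name `block_causal_mask`, are hypothetical.

```python
def block_causal_mask(num_frames: int, chunk_size: int = 24) -> list[list[bool]]:
    """mask[q][k] is True when query frame q may attend to key frame k.

    Assumption: full attention inside a chunk, causal attention across chunks,
    so a frame sees its own chunk and all earlier chunks, never later ones.
    """
    chunk_of = [frame // chunk_size for frame in range(num_frames)]
    return [
        [chunk_of[k] <= chunk_of[q] for k in range(num_frames)]
        for q in range(num_frames)
    ]


# Tiny example: 4 frames in chunks of 2, printed as rows of 1s and 0s.
for row in block_causal_mask(4, chunk_size=2):
    print("".join("1" if allowed else "0" for allowed in row))
```

Because earlier chunks never depend on later ones in a mask like this, finished chunks can be streamed out while later chunks are still being generated.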

X Post ↗ · GitHub ↗ · PDF Report ↗ · HF Repo ↗

🎙️ Hear our coverage →

#video-gen #open-source #architecture

Tencent Apr 24, 2025

New Models

Hunyuan 3D 2.5

Tencent's Hunyuan 3D 2.5 jumps to 10B params with PBR textures and rigging

Tencent updated its 3D generation model to Hunyuan 3D 2.5, now boasting 10 billion parameters, up from 1B. They highlight massive leaps in precision with 1024-resolution geometry, high-quality textures with PBR support, and improved skeletal rigging for animation.

10B Parameters (up from 1B) · 1024 Geometry resolution

🎙️ Hear our coverage →

#world-models #image-gen

ByteDance Apr 17, 2025

New Models

Seaweed-7B

ByteDance publishes Seaweed-7B video generation foundation model

ByteDance publicly presented Seaweed-7B, a 7B parameter video generation foundation model, showing competitive video quality from a comparatively small model. Details and demos were published at seaweed.video.

seaweed.video ↗

🎙️ Hear our coverage →

#video-gen #frontier-models

ByteDance Apr 17, 2025

New Models

Seedream 3.0

ByteDance Seedream 3.0: bilingual 2K text-to-image model

ByteDance's Seed team announced Seedream 3.0, a powerful bilingual (Chinese/English) text-to-image model that generates native 2048x2048 images with fast inference of around 3 seconds for a 1K image on an A100. It challenges the top closed image generation models.

Tech post ↗ · arXiv ↗ · AIbase news ↗

🎙️ Hear our coverage →

#image-gen #architecture

Cohere Apr 17, 2025

New Models

Embed 4

Cohere Embed 4: multimodal embeddings for enterprise search

Cohere released Embed 4, a multimodal embedding model aimed at enterprise search and retrieval over mixed text and image documents. It is available through Cohere's API.

Blog ↗ · Docs Changelog ↗ · X ↗

🎙️ Hear our coverage →

Google Apr 17, 2025

New Models

DolphinGemma

DolphinGemma: Google's audio model for decoding dolphin communication

Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M parameter audio model based on the Gemma architecture using SoundStream audio tokenization. Trained on decades of recorded dolphin clicks, whistles and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.

🎙️ Hear our coverage →

#audio #research

Google DeepMind Apr 17, 2025

New Models

Gemini 2.5 Flash

Google launches Gemini 2.5 Flash with controllable thinking budgets

Google answered OpenAI's launch week with Gemini 2.5 Flash, a fast reasoning model that introduces controllable thinking budgets so developers can dial how much the model reasons per request. It is available through the Gemini API and developer platform.

Blog Post ↗ · API Docs ↗

🎙️ Hear our coverage (+1 follow-up) →

#reasoning #frontier-models #api

Kling AI Apr 17, 2025

New Models

Kling 2.0

Kling 2.0 Creative Suite launches

Kuaishou's Kling AI launched Kling 2.0 along with a broader Creative Suite, upgrading its video generation model and tooling. The release kept up the rapid pace in the closed-source video generation race during a packed vision and video week.

🎙️ Hear our coverage →

Microsoft Apr 17, 2025

New Models · Open weights

BitNet b1.58

Microsoft releases BitNet 1.58-bit model weights on Hugging Face

Microsoft published BitNet (listed in the show notes as BitNet v1.5), its native 1.58-bit quantized LLM, as open weights on Hugging Face. The ternary-weight approach targets extremely efficient CPU inference at a fraction of the memory of standard models.
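The "1.58-bit" label follows from the ternary weights: a weight that can take one of three values carries log2(3) ≈ 1.58 bits of information. The sketch below shows that arithmetic, plus a simple illustrative rounding rule that maps floats to {-1, 0, +1}. The `ternarize` helper and its mean-absolute-value scaling are assumptions for illustration, not a description of Microsoft's released code.

```python
import math

# Information content of one weight restricted to three values {-1, 0, +1}.
bits_per_ternary_weight = math.log2(3)
print(f"{bits_per_ternary_weight:.2f} bits per weight")


def ternarize(weights: list[float]) -> list[int]:
    """Illustrative only: scale by mean |w|, then round and clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    return [max(-1, min(1, round(w / scale))) for w in weights]


print(ternarize([0.9, -0.05, -1.3, 0.2]))
```

Small weights collapse to 0 and large ones saturate at ±1, which is where the memory savings over 16-bit weights come from.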

Hugging Face ↗

🎙️ Hear our coverage →

#open-source #infrastructure

OpenAI Apr 17, 2025

New Models

GPT-4.1, 4.1-mini, 4.1-nano

OpenAI launches GPT-4.1 family (4.1, mini, nano) in the API

OpenAI released the GPT-4.1 family of models, available via API only, in three sizes: 4.1, 4.1-mini and 4.1-nano. The family features a 1M token context window, in contrast to o3's 200k, and is aimed at developers building on long-context and coding workloads.

Our Coverage ↗ · Prompting guide ↗

🎙️ Hear our coverage →

#frontier-models #architecture #coding

OpenAI Apr 17, 2025

New Models

o3 & o4-mini

OpenAI launches o3 and o4-mini, SOTA reasoning models with tool use

OpenAI shipped o3 and o4-mini in ChatGPT and the API, with o3 setting new SOTA records on Codeforces, SWE-bench, MMMU and more. For the first time the models can use tools (web search, Python, image generation) during the reasoning process, and they can think visually by cropping, zooming and rotating images. o3 earned $65k on the Freelancer eval versus o1's $28k, and o4-mini hit 99.5% on AIME with a Python interpreter.

$65k o3 score on the Freelancer eval (vs o1's $28k) · 99.5% o4-mini on AIME with Python interpreter · 200k Context window (tokens)

Blog ↗ · Watch Party ↗

🎙️ Hear our coverage →

#reasoning #agents #multimodal

Prime Intellect Apr 17, 2025

New Models · Open weights

INTELLECT-2

Prime Intellect launches INTELLECT-2, a 32B globally-distributed RL run

Prime Intellect released INTELLECT-2, a 32B reasoning model trained with globally decentralized reinforcement learning, a follow-up to the INTELLECT-1 decentralized pretraining run covered on the show in December. The release includes open weights on Hugging Face, a tech report, and the PRIME-RL training code.

Blog ↗ · X ↗ · Blog ↗ · Tech report ↗

🎙️ Hear our coverage (+1 follow-up) →

#open-source #training #reasoning

Zhipu AI (Z.ai) Apr 17, 2025

New Models · Open weights

GLM-4-0414

Z.ai (formerly chatGLM) releases the GLM-4-0414 open-source family

Z.ai, the rebranded Zhipu AI / chatGLM team, released the GLM-4-0414 family of open-source models. The drop includes base, reasoning and rumination variants published on Hugging Face and GitHub.

X ↗ · HF Collection ↗ · GitHub ↗

🎙️ Hear our coverage →

#open-source #reasoning

Amazon Apr 10, 2025

New Models

Nova Sonic

Amazon unveils Nova Sonic, a speech-to-speech foundation model

Amazon announced Nova Sonic, a foundational speech-to-speech model that unifies speech understanding and generation for real-time, natural-sounding voice conversations. It is available through Amazon Bedrock as part of the Nova family.

Amazon blog: Nova Sonic ↗

🎙️ Hear our coverage →

Deep Cogito Apr 10, 2025

New Models · Open weights

Cogito v1 Preview (3B-70B)

Deep Cogito debuts Cogito v1 Preview models from 3B to 70B, beating DeepSeek 70B

New lab Deep Cogito released the Cogito v1 Preview family of open models ranging from 3B to 70B parameters, claiming SOTA results at each size and beating DeepSeek's 70B distill. The models are available on Hugging Face, giving local AI enthusiasts the small-to-mid sizes Llama 4 skipped.

3B-70B Model size range

Deep Cogito research blog: Cogito v1 Preview ↗ · Hugging Face: cogito-v1-preview-llama-70B ↗

🎙️ Hear our coverage →

#open-source #reasoning

HiDream AI Apr 10, 2025

New Models · Open weights

HiDream-I1-Dev

HiDream-I1-Dev: 17B MIT-licensed image model surpasses Flux 1.1 [pro]

HiDream released HiDream-I1-Dev, a 17B parameter open-weights image generation model under an MIT license. It became the new leading open-weights image generator, surpassing Flux 1.1 [pro] on quality benchmarks.

17B Parameters, MIT license

Hugging Face collection: HiDream-I1 ↗

🎙️ Hear our coverage →

#image-gen #open-source

Jina AI Apr 10, 2025

New Models · Open weights

Jina Reranker M0

Jina Reranker M0: SOTA multilingual, multimodal document reranker

Jina AI released Jina Reranker M0, a state-of-the-art multimodal and multilingual document reranker model. It reranks documents that include both text and images, targeting retrieval and RAG pipelines, with weights available on Hugging Face.

Jina blog: Reranker M0 ↗ · Hugging Face: jina-reranker-m0 ↗

🎙️ Hear our coverage →

#search #open-source #multimodal

Meta AI Apr 10, 2025

New Models · Open weights

Llama 4 (Scout & Maverick)

Meta drops Llama 4 Scout (109B) and Maverick (400B) open-weights MoE models

Meta released the long-awaited Llama 4 family in a chaotic Saturday drop: Scout (17B active / ~109B total, 16 experts) and Maverick (17B active / ~400B total, 128 experts), with a 2T-parameter Behemoth still in training. The models are multimodal, multilingual MoE architectures trained on ~30T tokens with FP8 and interleaved attention (iRoPE), claiming 10M context for Scout and 1M for Maverick. The release was marred by drama: the LMArena version differed from the released model, and the community criticized the lack of small local-friendly sizes.

10M Stated context window for Llama 4 Scout · 288B Active parameters of unreleased Behemoth (2T total) · 17B Active parameters for both Scout and Maverick
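In a mixture-of-experts model only some experts run for each token, so per-token compute tracks the active parameter count while memory tracks the total. A quick sketch of the active share for each model, using only the approximate figures quoted above (`active_fraction` is a hypothetical helper for illustration):

```python
# (active, total) parameter counts in billions, as quoted in the item above.
models = {
    "Scout": (17, 109),
    "Maverick": (17, 400),
    "Behemoth": (288, 2000),
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's weights used for any single token."""
    return active_b / total_b


for name, (active, total) in models.items():
    share = active_fraction(active, total)
    print(f"{name}: {share:.1%} of weights active per token")
```

The small active share is why Maverick can run at roughly the per-token cost of a 17B dense model while still needing memory for all ~400B weights, which is the local-hosting complaint the community raised.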

Meta blog: Llama 4 multimodal intelligence ↗ · Hugging Face: meta-llama ↗ · Try it at meta.ai ↗

🎙️ Hear our coverage →

#open-source #architecture #multimodal

Moonshot AI (Kimi) Apr 10, 2025

New Models · Open weights

Kimi-VL & Kimi-VL-Thinking

Moonshot drops Kimi-VL and Kimi-VL-Thinking, tiny A3B open vision models

Moonshot AI released Kimi-VL and Kimi-VL-Thinking, compact vision-language models with only ~3B active parameters (A3B MoE). The thinking variant adds reasoning to a tiny VLM, and both are available openly on Hugging Face.

A3B ~3B active parameters (MoE)

Hugging Face collection: Kimi-VL-A3B ↗

🎙️ Hear our coverage →

#open-source #vision #reasoning

NVIDIA Apr 10, 2025

New Models · Open weights

Llama-3.1-Nemotron-Ultra-253B

NVIDIA ships Nemotron Ultra, a 253B pruned and distilled Llama 3.1-405B

NVIDIA released Nemotron Ultra, a pruned and distilled finetune of Llama 3.1-405B at roughly half the parameters (253B). Its benchmarks even included Llama 4 comparisons, showing the older finetuned Llama beating the new models on AIME, GPQA and more. It supports 128K context and fits on a single 8xH100 node for inference.

253B Parameters (pruned from Llama 3.1-405B) · 128K Context window

Hugging Face: Llama-3_1-Nemotron-Ultra-253B-v1 ↗ · Announcement on X ↗

🎙️ Hear our coverage →

#open-source #training #reasoning

Together AI & Agentica (UC Berkeley) Apr 10, 2025

New Models · Open weights

DeepCoder-14B-Preview

DeepCoder-14B: open RL-finetuned coder beats DeepSeek R1 and o3-mini on coding

Together AI and Agentica (UC Berkeley Sky Computing Lab) released DeepCoder-14B-Preview, a reasoning model finetuned with RL that beats DeepSeek R1 and even o3-mini on several coding benchmarks. The project aims to democratize RL: the team open-sourced the model, the training dataset, the Weights & Biases logs, and the eval logs. Guest Michael Luo from Agentica joined the show to discuss the release.

14B Model parameters

Together AI blog: DeepCoder ↗ · Announcement on X ↗ · Hugging Face: DeepCoder-14B-Preview ↗ · Hugging Face dataset: DeepCoder-Preview-Dataset ↗

🎙️ Hear our coverage →

#open-source #coding #reasoning

All Hands AI Apr 3, 2025

New Models · Open weights

OpenHands LM 32B

OpenHands LM 32B: MIT-licensed coding agent model hits 37.2% SWE-Bench

All Hands AI (formerly OpenDevin) released OpenHands LM 32B, an MIT-licensed Qwen finetune that scores 37.2% on SWE-Bench Verified, competing with much larger models on real-world repo tasks. The OpenHands agent also took the #2 spot on the new Live SWE-Bench leaderboard, and the 32B model runs locally on a single RTX 3090. A hosted OpenHands Cloud version is also available; guest Xingyao Wang joined the show to discuss it.

37.2% SWE-Bench Verified score · #2 Live SWE-Bench leaderboard (OpenHands agent)

Introducing OpenHands LM 32B (blog) ↗ · Model on Hugging Face (MIT license) ↗ · OpenHands Cloud ↗

🎙️ Hear our coverage →

#open-source #coding #agents

Gladia Apr 3, 2025

New Models

Solaria STT

Gladia launches Solaria speech-to-text model

Gladia launched Solaria, a new speech-to-text model offered through its transcription platform. It arrived in a busy week for voice AI alongside Hailuo's Speech-02 TTS.

Gladia Solaria ↗

🎙️ Hear our coverage →

HKU NLP (University of Hong Kong) Apr 3, 2025

New Models

Dream 7B

Dream 7B: a diffusion language model challenger unveiled

Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.

Dream 7B blog post ↗ · Benchmark results thread (Sudoku) ↗

🎙️ Hear our coverage →

#architecture #research #reasoning

Nomic AI Apr 3, 2025

New Models · Open weights

Nomic Embed Multimodal

Nomic Embed Multimodal: SOTA embeddings for visual documents

Nomic AI released Nomic Embed Multimodal, new 3B and 7B parameter embedding models built on Alibaba's Qwen2.5-VL. They achieve SOTA on visual document retrieval by embedding interleaved text-image sequences, ideal for PDFs and complex webpages. The 7B model ships under Apache 2.0 with open weights, code, and data; guest Zach Nussbaum discussed the release on the show.

3B parameters (smaller model) · 7B parameters (Apache 2.0 model)

Nomic Embed Multimodal blog post ↗ · Models on Hugging Face ↗

🎙️ Hear our coverage →

#search #multimodal #open-source

Runway Apr 3, 2025

New Models

Runway Gen-4

Runway Gen-4 announced with major gains in video consistency

Runway announced Gen-4, its next-generation video model focused on character and world consistency across shots. Example videos showed notably coherent characters and scenes, pushing AI video further toward usable filmmaking.

Introducing Runway Gen-4 ↗

🎙️ Hear our coverage →

#video-gen #image-gen

🚀 Products & Apps 7

Character.AI Apr 24, 2025

Products & Apps

AvatarFX

Character.AI opens early access to AvatarFX talking avatars

Character.AI announced AvatarFX, now in early access, which turns static images into speaking, emoting video avatars. It aims to bring characters to life for conversational and creative use cases.

🎙️ Hear our coverage →

Mistral AI Apr 17, 2025

Products & Apps

Classifiers Factory

Mistral releases Classifiers Factory

Mistral announced Classifiers Factory, a service for building and training custom text classifiers on its platform. It was covered as a quick item in the Big CO LLMs + APIs section of the show.

🎙️ Hear our coverage →

#on-device #api

Anthropic Apr 10, 2025

Products & Apps

Claude Max plan

Anthropic launches Max plan at $200/mo with higher usage quotas

Anthropic introduced a new Max subscription tier priced at $200 per month, offering significantly more usage quota than the standard Pro plan. It mirrors OpenAI's Pro-tier pricing strategy for power users.

$200/mo Max plan price

🎙️ Hear our coverage →

#api #consumer-ai

Google Apr 10, 2025

Products & Apps

Firebase Studio

Google launches Firebase Studio AI app-building environment at Cloud Next

As part of a flood of announcements at Google Cloud Next 2025, Google launched Firebase Studio, a browser-based AI-powered environment for building and shipping full-stack apps. It was one of the headline developer-facing launches from the event.

Firebase Studio ↗ · Google Cloud Next 2025 announcements ↗

🎙️ Hear our coverage →

#coding #agents

Amazon Apr 3, 2025

Products & Apps

Nova Act

Amazon announces Nova Act browser agent SDK

Amazon entered the agent race with Nova Act, an agent designed to take actions in web browsers, possibly built with talent from the Adept acquisition. Amazon claims it beats Claude 3.5 and OpenAI's computer-use model on some benchmarks, but it is only available via an SDK behind a request form, so claims could not be verified hands-on.

Nova Act announcement (Amazon Science) ↗ · Access request form ↗

🎙️ Hear our coverage →

ByteDance Apr 3, 2025

Products & Apps

OmniHuman (via Dreamina)

ByteDance's OmniHuman image-to-avatar model goes public via Dreamina

ByteDance's impressive OmniHuman model, which turns a single image plus audio into a realistic talking avatar video, became publicly usable through the Dreamina (CapCut) website. The results land squarely in uncanny-valley territory, as Alex demonstrated with his own avatar thread.

OmniHuman on Dreamina ↗ · Example thread by Alex ↗

🎙️ Hear our coverage →

Cognition Labs Apr 3, 2025

Products & Apps

Devin 2.0

Devin 2.0 launches with new IDE experience and $20/month entry price

Breaking during the show: Cognition Labs launched Devin 2.0, the second version of its AI software engineer, with a new IDE experience. Crucially, pricing now starts at $20/month, down from the original $500/month tier, making the agent far more accessible.

$20/mo new starting price

🎙️ Hear our coverage →

#agents #coding

✨ Major Features & Updates 8

Anthropic Apr 17, 2025

Major Features & Updates

Claude Research

Claude gains Research mode and Google Workspace integration

Anthropic shipped a Research capability for Claude, letting it conduct multi-step research across the web, alongside a Google Workspace integration that connects Claude to email, calendar and docs context.

🎙️ Hear our coverage →

#agents #research #consumer-ai

Google DeepMind Apr 17, 2025

Major Features & Updates

Veo 2

Veo 2 video generation hits GA in the API and Gemini App

Google made Veo 2 video generation generally available for developers and rolled it out in the Gemini App. The GA release brings Google's flagship text-to-video model out of preview and into production use.

Dev Blog ↗ · Try It ↗

🎙️ Hear our coverage →

Weights & Biases Apr 17, 2025

Major Features & Updates

W&B Weave Playground

W&B Weave Playground adds GPT-4.1 family and o3/o4-mini support

The Weights & Biases Weave Playground shipped full support for the new GPT-4.1 family and the o3/o4-mini models, letting developers evaluate and compare the week's new models for their own applications.

X ↗ · W&B Weave ↗

🎙️ Hear our coverage →

#benchmarks #coding

Google DeepMind Apr 10, 2025

Major Features & Updates

Official MCP support

Google announces official support for the Model Context Protocol (MCP)

Demis Hassabis announced that Google will officially support Anthropic's Model Context Protocol (MCP) in its models and SDKs. This was a major signal of MCP becoming the industry standard for connecting AI models to tools and data.

Demis Hassabis announcement on X ↗

🎙️ Hear our coverage →

OpenAI Apr 10, 2025

Major Features & Updates

ChatGPT enhanced memory

OpenAI gives ChatGPT enhanced memory that can recall all your past chats

OpenAI rolled out enhanced memory for ChatGPT, allowing it to reference and recall all of a user's previous conversations rather than just saved memories. This makes ChatGPT significantly more personalized across sessions.

OpenAI announcement on X ↗

🎙️ Hear our coverage →

#consumer-ai #agents

Google Apr 3, 2025

Major Features & Updates

NotebookLM source discovery

Google NotebookLM can now discover related sources for you

Google's NotebookLM added a source discovery feature that finds and suggests related sources for a notebook, instead of relying solely on user-uploaded documents. It extends NotebookLM further into research-assistant territory.

Google blog: NotebookLM discover sources ↗

🎙️ Hear our coverage →

#research #consumer-ai

OpenAI Apr 3, 2025

Major Features & Updates

ChatGPT "Monday" voice

OpenAI ships new emo "Monday" voice in ChatGPT

OpenAI added a new "Monday" voice to ChatGPT's voice mode, an emo-flavored persona released around April 1st. It rounds out a week of OpenAI shipping across models, evals, and product.

OpenAI announcement on X ↗

🎙️ Hear our coverage →

#voice-ai #consumer-ai

Windsurf Apr 3, 2025

Major Features & Updates

Windsurf Netlify deployments

Windsurf adds one-click deployments to Netlify

Windsurf shipped a deployments feature that lets users push apps straight to Netlify from the editor. It is a small but practical step toward end-to-end app building inside AI coding tools.

Windsurf announcement on X ↗

🎙️ Hear our coverage →

🔌 APIs & Platforms 3

OpenAI Apr 24, 2025

APIs & Platforms

gpt-image-1

OpenAI's GPT Image generation lands in the API as gpt-image-1

OpenAI's powerful image generation capabilities, previously locked inside ChatGPT, are now available to developers via API under the official name gpt-image-1. This was the big one many developers were waiting for, opening up the viral image generation and editing capabilities for building AI art and image editing applications.

X Post ↗ · Docs ↗ · API Reference ↗

🎙️ Hear our coverage →

#image-gen #api

xAI Apr 10, 2025

APIs & Platforms

Grok 3 API

xAI finally launches the Grok 3 API tier

xAI made Grok 3 and Grok 3 Mini available via API, giving developers programmatic access to its frontier models for the first time. The Grok app also received updates the same week.

xAI API models and pricing ↗ · API Docs ↗ · App Update X Post ↗

🎙️ Hear our coverage (+1 follow-up) →

#api #frontier-models

Hailuo AI (MiniMax) Apr 3, 2025

APIs & Platforms

Speech-02

Hailuo Speech-02 TTS API: potentially SOTA emotional voice cloning

Hailuo (MiniMax) released the Speech-02 TTS API, which Alex called potentially state of the art for emotional control and voice cloning quality. It produces nuanced, realistic synthetic voices and was the standout voice release of the week.

Hailuo Speech-02 announcement on X ↗

🎙️ Hear our coverage →

🛠️ Dev Tools 4

HumanLayer Apr 24, 2025

Dev Tools · Open weights

12-Factor Agents

Dex Horthy publishes 12-Factor Agents, a guide to production-ready agents

HumanLayer founder Dex Horthy published 12-Factor Agents, an open GitHub repo and essay distilling common patterns and pitfalls for building reliable, production-ready AI agents. Drawing on his experience building agent SDKs, it argues that serious teams end up writing large parts from scratch and lays out principles for robust agent design, discussed in depth on the show.

GitHub Repo ↗ · Webinar Recording ↗

🎙️ Hear our coverage →

#agents #coding #open-source

OpenAI Apr 17, 2025

Dev Tools · Open weights

Codex CLI

OpenAI debuts Codex CLI, an open source terminal coding agent

OpenAI released Codex CLI, an open source coding tool for the terminal. It ships with hardened security, using Apple Seatbelt on macOS to limit execution to the current directory plus temp files.

🎙️ Hear our coverage →

#coding #agents #open-source

Cloudflare Apr 10, 2025

Dev Tools · Open weights

Agents SDK

Cloudflare releases a new Agents SDK for building stateful AI agents

Cloudflare shipped a new Agents SDK for building and deploying AI agents on its edge platform. It joins the week's wave of agent infrastructure announcements alongside Google's A2A and broad MCP adoption.

agents.cloudflare.com ↗

🎙️ Hear our coverage →

#agents #coding

GitMCP (Liad Yosef & Ido Salomon) Apr 10, 2025

Dev Tools · Open weights

GitMCP

GitMCP turns any GitHub repo into an MCP server instantly

Creators Liad Yosef and Ido Salomon launched GitMCP, a free tool that turns any GitHub repository into an MCP server by simply swapping the domain (gitmcp.io/user/repo). It lets AI assistants ground themselves in a repo's docs and code, and the creators joined the show to demo it.
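The domain swap described above can be sketched in a few lines. `to_gitmcp_url` is a hypothetical helper written for illustration, and the `https` scheme on the resulting URL is an assumption; only the `gitmcp.io/user/repo` pattern comes from the announcement.

```python
from urllib.parse import urlparse


def to_gitmcp_url(github_url: str) -> str:
    """Swap the github.com host for gitmcp.io, keeping the user/repo path."""
    parsed = urlparse(github_url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a github.com URL: {github_url}")
    # Keep only the first two path segments: the owner and the repository.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError("expected a URL of the form github.com/user/repo")
    user, repo = parts[0], parts[1]
    return f"https://gitmcp.io/{user}/{repo}"


print(to_gitmcp_url("https://github.com/user/repo"))
```

The resulting URL is what you would hand to an MCP-capable assistant as the server address for that repository.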

🎙️ Hear our coverage →

#agents #coding #open-source

📄 Papers & Research 3

ByteDance Apr 10, 2025

Papers & Research

Seed-Thinking-v1.5

ByteDance publishes Seed-Thinking-v1.5 reasoning model tech report

ByteDance's Seed team published Seed-Thinking-v1.5, a new reasoning model announced via a technical report on GitHub. It was mentioned among the week's open-source LLM news, though weights were not released at the time.

GitHub: Seed-Thinking-v1.5 ↗

🎙️ Hear our coverage →

#reasoning #research

Stanford / NVIDIA / UCSD / UC Berkeley Apr 10, 2025

Papers & Research · Open weights

One-Minute Video Generation with Test-Time Training

Test-Time Training paper one-shots minute-long videos with consistent characters

Researchers published 'One-Minute Video Generation with Test-Time Training', adding TTT layers to a pre-trained transformer to one-shot generate minute-long videos with remarkable character and scene consistency. The Tom & Jerry style demos showed the most impressive long-form AI video consistency to date.

1 min Single-shot generated video length

Project blog ↗ · Paper ↗

🎙️ Hear our coverage →

#video-gen #research #training

Meta AI Apr 3, 2025

Papers & Research

MoCha

Meta's MoCha generates movie-grade talking AI characters from speech and text

Meta GenAI researchers published MoCha, a model that generates stunningly realistic, movie-grade talking characters directly from speech plus text. Co-author Cong Wei joined the show to discuss the work, which points at AI actors entering Hollywood-quality territory.

MoCha project page ↗

🎙️ Hear our coverage →

#video-gen #research

📊 Benchmarks & Evals 4

OpenAI Apr 17, 2025

Benchmarks & Evals · Open weights

MRCR

OpenAI open sources the MRCR long-context benchmark dataset

OpenAI open sourced MRCR, a benchmark dataset for evaluating long-context, complex retrieval tasks. It builds on Gemini research from Google, and the dataset is published on Hugging Face.

Hugging Face ↗

🎙️ Hear our coverage →

#benchmarks #architecture

CoreWeave Apr 3, 2025

Benchmarks & Evals

CoreWeave GB200 inference benchmark

CoreWeave hits 800 tok/s on Llama 405B with NVIDIA GB200 Blackwell

CoreWeave announced record-breaking AI inference benchmarks using NVIDIA's new GB200 Grace Blackwell superchips: 800 tokens/sec on Llama 3.1 405B, plus 33,000 tokens/sec on Llama 2 70B with H200s. It is a marker of how fast inference hardware is accelerating.

800 tok/s Llama 3.1 405B on GB200 · 33,000 tok/s Llama 2 70B on H200

CoreWeave press release ↗

🎙️ Hear our coverage →

#infrastructure #benchmarks

Google DeepMind Apr 3, 2025

Benchmarks & Evals

Gemini 2.5 Pro USAMO results

Gemini 2.5 Pro scores 24.4% on USAMO olympiad math, crushing the field

New evaluation results published this week showed Gemini 2.5 Pro scoring 24.4% on the USA Math Olympiad (USAMO), whose problems are so hard that most top models score under 5%. The result showcases a step change in frontier reasoning ability on competition mathematics.

24.4% Gemini 2.5 Pro USAMO score · <5% typical score for other top models

🎙️ Hear our coverage →

#reasoning #benchmarks

OpenAI Apr 3, 2025

Benchmarks & Evals · Open weights

PaperBench

OpenAI releases PaperBench eval and open-sources Nano-Eval framework

OpenAI published PaperBench, a tough new evaluation that tests whether AI agents can replicate cutting-edge AI research papers, with more than 8,300 graded tasks and meta-evaluation of the LLM judge. The best model managed only a 21.0% replication score versus 41.4% for human PhDs. The code and the Nano-Eval framework were open sourced on GitHub alongside the paper.

8,300+ graded tasks in the benchmark · 21.0% best model replication score · 41.4% human PhD baseline score

PaperBench announcement ↗ · PaperBench code on GitHub ↗ · PaperBench paper (PDF) ↗ · Nano-Eval framework (openai/preparedness) ↗

🎙️ Hear our coverage →

#benchmarks #research #agents

💰 Funding 1

OpenAI Apr 3, 2025

Funding

OpenAI $40B funding round

OpenAI raises $40B at a $300B valuation

OpenAI closed a $40 billion funding round at a $300 billion valuation, one of the largest private raises ever. The show noted the raise rode the wave of native image generation in ChatGPT, with especially strong growth in India.

$40B capital raised · $300B post-money valuation

OpenAI: Investing in our mission ↗

🎙️ Hear our coverage →

🌀 Also Released 3

Google Apr 10, 2025

Also Released · Open weights

Agent2Agent (A2A) protocol

Google announces A2A, an open agent-to-agent communication protocol

Google announced the Agent2Agent (A2A) protocol at Cloud Next, an open spec for agents from different vendors to discover and communicate with each other. The spec was published on GitHub with a long list of launch partners, including Weights & Biases.

Google Developers blog: A2A ↗ · A2A spec on GitHub ↗ · W&B partnership blog ↗

🎙️ Hear our coverage →

#agents #open-source

Weights & Biases Apr 10, 2025

Also Released

observable.tools & MCP RFC-269

W&B launches observable.tools initiative and MCP observability RFC

Weights & Biases launched the observable.tools initiative and published an RFC (RFC-269) proposing observability standards for the Model Context Protocol, inviting community comment. W&B also announced it is a launch partner for Google's A2A protocol.

observable.tools ↗ · MCP RFC ↗ · W&B + Google A2A partnership blog ↗

🎙️ Hear our coverage →

#agents #coding

Weights & Biases Apr 3, 2025

Also Released · Open weights

Observable Tools

W&B launches Observable.tools initiative to add observability to MCP

Alex and Weights & Biases launched the Observable Tools initiative to bring observability to the Model Context Protocol (MCP) ecosystem, since calls to external tools are currently opaque to debugging and security review. A concrete proposal using OpenTelemetry was posted to the MCP specification GitHub discussions for community feedback.

Observable.tools ↗ · OpenTelemetry proposal on MCP spec GitHub ↗ · Viral MCP clients tweet ↗

🎙️ Hear our coverage →

#agents #coding
