Everything AI Released in May 2025

43 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

← April 2025 All months July 2025 →

🧠 New Models 22

Black Forest Labs May 29, 2025

New Models

FLUX.1 Kontext

Black Forest Labs drops FLUX.1 Kontext, SOTA image editing

Black Forest Labs, creators of Flux, released Kontext: three models (Pro, Max, and a 12B open-weights Dev in private preview) for consistent, context-aware text and image editing. Unlike GPT-image or VEO-style regeneration, Kontext keeps identity consistent across edits, adding what you ask for without changing your face every generation. Broke as news during the show.

Tweet ↗Announcement ↗Flux Playground ↗

🎙️ Hear our coverage →

DeepSeek May 29, 2025

New ModelsOpen weights

DeepSeek-R1-0528

DeepSeek drops R1-0528, an updated open reasoning model with big gains

DeepSeek released R1-0528 out of nowhere, an update to their open-weights reasoning model with serious performance jumps: AIME 91, LiveCodeBench 73, and SWE-bench Verified 57.6. They also shipped an 8B distilled version based on Qwen3 that can run on a laptop, keeping it among the best open-weight models available.

91 AIME score, beating previous R1 by a mile8B Distilled Qwen3-based version runnable on a laptop

🎙️ Hear our coverage →

#open-source #reasoning

Haize Labs May 29, 2025

New ModelsOpen weights

j1-nano & j1-micro

Haize Labs releases j1-nano and j1-micro tiny reward models

Haize Labs shipped j1-nano (600M params) and j1-micro (1.7B params), tiny open reward models for judging LLM outputs. Despite their small size, j1-micro scores 80.7% on RewardBench, making capable reward modeling accessible on modest hardware.

Tweet ↗GitHub ↗HF j1-micro ↗HF j1-nano ↗

🎙️ Hear our coverage →

#open-source #training #benchmarks

Resemble AI May 29, 2025

New ModelsOpen weights

Chatterbox

Resemble AI open-sources Chatterbox voice cloning with emotion control

Resemble AI released Chatterbox, an open-source voice cloning model with emotion control. Weights and code are public on GitHub and Hugging Face, bringing controllable, expressive voice cloning to the open ecosystem.

GitHub ↗Hugging Face ↗

🎙️ Hear our coverage →

#voice-ai #open-source

Tencent (Hunyuan) May 29, 2025

New Models

HunyuanPortrait

Tencent's HunyuanPortrait animates portraits from a single photo

Tencent's Hunyuan team published HunyuanPortrait, a model for high-fidelity portrait video generation from a single photo. It animates a still portrait into realistic talking-head video, with an accompanying paper.

Site ↗Paper ↗

🎙️ Hear our coverage →

Tencent (Hunyuan) May 29, 2025

New ModelsOpen weights

HunyuanVideo-Avatar

Tencent releases HunyuanVideo-Avatar for audio-driven avatars

Tencent Hunyuan released HunyuanVideo-Avatar, an audio-driven full-body avatar animation model. Feed it audio and a reference image and it animates a full-body avatar in sync, pushing AI-generated humans further toward indistinguishable.

Site ↗Tweet ↗

🎙️ Hear our coverage →

A A-M Team May 15, 2025

New ModelsOpen weights

AM-Thinking v1

AM-Thinking v1: 32B dense reasoning model beats bigger MoEs at math and code

A 32B dense open-weights reasoning LLM from a new Chinese team that takes on much larger mixture-of-experts models and comes out on top for math and code, hitting 85.3% on AIME 2024, 70.3% on LiveCodeBench v5, and 92.5% on Arena-Hard. It supports a /think reasoning toggle, ships with a permissive license, is tooled for vLLM, LM Studio, and Ollama, and runs at 25 tokens/sec on a single 80GB GPU with INT4 quantization. A multilingual RLHF pass and 128k context window are in the works.

32B dense parameters85.3% AIME 202425 tokens/sec on a single 80GB GPU with INT4

Hugging Face ↗Paper ↗Project page ↗

🎙️ Hear our coverage →

#open-source #reasoning

Alibaba May 15, 2025

New ModelsOpen weights

Wan 2.1

Alibaba's Wan 2.1: open-source diffusion-transformer text-to-video suite

Alibaba, the team behind the Qwen LLMs, released Wan 2.1, a full stack of open-source diffusion-transformer text-to-video foundation models. Amid the show's discussion of video-model fatigue, this was called out as a release that cuts through the noise, with weights on Hugging Face and code on GitHub.

Hugging Face ↗GitHub ↗Announcement tweet ↗Try it ↗

🎙️ Hear our coverage →

#video-gen #open-source #architecture

ByteDance May 15, 2025

New Models

Seed1.5-VL

ByteDance publishes Seed1.5-VL, a 20B vision-language thinking model

ByteDance's Seed team published the technical report for Seed1.5-VL, a 20B-parameter vision-language model with thinking capabilities. It was covered among the big-company releases of the week, with the tech report shared on GitHub.

Technical report ↗

🎙️ Hear our coverage →

#vision #multimodal #reasoning

Lightricks May 15, 2025

New Models

LTX Video (distilled)

LTX distilled model enables near real-time video generation

Lightricks shared a distilled version of its LTX video model that generates video at near real-time speeds. It was highlighted in the vision and video segment as a notable speed milestone for video generation.

Announcement on X ↗

🎙️ Hear our coverage →

#video-gen #voice-ai

Stability AI May 15, 2025

New ModelsOpen weights

Stable Audio Open Small

Stability AI and Arm release Stable Audio Open Small for on-device audio

Stability AI, together with Arm, released Stable Audio Open Small, a 341M-parameter open text-to-audio model built for real-world on-device deployment. The show framed it as part of a small comeback for Stability, with weights on Hugging Face and an accompanying paper.

Blog ↗Paper ↗Hugging Face ↗Announcement on X ↗

🎙️ Hear our coverage →

#audio #on-device #open-source

StepFun May 15, 2025

New ModelsOpen weights

Step1X-3D

StepFun's Step1X-3D: open two-stage framework for textured 3D assets

StepFun released Step1X-3D, an open two-stage framework for high-fidelity, controllable generation of textured 3D assets: it first synthesizes watertight geometry, then generates view-consistent textures. Trained on 2M curated meshes, the release also includes a curated dataset of 800K assets and a Hugging Face demo.

Hugging Face ↗Demo ↗Dataset ↗

🎙️ Hear our coverage →

#world-models #open-source #training

Technology Innovation Institute (TII) May 15, 2025

New ModelsOpen weights

Falcon-Edge

Falcon-Edge: ternary BitNet LLMs for edge deployment under 1GB VRAM

TII's Falcon-Edge project releases ternary BitNet LLMs (1B and 3B base models) that slash memory and compute requirements, enabling inference on less than 1GB of VRAM. Fine-tuners get pre-quantized checkpoints and a clear path to 1-bit LLMs.

Blog ↗Falcon-E-1B on Hugging Face ↗Falcon-E-3B on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #on-device #infrastructure

Alibaba (Qwen) May 1, 2025

New ModelsOpen weights

Qwen 2.5 Omni

Qwen 2.5 Omni gets an update

Alongside the Qwen 3 launch, Alibaba updated its Qwen 2.5 Omni multimodal model line. Mentioned briefly in the open-source roundup as part of the week's Qwen ecosystem push.

Alibaba Qwen announcement (X) ↗

🎙️ Hear our coverage →

#open-source #multimodal

Alibaba (Qwen) May 1, 2025

New ModelsOpen weights

Qwen 3

Alibaba open-weights the full Qwen 3 family under Apache 2.0

Alibaba released the entire Qwen 3 stack: two MoE models (235B total/22B active and 30B/3B active) plus six dense siblings from 32B down to 0.6B, all Apache 2.0 with day-one support in LM Studio, Ollama, vLLM, MLX and llama.cpp. The headline feature is a runtime hybrid 'thinking' toggle (/think and /no_think) that trades latency for reasoning depth. Trained on ~36T tokens with 128K context and 119-language coverage, the 235B MoE rivals DeepSeek-R1, o1, o3-mini and Gemini 2.5 Pro on coding and math.

235 B Flagship MoE total parameters (22B active)30 B Qwen3-30B-A3B hit 57 tok/s on a Mac with speculative decoding36 Trillions of pre-training tokens (2x Qwen 2.5)

Qwen 3 blog post ↗GitHub ↗Hugging Face collection ↗HF demo ↗

🎙️ Hear our coverage →

#open-source #reasoning #architecture

HiDream May 1, 2025

New ModelsOpen weights

HiDream E1

HiDream E1: open-weights image model with standout Ghibli style

HiDream released E1, an open-weights image editing/generation model (Apache 2.0-style licensing) noted for beautiful Ghibli-style outputs. It ranks #4 on the Artificial Analysis image arena leaderboard, sitting among top contenders like Google Imagen and ReCraft.

Hugging Face: HiDream-E1-Full ↗

🎙️ Hear our coverage →

#image-gen #open-source

JetBrains May 1, 2025

New ModelsOpen weights

Mellum-4b-base

JetBrains open-sources Mellum-4b, its code completion focal model

JetBrains published Mellum-4b-base on Hugging Face, a 4B-parameter model specialized for code completion that powers its IDE AI features. Listed in the episode's open-source links roundup.

Hugging Face: Mellum-4b-base ↗

🎙️ Hear our coverage →

#open-source #coding

Kyutai May 1, 2025

New ModelsOpen weights

Helium-1

Kyutai releases Helium-1, a 2B European-language model plus dactory pipeline

Kyutai released Helium-1, a 2B-parameter model distilled from Gemma-2-9B and purpose-built for Europe's 24 official languages, under CC-BY 4.0. It sets a new state of the art for its size class on MMLU-EU, ARC-EU and FLORES translation while fitting in under 2GB VRAM for edge and phone deployment. They also open-sourced 'dactory' (MIT), their full Common Crawl data-processing pipeline that scores, dedups and tags webpages.

Blog post ↗Hugging Face: helium-1-2b ↗Dactory pipeline (GitHub) ↗

🎙️ Hear our coverage →

#open-source #multilingual #on-device

Meta AI May 1, 2025

New ModelsOpen weights

Llama Guard 4

Meta ships Llama protection suite: Llama Guard 4, Firewall, Prompt Guard 2

Meta's LlamaCon security drop included Llama Guard 4 (text + image protection), Llama Firewall (stops prompt hacks and risky code), Prompt Guard 2 (faster jailbreak defense), CyberSecEval 4, and a new Defender Program for security researchers.

AI at Meta LlamaCon announcements (X) ↗

🎙️ Hear our coverage →

#safety #open-source

Microsoft May 1, 2025

New ModelsOpen weights

Phi-4-reasoning

Microsoft ships Phi-4-reasoning and Phi-4-reasoning-plus (14B, MIT)

Microsoft fine-tuned the 14B Phi-4 on 1.4M curated chain-of-thought traces (SFT) and added a small RL stage (Plus variant) to create two MIT-licensed reasoning models. They punch far above their weight: Phi-4-reasoning-plus outperforms DeepSeek-R1-Distill-70B on AIME 25 (78% vs 51%) and sits within a few points of the full 671B DeepSeek-R1, while running on a single GPU with explicit <think> scaffolding.

ArXiv paper ↗Tech report ↗Hugging Face: Phi-4-reasoning ↗Suriya's thread ↗

🎙️ Hear our coverage →

#open-source #reasoning #on-device

OpenPipe May 1, 2025

New ModelsOpen weights

ART·E

OpenPipe's ART·E: RL-trained open email agent that beats o3

OpenPipe released ART·E, an Apache 2.0 email research agent built on a 14B Qwen 2.5 backbone, trained on 500K Enron emails plus synthetic Q&A and refined with reinforcement learning. It tops o3 on accuracy (96% vs 90%) while running 5x faster (1.1s median) and 64x cheaper ($0.85 per 1,000 queries), using a simple three-tool loop.

Launch thread (X) ↗Blog post ↗GitHub: OpenPipe/ART ↗

🎙️ Hear our coverage →

#agents #training #open-source

Xiaomi May 1, 2025

New ModelsOpen weights

MiMo-7B

Xiaomi enters open weights with MiMo-7B, MIT-licensed reasoning family

Xiaomi's first open-weights release is a 7B dense family (Base, SFT, RL, RL-Zero) trained from scratch on 25T tokens with a multi-token-prediction objective and rule-verifiable reinforcement learning. The RL variant matches OpenAI o1-mini on benchmark suites despite being far smaller, scoring 55.4% on AIME 2025 and 49.3% on LiveCodeBench v6, all under an MIT license with vLLM-ready weights.

Hugging Face model hub ↗

🎙️ Hear our coverage →

#open-source #reasoning #training

🚀 Products & Apps 5

Kyutai May 29, 2025

Products & Apps

Unmute.sh

Kyutai launches Unmute.sh, a low-latency voice wrapper for any LLM

Kyutai (the lab behind Moshi) launched Unmute.sh, a modular wrapper that adds voice to any text LLM with under 300ms latency and semantic VAD that knows a thinking pause from a breath. It preserves the underlying text model's capabilities while adding natural voice interaction, and is slated to be open-sourced.

Try It ↗X announcement ↗

🎙️ Hear our coverage →

Odyssey May 29, 2025

Products & Apps

Odyssey Interactive Video

Odyssey debuts real-time interactive AI video at 30 FPS

Odyssey launched interactive video: real-time AI world exploration rendered at 30 FPS, letting you walk through generated worlds as they are created. A glimpse at world-model-driven media where the video responds to you instead of just playing back.

Blog ↗Try It ↗

🎙️ Hear our coverage →

#video-gen #world-models

Opera May 29, 2025

Products & Apps

Opera Neon

Opera unveils Neon, an agent-centric AI browser

Opera announced Neon, an agent-centric AI browser built for autonomous web tasks. Instead of just assisting with browsing, it is designed to act on the web for you, joining the emerging category of agentic browsers.

Site ↗Tweet ↗

🎙️ Hear our coverage →

#agents #consumer-ai

Google DeepMind May 15, 2025

Products & Apps

AlphaEvolve

AlphaEvolve: Gemini-powered coding agent for discovering new algorithms

Google DeepMind announced AlphaEvolve, a Gemini-powered coding agent that designs and evolves advanced algorithms, credited on the show as one of the week's mind-bending algorithmic-discovery stories. DeepMind opened an interest form for early access rather than shipping it broadly.

🎙️ Hear our coverage →

#agents #coding #research

Nous Research May 15, 2025

Products & AppsOpen weights

Psyche

Nous Research launches Psyche, a decentralized cooperative-training network

Psyche is Nous Research's decentralized cooperative-training network that lets distributed participants jointly train large models over the internet. The launch includes open code on GitHub and a live dashboard tracking the first run, a 40B model called Consilience. COO Dillon Rolnick joined the show to explain the decentralized training push.

Website ↗GitHub ↗Announcement tweet ↗Consilience 40B dashboard ↗

🎙️ Hear our coverage →

#training #open-source #infrastructure

✨ Major Features & Updates 7

Anthropic May 29, 2025

Major Features & Updates

Claude Voice Mode

Anthropic releases voice mode on Claude mobile apps

Anthropic shipped a voice mode on mobile, bringing conversational voice AI to the Claude apps. Another entry in the week's theme of every major lab giving its models a voice.

Anthropic X announcement ↗

🎙️ Hear our coverage →

OpenAI May 29, 2025

Major Features & Updates

Advanced Voice Mode

OpenAI's Advanced Voice Mode can now sing

OpenAI updated ChatGPT's Advanced Voice Mode with new capabilities, including the ability to sing. Part of a week where voice interfaces kept converging on more natural, expressive interaction.

🎙️ Hear our coverage →

OpenAI May 15, 2025

Major Features & Updates

GPT-4.1 in ChatGPT

OpenAI brings the previously API-only GPT-4.1 models into ChatGPT

OpenAI's GPT-4.1 series, previously available only via the API, is now selectable in the ChatGPT interface. The crew used the news to dig into model-picker UX: seven model options in the dropdown, each with its own quirks, speed, and context length, while most casual users don't even know the dropdown exists.

🎙️ Hear our coverage →

#frontier-models #consumer-ai

Anthropic May 1, 2025

Major Features & Updates

Claude Integrations (MCP)

Claude.ai gets Integrations: remote MCP tool support for apps

Breaking during the show: Anthropic announced Integrations, letting Claude connect directly to apps like Asana, Intercom, Linear, Zapier, Stripe, Atlassian, Cloudflare and PayPal via MCP. Developers can build their own integrations quickly, bringing tool use to Claude.ai itself rather than just the API.

Anthropic announcement (X) ↗

🎙️ Hear our coverage →

Google May 1, 2025

Major Features & Updates

NotebookLM Audio Overviews

NotebookLM AI Audio Overviews go multilingual with 50+ languages

Google expanded NotebookLM's AI audio overviews (the podcast-style summaries) to support more than 50 languages, taking the feature global beyond its English-only debut.

Google announcement (X) ↗

🎙️ Hear our coverage →

#voice-ai #multilingual

OpenAI May 1, 2025

Major Features & Updates

ChatGPT Shopping

ChatGPT adds shopping capabilities

OpenAI rolled out shopping features in ChatGPT, letting the assistant find and recommend products for users. Mentioned briefly in the big-companies roundup amid the week's OpenAI sycophancy drama.

🎙️ Hear our coverage →

#consumer-ai #search

Runway May 1, 2025

Major Features & Updates

Gen-4 References

Runway References brings character and scene consistency to Gen-4

Runway launched References for Gen-4 on all paid plans, letting creators supply reference images (characters, outfits, locations, even selfies) and use tags in prompts to keep those elements consistent across generations. It tackles AI video's biggest pain point, frame-to-frame identity drift, at no extra credit cost per run.

Runway References examples (X search) ↗

🎙️ Hear our coverage →

#video-gen #image-gen

🔌 APIs & Platforms 4

Mistral AI May 29, 2025

APIs & Platforms

Mistral Agents API

Mistral launches Agents API for building tool-using agents

Mistral released an Agents API, a framework for building custom tool-using agents on top of Mistral models. It joins the wave of big-lab agent frameworks, letting developers wire up tools and orchestrate agentic workflows through Mistral's platform.

Blog ↗Tweet ↗

🎙️ Hear our coverage →

Mistral AI May 29, 2025

APIs & Platforms

Mistral Embed

Mistral ships new state-of-the-art embedding API

Mistral announced a new state-of-the-art embedding API. The release gives developers a SOTA option for retrieval and semantic search workloads served through Mistral's platform.

X announcement ↗

🎙️ Hear our coverage →

Anthropic May 15, 2025

APIs & Platforms

Web Search API

Anthropic launches Web Search API for real-time retrieval in Claude

Anthropic released a Web Search API that gives Claude models real-time web retrieval, letting developers ground responses in current information directly through the API. It was covered among the week's big-company API updates.

🎙️ Hear our coverage →

#api #search #agents

Meta AI May 1, 2025

APIs & Platforms

Llama API

Meta announces the Llama API at LlamaCon, powered by Groq

At LlamaCon, Meta unveiled an official Llama API for developers, with fast inference powered by Groq hardware. Zuckerberg also confirmed Llama thinking models are coming, along with a new meta.ai app with a social feed and a full-duplex voice model in the works.

AI at Meta LlamaCon announcements (X) ↗

🎙️ Hear our coverage →

#api #infrastructure

📄 Papers & Research 3

UC Berkeley May 29, 2025

Papers & Research

Intuitor (Learning to Reason Without External Rewards)

Paper: models can learn to reason without external rewards

A mind-bending paper showing that reinforcement learning with internal or even random rewards can improve reasoning models. Intuitor matched or exceeded some GRPO results (the external-reward framework DeepSeek popularized with R1) when finetuning Qwen2.5 3B, questioning how much of RL's gains come from the reward signal itself.

3B Qwen2.5 model size where Intuitor matched or exceeded GRPO results

X announcement ↗

🎙️ Hear our coverage →

#reasoning #training #research

MiniMax (Hailuo) May 15, 2025

Papers & Research

MiniMax Speech

MiniMax Speech tech report published, called the best TTS out there

MiniMax (Hailuo) published the technical report for MiniMax Speech, its text-to-speech system, which the show described as the best TTS out there. The report details the architecture behind the system on arXiv.

🎙️ Hear our coverage →

Cohere May 1, 2025

Papers & Research

The Leaderboard Illusion

Cohere Labs paper accuses Chatbot Arena (LMArena) of structural bias

Cohere Labs published 'The Leaderboard Illusion,' claiming LMArena lets big incumbents privately A/B-test dozens of model variants (Meta ran 27 hidden Llama-4 variants in a month), cherry-pick top scores, and receive far more battle data, inflating Elo ratings. LMArena responded that the leaderboard reflects real human preferences and pre-release testing is open to all providers.

Paper (ArXiv) ↗LMArena reply (X) ↗

🎙️ Hear our coverage →

📦 Datasets 1

UC Berkeley May 1, 2025

DatasetsOpen weights

PromptEvals

PromptEvals: 12K+ real production assertion criteria for LLM evals

Shreya Shankar and collaborators released PromptEvals, the first large-scale corpus of production LLM guardrails: 2,087 developer prompts paired with 12,623 assertion criteria covering structure, style, grounding and hallucination checks, about 5x larger than prior sets. Fine-tuned open Mistral-7B and Llama-3-8B checkpoints generate assertions +21 F1 better than GPT-4o at a fraction of the latency. Accepted to NAACL 2025.

NAACL paper (ArXiv) ↗Dataset (Hugging Face) ↗Models (Hugging Face) ↗

🎙️ Hear our coverage →

#benchmarks #training #coding

📊 Benchmarks & Evals 1

OpenAI May 15, 2025

Benchmarks & EvalsOpen weights

HealthBench

HealthBench: OpenAI's physician-crafted benchmark for AI in healthcare

OpenAI released HealthBench, a benchmark for evaluating AI models on healthcare scenarios, built with input from physicians. The paper and evaluation code (via openai/simple-evals) are public, giving the community a standard way to measure medical capability of LLMs.

Blog ↗Paper ↗Code (simple-evals) ↗

🎙️ Hear our coverage →

#benchmarks #research

← April 2025 All months July 2025 →