Everything AI Released in January 2025

27 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.

🧠 New Models 12

Alibaba (Qwen)
New Models

Qwen2.5-Max

Alibaba launches Qwen2.5-Max flagship model with hidden video generation

Alibaba's Qwen team released Qwen2.5-Max, a large mixture-of-experts (MoE) flagship model available through the Qwen Chat interface and API, claiming competitive results against DeepSeek V3 and other frontier models. The chat app also quietly shipped a video generation capability powered by Alibaba's Tongyi Wanxiang.

Alibaba (Qwen)
New Models · Open weights

Qwen2.5-VL

Alibaba ships Qwen2.5-VL open vision-language model family

Alibaba's Qwen team released Qwen2.5-VL, open-weights vision-language models up to 72B that handle images, documents, video understanding, and on-screen agentic grounding. The 72B Instruct model was immediately available on Hugging Face and in Qwen Chat.

72B Largest variant

New Models · Open weights

YuE 7B

YuE 7B: open-source Suno-style music generation model

The Multimodal Art Projection (M-A-P) team released YuE, a 7B open-source music generation model dubbed the 'open Suno' on the show, capable of generating full songs with vocals from lyrics. Weights are on Hugging Face with code on GitHub and a hosted demo on fal.ai.

7B Parameters

Mistral AI
New Models · Open weights

Mistral Small 2501

Mistral Small 2501: 24B open-weights model under Apache 2.0

Mistral AI released Mistral Small 2501, a 24B-parameter instruct model under the permissive Apache 2.0 license. Announced as breaking news during the show, it continues Mistral's tradition of strong small open models suitable for fine-tuning and local deployment.

24B Parameters

DeepSeek
New Models · Open weights

DeepSeek R1

DeepSeek R1: MIT-licensed open-source reasoning model rivals o1

DeepSeek released R1, a state-of-the-art open-source reasoning model under a permissive MIT license. It matches or beats OpenAI's o1 on key reasoning benchmarks while shipping with fully open weights, and DeepSeek also released a family of smaller distilled models. The show called this the hottest week open-source AI has ever had.

Google DeepMind
New Models

Gemini 2.0 Flash Thinking 01-21

Google ships updated Gemini Flash Thinking with 1M context

Google released an updated Gemini Flash Thinking model (01-21) with a 1 million token context window, built-in code execution, and improved evals over the previous Thinking release. It pushes Google's reasoning-model line forward in the same week DeepSeek R1 landed.

1M Context window (tokens)

Hugging Face
New Models · Open weights

SmolVLM (256M)

Hugging Face SmolVLM: tiny vision-language models run on WebGPU

Hugging Face released SmolVLM, a family of tiny vision-language models including a 256M-parameter version small enough to run entirely in the browser via WebGPU. It demonstrates how far efficient multimodal models have shrunk while remaining usable.

256M Parameters (smallest VLM)

🚀 Products & Apps 2

Riffusion
Products & Apps

Fuzz

Riffusion launches Fuzz music generation, free for now

Riffusion (written as 'Refusion' in the show notes) launched Fuzz, a hosted AI music generation product that is free to use during its initial period. It was highlighted in the voice and audio segment alongside YuE as part of a wave of new AI music tools.

OpenAI
Products & Apps

Operator

OpenAI launches Operator, an agentic browser for ChatGPT Pro

OpenAI launched Operator, an agentic browser-use product that performs tasks for you on the web, available to ChatGPT Pro subscribers at operator.chatgpt.com. As Sam Altman framed it on the launch stream: you give agents a task and they go off and do it.

✨ Major Features & Updates 2

Exa
Major Features & Updates

Exa DeepSeek Chat

Exa ships free DeepSeek R1 chat demo with web search

Exa integrated DeepSeek R1 into a free hosted chat demo that combines the reasoning model with Exa's web search. It was mentioned in the tools section as a no-cost way to try R1 grounded with live search results.

Perplexity
Major Features & Updates

Perplexity Pro with R1

Perplexity adds DeepSeek R1 as a Pro reasoning model option

Perplexity integrated DeepSeek R1 into its Pro search product, letting subscribers choose R1 as the reasoning model behind answers. It was one of several tools that raced to host R1 on Western infrastructure within days of the model's release.

🔌 APIs & Platforms 2

Anthropic
APIs & Platforms

Citations (Claude API)

Anthropic adds Citations to the Claude API

Anthropic launched a Citations capability in the Claude API, letting Claude ground its answers in provided source documents and return precise citations. It targets RAG and document-QA use cases where verifiable sourcing matters.
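To illustrate why span-level citations are verifiable, here is a minimal sketch in Python. It is not the Claude API's response schema: the `Citation` class, its field names, and `verify_citation` are hypothetical, chosen only to show how a citation that carries a document index and a character range can be checked mechanically against the source text.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical structure for illustration, not a real API type.
    document_index: int   # which provided document the quote comes from
    start_char: int       # inclusive start offset into that document
    end_char: int         # exclusive end offset into that document
    cited_text: str       # the quoted passage the model claims to rely on


def verify_citation(documents: list[str], c: Citation) -> bool:
    """Return True only if the cited text really appears at the cited span."""
    if not 0 <= c.document_index < len(documents):
        return False
    doc = documents[c.document_index]
    if not 0 <= c.start_char <= c.end_char <= len(doc):
        return False
    return doc[c.start_char:c.end_char] == c.cited_text


docs = ["The grass is green. The sky is blue."]
good = Citation(0, 0, 19, "The grass is green.")
bad = Citation(0, 20, 36, "The sky is green.")  # text does not match the span

good_ok = verify_citation(docs, good)
bad_ok = verify_citation(docs, bad)
```

A check like this is what makes the feature useful for document QA: an answer whose citation fails verification can be flagged or dropped before it reaches the user.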

Perplexity
APIs & Platforms

Sonar Pro Search API

Perplexity ships Sonar Pro search API and an Android AI assistant

Perplexity released its Sonar Pro search-grounded API, giving developers programmatic access to Perplexity-style web-grounded answers, and also launched an AI assistant for Android. Both releases push Perplexity beyond its consumer answer engine.

🛠️ Dev Tools 4

Browser Use
Dev Tools · Open weights

Browser-use

Browser-use: open-source alternative to OpenAI's Operator

Browser-use is an open-source library that lets LLM agents control a real web browser, positioned on the show as the OSS counterpart to OpenAI's Operator. It enables anyone to build browsing agents with their model of choice instead of a closed hosted product.

ByteDance
Dev Tools

Trae

ByteDance launches Trae, an AI IDE competing with Cursor

ByteDance launched Trae, an AI-powered code editor positioned as a Cursor competitor. It is ByteDance's second release of the week, alongside the UI-TARS computer-use models.

Pietro Schirano
Dev Tools · Open weights

RAT (Retrieval Augmented Thinking)

RAT: pipe DeepSeek R1 reasoning into other models

Guest Pietro Schirano released RAT (Retrieval Augmented Thinking), a technique and tool that extracts DeepSeek R1's reasoning traces and feeds them to a cheaper, faster model like GPT-3.5 Turbo for the final answer. It showcases the new pattern of mixing open reasoning traces with closed completion models.
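The two-stage pattern described above can be sketched as follows. This is an illustrative outline under stated assumptions, not Schirano's actual code: `rat_answer`, `reason`, and `answer` are hypothetical names, and the stub functions stand in for real calls to a reasoning model and a completion model.

```python
from typing import Callable


def rat_answer(
    question: str,
    reason: Callable[[str], str],   # assumed: returns a reasoning trace for the question
    answer: Callable[[str], str],   # assumed: returns a final answer for a prompt
) -> str:
    """Get a reasoning trace from one model, then pass it to a second model."""
    trace = reason(question)
    prompt = (
        "Question:\n" + question + "\n\n"
        "Reasoning from another model:\n" + trace + "\n\n"
        "Using the reasoning above, give a concise final answer."
    )
    return answer(prompt)


# Stub models so the sketch runs offline; swap in real API clients in practice.
def fake_reasoner(question: str) -> str:
    return "2 + 2 means adding two and two, which is 4."


def fake_answerer(prompt: str) -> str:
    return "4" if "which is 4" in prompt else "unknown"


result = rat_answer("What is 2 + 2?", fake_reasoner, fake_answerer)
```

Passing the two models in as plain callables keeps the pipeline independent of any provider, which matches the idea of pairing an open reasoning model with whichever completion model is cheapest or fastest.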

📄 Papers & Research 1

UC Berkeley
Papers & Research · Open weights

TinyZero & RAGEN

Berkeley TinyZero and RAGEN replicate DeepSeek R1-Zero

Berkeley researchers released TinyZero and RAGEN, open replications of DeepSeek's R1-Zero reinforcement-learning recipe on small models. The projects showed that R1-style emergent reasoning behavior can be reproduced cheaply, with training runs logged publicly on Weights & Biases.

📦 Datasets 1

📊 Benchmarks & Evals 1

Benchmarks & Evals

Humanity's Last Exam (HLE)

Humanity's Last Exam: a deliberately unsaturated frontier benchmark

Humanity's Last Exam (HLE) launched as a new, very hard benchmark designed to stay unsaturated as models max out MMLU and math evals. It uses crowdsourced expert-level questions to measure frontier model capability at a point where existing benchmarks are 98-99% saturated.

💰 Funding 1

Funding

Stargate Project

Stargate Project: $500B AI infrastructure investment announced

OpenAI, SoftBank (Masayoshi Son), and Oracle (Larry Ellison) announced the Stargate Project, a planned $500 billion investment in US AI infrastructure. The announcement, made at the White House, was framed on the show as an AI 'Manhattan Project'-scale buildout of datacenters and compute.

$500B Planned investment

🌀 Also Released 1