Open Source & Open Weights

Open-weight model releases, open datasets, and the open-source AI ecosystem. — 273 releases covered on the show.

June 2026

Google DeepMind
New ModelsOpen weights

Gemma 4 12B

Google drops Gemma 4 12B, an encoder-free multimodal local model

Google released Gemma 4 12B, an encoder-free multimodal model under Apache 2.0 that targets 16GB VRAM local setups. Instead of bolting separate vision or audio encoders onto a language model, it uses one unified network, which LDJ and Yam argued makes smaller multimodal models cheaper, cleaner, and easier to run locally.

H Company
New ModelsOpen weights

Holo 3.1

H Company launches Holo 3.1 local computer-use agent models

H Company released Holo 3.1, a family of local computer-use agent models ranging from 0.8B to 35B parameters with new quantized checkpoints. The lineup targets running screen-driving agents on local hardware rather than in the cloud.

Ideogram
New ModelsOpen weights

Ideogram 4.0

Ideogram 4.0 becomes the top open-weight text-to-image model

Ideogram released Ideogram 4.0, a 9.3B-parameter text-to-image model with open weights under a non-commercial license. It leads open-weight image models on typography and layout, with bounding-box/layout-style prompting that trades casual generation ease for precise structured control.

9.3B Ideogram 4 parameters
JetBrains
New ModelsOpen weights

Mellum 2

JetBrains open-sources Mellum 2, a 12B MoE coding model

JetBrains released Mellum 2, a 12B mixture-of-experts coding model with only 2.5B active parameters, trained from scratch by a small team using a three-stage curriculum over 10T tokens. The panel read it as IDE companies converting years of developer-workflow context into model advantage; it is also available on CoreWeave Inference.

Nous Research
Products & Apps

Hermes Desktop

Nous Research launches Hermes Desktop agent app for Mac/Win/Linux

Nous Research launched Hermes Desktop, packaging the Hermes Agent harness into a native desktop app for Mac, Windows, and Linux. Karan previewed chat, permissions, tool-call visibility, reasoning traces, and admin controls aimed at small teams, startups, and personal agent fleets.

NVIDIA
New ModelsOpen weights

Nemotron 3.5 ASR

NVIDIA ships Nemotron 3.5 ASR, a 600M streaming speech model

NVIDIA released Nemotron 3.5 ASR, a 600M-parameter open multilingual streaming speech-to-text model aimed at voice agents. It supports 40 languages and reportedly delivers 17x more throughput than Parakeet-style baselines at half the size, pushing the latency/accuracy frontier for open voice-agent infrastructure.

17x Nemotron ASR throughput
NVIDIA
New ModelsOpen weights

Nemotron 3 Ultra

NVIDIA releases Nemotron 3 Ultra, a 550B open-weight MoE for agents

NVIDIA dropped Nemotron 3 Ultra the day of the show, a 550B-parameter sparse MoE with 55B active parameters built for long-running agentic harnesses like OpenCode, Hermes, and OpenClaw. Chris Alexiuk joined to explain the hybrid Mamba/Transformer architecture and the unusually complete open release: weights, training data, recipes, a GenRM reward model, and an NVFP4 quantized checkpoint.

550B Nemotron 3 Ultra parameters55B Active parameters

May 2026

OpenBMB
New ModelsOpen weights

MiniCPM5-1B

OpenBMB MiniCPM5-1B: new SOTA 1B open-weights model

OpenBMB released MiniCPM5-1B, a state-of-the-art 1B-parameter open-weights model for efficient local and on-device use that runs on a phone. It scores 17.9 on the Artificial Analysis Intelligence Index, 7.4 points ahead of its size class, while using roughly 31x fewer output tokens than Qwen3.5 2B.

17.9 AAII (1B model)
PrismML
New ModelsOpen weights

Bonsai Image 4B

PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB

PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.

Tencent
New ModelsOpen weights

Hy-MT2

Tencent open-sources Hy-MT2 translation models under Apache 2.0

Tencent released the Hy-MT2 family of translation models under Apache 2.0, including a tiny 1.8B model that beats paid translation APIs like Microsoft's Translator, plus a larger 30B-A3B MoE variant. A small, free, locally-runnable model outperforming commercial translation services was one of the open-source wins of the week.

Cohere
New ModelsOpen weights

Command A+

Cohere releases Command A+, a 218B Apache 2.0 MoE with 25B active params

Cohere released Command A+, a 218B-parameter mixture-of-experts model with 25B active parameters, shipping open weights under Apache 2.0. It was the week's headline open-source release, available on Hugging Face in both W4A4 quantized and BF16 variants.

218B Command A+ parameters25B active parameters
Fastino Labs
New ModelsOpen weights

GLiGuard

Fastino Labs GLiGuard: 300M open guardrail model matches SOTA safety models

Fastino Labs released GLiGuard, a 300M-parameter open source guardrail model that matches state-of-the-art safety models 23-90x its size while delivering 16x higher throughput. It ships under Apache 2.0, making small, fast, deployable guardrails available to everyone.

300M parameters

April 2026

DeepSeek
New ModelsOpen weights

DeepSeek V4

DeepSeek V4: 1.6T MoE with CSA+HCA attention and 1M context

DeepSeek released the V4 paper and models (V4-Pro and V4-Flash on Hugging Face), a 1.6T-parameter MoE featuring CSA+HCA attention that fits 1M tokens of context in just 5.7GB of KV cache. It is possibly the first frontier model trained across multiple datacenters, and DeepSeek is offering API tokens at an 80% discount on already much cheaper pricing.

1M context window5.7GB KV cache at 1M context
IBM
New ModelsOpen weights

Granite 4.1

IBM Granite 4.1: dense non-thinking models with top tool calling

IBM released the Granite 4.1 family (3B/8B/30B), dense non-thinking models under Apache 2.0 with best-in-class tool calling, scoring 73 on BFCL with just 8B parameters. IBM claims 20x token efficiency over Qwen3.5 9B, and the models are live on W&B Inference at $0.05/$0.10 per million input/output tokens with 128K context.

SenseTime
New ModelsOpen weights

SenseNova U1

SenseTime open-sources SenseNova U1 unified multimodal MoE

SenseTime open-sourced SenseNova U1, a unified multimodal MoE model with 8B total and 3B active parameters that handles understanding and generation with no separate encoder or VAE. The architecture builds on a paper the team presented at ICLR last year.

8B total parameters (3B active MoE)
Alibaba (Qwen)
New ModelsOpen weights

Qwen3.6-27B

Qwen3.6-27B: dense Apache-2.0 model beats Alibaba's own 400B flagship

Alibaba shipped Qwen3.6-27B, a dense 27B-parameter model under Apache 2.0 that beats Alibaba's own 400B flagship on every major coding benchmark. Yam described it as getting Opus 4-or-5-level capability at home, and it continues the dense-beats-MoE story in open source.

27B dense Qwen3.6
Brex
Dev ToolsOpen weights

CrabTrap

Brex open-sources CrabTrap, an LLM-as-judge proxy for agent security

Brex's CEO pair-programmed with Codex and open-sourced CrabTrap, an LLM-as-judge HTTP proxy that intercepts outbound agent requests and blocks risky activity using natural-language rule definitions. Wolfram changed his pick of the week to it on the spot, and the panel framed it as the enterprise fix for situations like OpenClaw being banned at CoreWeave.

Moonshot AI
New ModelsOpen weights

Kimi K2.6

Kimi K2.6: 1T MoE open-source SOTA on SWE-Bench Pro

Moonshot AI released Kimi K2.6, a 1-trillion-parameter MoE with 32B active parameters, 384 experts, MLA attention, and a 256K context window under a modified MIT license. It claims open-source state of the art on SWE-Bench Pro at 58.6, and Wolfram called it the best open-source model he has ever tested on his private wolf-bench.

1T MoE Kimi K2.6
OpenAI
New ModelsOpen weights

Privacy Filter

OpenAI open-sources a 1.5B privacy/PII filter that runs in the browser

OpenAI open-sourced a tiny 1.5B MoE model with only 50M active parameters under Apache 2.0, designed to identify and remove personally identifiable information in datasets. It runs fully in the browser on WebGPU via Xenova's Transformers.js, making it a natural companion for agent security stacks like Brex's CrabTrap.

Alibaba (Qwen)
New ModelsOpen weights

Qwen 3.6-35B-A3B

Qwen 3.6-35B-A3B: Apache 2.0 MoE with 3B active hits 73.4% SWE-Verified

Alibaba Qwen open-sourced Qwen 3.6-35B-A3B under Apache 2.0 the same morning Opus 4.7 dropped: a 35B MoE with only 3B active parameters that scores 73.4% on SWE-bench Verified, rivaling models 10x its size. It is natively multimodal with 262K context extensible to 1M, and the crew called it the strongest mid-size LLM on nearly all benchmarks, putting to rest doubts about Qwen's open-source commitment after Junyang Ling's departure.

73.4% SWE-bench Verified
Daily (Pipecat)
Products & AppsOpen weights

Gradient Bang

Gradient Bang: first massively multiplayer fully LLM-driven voice game

Kwindla Kramer's 'side project that broke containment' is a fully LLM-driven multiplayer voice-based space game inspired by BBS-era Trade Wars, built on a new Pipecat Sub-Agents library with a class-based event bus that works locally and over the network. A Deepgram plus GPT-4.1 voice agent always responds in under 1.5 seconds while GPT-5.2 medium-thinking task agents do the work, and the React frontend is rendered from LLM-generated JSON as dynamic UI. The team also open-sourced GB Benchmarks for evaluating agent task execution.

Jiunsong (@songjunkr)
New ModelsOpen weights

Super Gemma 4 26B Uncensored v2

Super Gemma 4 26B Uncensored v2 trends on HF with 0/100 refusals

Community fine-tuner @songjunkr released Super Gemma 4 26B Uncensored v2, which is trending on Hugging Face with 0/100 refusals and fixed tool calling. It ships in GGUF and MLX 4-bit variants for local inference.

Marimo
Dev ToolsOpen weights

Marimo Pair

Marimo Pair drops coding agents inside reactive Python notebooks

Marimo released Marimo Pair, which embeds Claude Code, Codex, or OpenCode agents directly inside its reactive, dependency-graph-aware Python notebooks. Founding engineer Trevor Manz joined the show to explain why reactive notebooks are a natural verification surface for agent-written code; the launch trended on Hacker News this week and was featured as part of This Week's Buzz (Marimo is in the CoreWeave family).

Tencent
New ModelsOpen weights

HYWorld 2.0

Tencent HYWorld 2.0 turns a single image into editable 3D scenes

Tencent released HYWorld 2.0, which converts a single image into editable 3D Gaussian Splats and meshes that are ready for Unity, Unreal, and Isaac Sim. It is one of three single-image-to-3D-world releases this week, essentially an open-source equivalent of what Fei-Fei Li's World Labs is building.

DatasetsOpen weights

Arena historical leaderboard & prompt datasets

Arena releases 3 years of leaderboard data and prompts on Hugging Face

Arena (formerly LMArena) released three years of historical leaderboard data plus the actual user prompts as datasets on Hugging Face. Peter Gostev, who previously scraped the site by hand into Google Sheets for his charts, now builds his Compute Wars and model-trend analyses straight from the data.

Dev ToolsOpen weights

MemPalace

MemPalace open-source AI memory system goes viral with 26K stars

MemPalace, the open-source AI memory system from Milla Jovovich and Ben Sigman, went viral with 26K GitHub stars in 2 days and claimed top memory-benchmark scores. The team then transparently walked back the overstated benchmark claims in a public correction thread, which the show called a refreshingly honest arc.

Nous Research
New ModelsOpen weights

Hermes 27B

Nous Research ships Hermes 27B, paired with the Hermes harness

Nisten's pick of the week: Hermes 27B, an open model trained specifically to be paired with the Hermes harness and allegedly distilled from the Opus API. Model and harness ship together as a portable unit, a notable take on the harness-engineering trend Swyx discussed.

OpenClaw
Dev ToolsOpen weights

OpenClaw 2026.4.5

OpenClaw 2026.4.5 ships /dreaming memory consolidation

OpenClaw's biggest release since 4.0: /dreaming goes GA with Light/Deep/REM memory consolidation phases that defrag agent memory into a human-readable Dream Diary (DREAMS.md). The release also adds built-in video and music generation across 4 backends, GPT-5.4 as the new default model, prompt-cache reuse improvements, and Control UI plus docs in 12 new languages. Maintainer Vincent Koc says the ~1.5M-line codebase was refactored into a plugin architecture in nine days.

1.5M lines OpenClaw codebase
Z.ai (Zhipu AI)
New ModelsOpen weights

GLM-5.1

GLM-5.1 takes #1 open-source spot on SWE-Bench Pro at 58.4%

Z.ai released GLM-5.1, now the #1 open-source model on SWE-Bench Pro at 58.4%. It can run autonomously for 8 hours with 1,700+ agent steps, and is already live on W&B Inference. Open weights are up on Hugging Face alongside an arXiv paper.

Google DeepMind
New ModelsOpen weights

Gemma 4

Google releases Gemma 4 open-weights family under Apache 2.0

Google DeepMind's Gemma 4 launch crossed 10M+ downloads with over 1,000 Gemma-4-based fine-tunes on Hugging Face; the Gemma family totals 500M+ downloads. Omar Sanseviero says Gemma is the foundation for the next generation of Gemini Nano shipping on Pixel and Samsung, with the AI Edge gallery letting people run it locally on Android and iOS. It punched above its size on Arena's Pareto curve and is now live on W&B Inference.

PrismML
New ModelsOpen weights

Bonsai

PrismML releases Bonsai 1-bit models, an 8B model in 1.15 GB

PrismML released Bonsai, a family of 1-bit quantized open models fitting an 8B model into 1.15 GB and claiming 10x intelligence density, built on decades of compression research. The panel discussed one-bit quantization as a cost/performance lever for cheap local inference.

Dev ToolsOpen weights

claw-code

Claw-code clean-room rewrite becomes fastest repo to 100K GitHub stars

After Claude Code's source leaked via npm, Sigrid Jin and Bellman published claw-code, a clean-room rewrite that became the fastest GitHub repo to pass 100K stars, hitting the mark in roughly 24 hours. Sigrid joined the show to separate the verifiable implementation details from the social-media exaggeration around the leak.

100K+ GitHub stars in 24h

March 2026

Cohere
New ModelsOpen weights

Cohere Transcribe

Cohere Transcribe: open-source 2B ASR tops Open ASR Leaderboard at 5.42% WER

Cohere entered the ASR game with Transcribe, a 2-billion-parameter Apache 2.0 speech recognition model that immediately took the number-one spot on Hugging Face's Open ASR Leaderboard with a 5.42% word error rate versus Whisper Large v3's 7.44%. It wins 61% of human evaluations on average and 64% head-to-head against Whisper, making it a credible local-inference Whisper replacement for regulated industries.

2B Cohere Transcribe ASR size5.42% Word error rate on Open ASR Leaderboard
MiniMax
New ModelsOpen weights

MiniMax 2.7

MiniMax 2.7 open-source weights discussed as small-model momentum continues

The panel covered MiniMax 2.7 and its open-weights release in the context of small, efficient models becoming genuinely practical for local and specialized agent workflows. The segment focused on capability momentum and how open-weights expectations keep shaping adoption sentiment.

Mistral AI
New ModelsOpen weights

Voxtral TTS

Mistral drops Voxtral TTS, a 3B open-weight text-to-speech model

Mistral released Voxtral TTS, its first text-to-speech model, as breaking news during the live show: 3 billion parameters, open weights, with emotion controls for neutral, happy, and frustrated voices. Mistral claims it beats ElevenLabs Flash v2.5 in human preference tests with a 58% win rate on flagship voices and 68% on zero-shot voice cloning, though Alex's live test found it decent rather than stunning.

3B Mistral Voxtral TTS size
Reka AI
New ModelsOpen weights

Reka Edge

Reka AI ships Edge, a 7B multimodal VLM for sub-second on-device inference

Reka AI launched Reka Edge, a 7B-parameter multimodal vision-language model built for sub-second latency on edge devices. Weights are on Hugging Face and the model is available through OpenRouter, with the panel highlighting it as a notable efficient multimodal release for real-world deployment.

H Company
New ModelsOpen weights

Holotron-12B

H Company's Holotron-12B: hybrid SSM computer-use model at 8.9k tok/s

H Company released Holotron-12B, an open-source hybrid SSM model built for computer-use agents. It claims 8,900 tokens/sec generation speed and jumps the WebVoyager benchmark from 35.1% to 80.5%, continuing the trend of hybrid SSM architectures for long-context agent workloads.

8,900 tok/s H Company Holotron 12B
Hugging Face
Also Released

State of Open Source Spring 2026 Report

Hugging Face report: China passes US in LLM count, Qwen tops 1B downloads

Hugging Face published its Spring 2026 State of Open Source report showing China surpassing the US in number of LLMs for the first time, with Chinese models taking 41% of all downloads. Alibaba's Qwen family crossed 1 billion total downloads (about 1 million per day), overtaking Llama as the most downloaded model family, on a platform now hosting 11M users and 2M+ models.

MiniMax
New Models

MiniMax M2.7

MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro

MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, built by one engineer over four days with zero lines of human code. It scores 56.22% on SWE-Bench Pro, within one point of Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. Not yet open weights, though rumors suggest a release is coming.

56% MiniMax 2.7 SWE-bench Pro
Mistral AI
New ModelsOpen weights

Mistral Small 4

Mistral Small 4: 119B MoE with 6B active unifies vision, coding, reasoning

Mistral returned to open source with Small 4, a 119B-parameter MoE with 128 experts and only 6B active per token, released under Apache 2.0. It unifies the previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one model and can fit on a single H100 when compressed. Early WolfBench results are sobering at ~17% on OpenClaw agent tasks, roughly on par with similarly sized Nemotron.

119B Mistral Small 4 total params
Papers & ResearchOpen weights

Mamba-3

Mamba-3 lands with three SSM innovations for inference-first linear models

Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.

Unsloth AI
Dev ToolsOpen weights

Unsloth Studio

Unsloth Studio: web UI for local fine-tuning with 2x speed, 70% less VRAM

Unsloth launched Studio, an open-source web UI for local LLM training and inference claiming 2x speed and 70% less VRAM, supporting 500+ models across text, vision, audio, and embeddings. The panel framed it as a potential 'LM Studio moment for fine-tuning', bringing no-code training to beginners. Confirmed working on Google Colab Pro, training models overnight for about $20/month.

Fish Audio
New ModelsOpen weights

Fish Audio S2

Fish Audio S2 open TTS hits sub-150ms latency

Fish Audio S2 is a fully open-source TTS model with inline emotion control via free-text bracket tags like gasp, laughter, and long pause. Alex demoed it live with an OpenClaw skill that let his 5-year-old talk to a voice clone of 'Rocky' from Project Hail Mary; Wolfram called it 'ElevenLabs V3 for free.'

<150ms Fish Audio S2 TTS latency
NVIDIA
New ModelsOpen weights

Nemotron 3 Super 120B

NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet

NVIDIA launched Nemotron 3 Super, a 120B Hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B 5-year open-source commitment. It is available on W&B Inference at $0.20/M input and $0.80/M output.

120B Nemotron 3 Super total parameters12B Nemotron 3 Super active parameters (MoE)1M Nemotron 3 Super context window (tokens)
Paperclip
Dev ToolsOpen weights

Paperclip.ing

Paperclip.ing: open-source agent orchestration for zero-human companies

Anonymous builder DOTTA presented Paperclip.ing, an open-source agent orchestration framework for 'zero human companies' where an AI CEO recursively hires more agents. It hit 20K GitHub stars in its first week, with a heartbeat system driving agent autonomy and a Memento-style memory architecture keeping agents coherent across tasks.

20K Paperclip GitHub stars in first week
Alibaba (Qwen)
New ModelsOpen weights

Qwen3.5 Small Series

Alibaba releases Qwen3.5 small models (2B, 4B, 9B) for local use

Alibaba released the Qwen3.5 small model series with 2B, 4B, and 9B variants, which the panel found highly usable on consumer hardware. The release landed alongside leadership turbulence as Junyang Lin and Binyuan Hui departed Qwen, though the panel expects Alibaba's open-source momentum to continue.

StepFun
New ModelsOpen weights

Step 3.5 Flash Base

StepFun open-sources Step 3.5 Flash Base with its training stack

StepFun released Step 3.5 Flash Base and Midtrain checkpoints, an unusually open release that includes training artifacts and the SteptronOSS training stack alongside the weights. The panel praised the Apache-2 orientation and called the continuation-pretraining flexibility a major practical unlock for builders.

February 2026

Alibaba (Qwen)
New ModelsOpen weights

Qwen 3.5

Qwen 3.5 lands: 35B/3B-active Medium outperforms the old 235B flagship

Alibaba released the Qwen 3.5 family of open-weight models, headlined by Qwen3.5-35B-A3B, a 35B model with only 3B active parameters that outperforms their previous 235B flagship. Variants include a 122B-A10B and a dense 27B, with the panel highlighting the hybrid state-space (Mamba-layer) architecture and strong practical coding and agent performance at a tiny active-parameter footprint.

35B / 3B active Qwen 3.5 Medium
Liquid AI
New ModelsOpen weights

LFM2-24B-A2B

Liquid AI releases LFM2-24B-A2B, a laptop-friendly 24B MoE

Liquid AI released LFM2-24B-A2B, a 24B mixture-of-experts model with only 2.3B active parameters that runs on consumer laptops. The panel highlighted its speed and surprisingly strong non-coding reasoning, reinforcing the trend of efficient low-active-parameter open models for local use.

Weights & Biases
Major Features & Updates

W&B Inference: MiniMax 2.5 & Kimi K2.5

W&B Inference adds MiniMax 2.5 and Kimi K2.5

Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax 2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.

Alibaba (Qwen)
New ModelsOpen weights

Qwen3.5-397B-A17B

Alibaba opens Qwen 3.5: 397B-param multimodal MoE with only 17B active

Alibaba released Qwen3.5-397B-A17B, billed as the first open-weight native multimodal MoE model, with 397B total parameters, just 17B active, 512 experts, and 262K native context extendable to 1M. It delivers 8.6-19x faster inference than Qwen3-Max and continues Qwen's strength in multilingual and medical tasks, scoring 52.5% on Terminal Bench, third place among open-source models. Nisten found coding still trails GLM-5.

397B Qwen 3.5 Parameters
Cohere Labs
New ModelsOpen weights

Tiny Aya

Cohere Labs releases Tiny Aya, a 3.35B multilingual model for 70+ languages

Cohere Labs released Tiny Aya, a 3.35B-parameter multilingual model family supporting 70+ languages that is small enough to run locally on phones. It extends Cohere's Aya line of open multilingual models, bringing broad language coverage to on-device deployments.

Weights & Biases
Major Features & Updates

Kimi K2.5 on W&B Inference

W&B adds Kimi K2.5 to its inference service

Weights & Biases launched Kimi K2.5 on its inference service, making Moonshot AI's model available to W&B users. In Wolfram's Terminal Bench deep dive for W&B, Kimi K2.5 achieved a 67.4% ceiling score across multiple runs, among the strongest open-model results he measured.

Zyphra
New ModelsOpen weights

ZUNA

Zyphra opens ZUNA, a 380M-param EEG brain-computer interface model

Zyphra released ZUNA, a 380M-parameter open-source BCI foundation model that translates EEG brain signals into text, reconstructing clinical-grade brain signals from sparse, noisy data. Dubbed 'thought to text' by the community, it works with roughly $500 non-invasive EEG headsets, likely needs personalized training per user, and is small enough to run in real time on a consumer gaming GPU. It is Apache licensed.

MiniMax
New ModelsOpen weights

MiniMax M-2.5

MiniMax M-2.5 hits 80.2% SWE-Bench Verified with 10B active params

MiniMax dropped M-2.5 thirty minutes before the show: a 200B-total, 10B-active open-weights model scoring 80.2% on SWE-Bench Verified, approaching Opus 4.6 at roughly 1/20th the cost (~15 cents per task with a 57% win rate over Opus). Trained with MiniMax's decoupled Forge RL framework and optimized for end-to-end task time with fewer tool calls and thinking tokens. Senior researcher Olive Song joined live and revealed the model was still training — they cut a checkpoint for early release.

80.2% SWE-Bench Verified15¢ Cost per task
Zhipu AI (Z.ai)
New ModelsOpen weights

GLM-5

Z.ai launches GLM-5, the open-weights agentic coding crown

Z.ai released GLM-5, a 744B-parameter MoE model (40B active) trained on 28.5 trillion tokens that takes the #1 open-source ranking for agentic coding with 77.8% SWE-bench Verified. It introduces the SLIM asynchronous RL framework for post-training, adopts DeepSeek's sparse attention to cut deployment cost, and was trained on Huawei chips rather than NVIDIA. Lou from Z.ai joined the show live and summed it up as bigger, faster, better, and cheaper.

744B GLM-5 Parameters28.5T Training tokens
Alibaba (Qwen)
New ModelsOpen weights

Qwen3-Coder-Next

Qwen3-Coder-Next hits 70.6% SWE-Bench Verified with 3B active params

Alibaba's Qwen3-Coder-Next is an 80B MoE coding agent model with only 3B active parameters that scores 70.6% on SWE-Bench Verified and 44% on the much harder SWE-Bench Pro. It was trained on 7.5T tokens with 20,000 parallel RL environments and runs under 48GB of RAM with GGUF quantization, making near-frontier agentic coding feasible on local hardware.

70.6% SWE-Bench Verified44% SWE-Bench Pro
New ModelsOpen weights

Intern-S1-Pro

Intern-S1-Pro: 1 trillion parameter open MoE for scientific reasoning

InternLM released Intern-S1-Pro, a 1 trillion parameter open-source MoE model targeting SOTA scientific reasoning across chemistry, biology, materials, and earth sciences. The panel noted it beats frontier models on science benchmarks, a massive compute investment for an open release.

Mistral AI
New ModelsOpen weights

Voxtral Transcribe 2

Mistral's Voxtral Transcribe 2 dethrones Whisper as SOTA transcription

Mistral AI launched Voxtral Transcribe 2, state-of-the-art speech-to-text with sub-200ms latency, native diarization support, and open weights under Apache 2.0. The panel called it the first model to dethrone Whisper after roughly three years, and Alex used it to transcribe this very episode.

StepFun
New ModelsOpen weights

Step 3.5 Flash

StepFun Step 3.5 Flash: frontier reasoning claims at 11B active params

StepFun released Step 3.5 Flash, a 196B sparse MoE model with only 11B active parameters, claiming frontier-level reasoning while generating at 100-350 tokens per second. It continues the trend of sparse Chinese MoE models delivering high speed at low active parameter counts.

January 2026

Arcee AI
New ModelsOpen weights

Trinity Large

Arcee AI ships Trinity Large: 400B MOE trained in 33 days for $20M

Arcee AI's Trinity Large is a 400B-parameter MOE with 13B active parameters, trained on 17T tokens across 2000 B300 GPUs in 33 days for $20M. It has 512K native context (twice Kimi K2.5), is free on OpenRouter until February 2026, and the panel called it the largest Western open-source lab model.

400B Arcee Trinity Large512K Trinity native context
Alibaba (Qwen)
New ModelsOpen weights

Qwen3-TTS

Qwen3-TTS: open-source TTS family with 97ms latency and voice cloning

Alibaba's Qwen team released Qwen3-TTS, a full open-source text-to-speech family under Apache 2 that dropped 30 minutes before the show. It spans 5 models from 0.6B to 1.7B parameters, with 97ms latency, voice cloning from just 3 seconds of audio, voice description prompting, and 10-language support.

97ms Latency
FlashLabs
New ModelsOpen weights

Chroma 1.0

FlashLabs Chroma 1.0: open-source real-time speech-to-speech under 150ms

FlashLabs released Chroma 1.0, billed as the world's first open-source end-to-end real-time speech-to-speech model with voice cloning under 150ms latency. The 4B parameter model is built on Qwen 2.5 Omni and released under Apache 2; its live demo with RAG and document upload impressed the whole panel.

Liquid AI
New ModelsOpen weights

LFM2.5-1.2B-Thinking

Liquid AI's LFM2.5-1.2B-Thinking: on-device reasoning under 900MB

Liquid AI released LFM2.5-1.2B-Thinking, a 1.2B parameter reasoning model that runs entirely on-device with under 900MB of memory. Its hybrid architecture with gated convolutions delivers 239 tokens/sec on an AMD CPU and 82 tokens/sec on a mobile NPU, making it practical for edge devices, Raspberry Pi, and older iPhones.

1.2B Parameters, under 900MB memory
Peter Steinberger
Dev ToolsOpen weights

Clawdbot

Clawdbot: open-source self-improving personal AI assistant for macOS

Clawdbot, created by Peter Steinberger, is an open-source personal AI assistant that runs locally on your Mac and connects via WhatsApp, Telegram, or Discord. Its killer feature is self-improvement: ask it to learn something and it writes its own skill files, giving a single chat conversation control over multiple agents, persistent memory, voice messages, image generation, and browser automation on your actual computer.

Z.AI (Zhipu)
New ModelsOpen weights

GLM-4.7-Flash

GLM-4.7-Flash: 30B MoE local coding agent with only 3B active params

Z.AI released GLM-4.7-Flash, a 30B parameter MoE model with only 3B active parameters, designed as the ultimate local coding and agent assistant. It hits 59% on SWE-Bench Verified (approaching Sonnet 4's 64%) and runs at 120 tokens/sec on a stock Mac Studio M3 Ultra, fast enough to run RALF autonomous coding loops even on CPU.

59% SWE-Bench Verified120 tps Speed on Mac Studio M3 Ultra
Black Forest Labs
New ModelsOpen weights

Flux 2 Klein

Black Forest Labs drops Flux 2 Klein, fast open-weights image model

Wolfram broke the news mid-show: Black Forest Labs released Flux 2 Klein, a fast 4B/9B image generation model with open weights under Apache 2.0. It is designed for near-real-time editing and style iteration, and Alex used it minutes later in his live Claude Cowork demo.

Byte
New ModelsOpen weights

M3

M3: 235B open-source medical LLM claims to beat GPT 5.2 on HealthBench

Byte released M3, a 235B parameter medical LLM fine-tuned from Qwen3 and licensed Apache 2.0. With only 22B active parameters, it is runnable at usable speeds on an M3 Ultra, and it claims to beat GPT 5.2 on HealthBench. Nisten suggested pairing it with smaller imaging models like MedGemma rather than treating them as substitutes.

235B M3 Medical LLM
Chorus
Major Features & UpdatesOpen weights

Chorus Skills Support

Chorus adds agent skills support for every LLM via OpenRouter

Alex used a Ralph loop with Claude Code to add full agent skills support to Chorus, the open-source app that compares answers across multiple LLMs, in about 3.5 hours. The work added a settings panel, filesystem skill discovery, front-matter parsing, and cross-model skill injection, letting the same Claude-style skills run on GPT 5.2 Codex, Gemini, and any OpenRouter model.

Google DeepMind
New ModelsOpen weights

MedGemma 1.5

Google releases MedGemma 1.5 for offline medical imaging

Google released MedGemma 1.5, a small (4B-class) open model for medical use cases, compact enough to run offline for medical imaging. The panel stressed it is a different model class from Byte's giant M3 medical LLM and that the two pair well together rather than replacing each other.

Meituan (LongCat)
New ModelsOpen weights

LongCat Flash Thinking

Meituan's LongCat Flash Thinking: 560B MoE with 27B active, MIT licensed

Meituan released LongCat Flash Thinking, an open-source reasoning MoE with 560B total parameters and only 27B active, under an MIT license. It continued the run of large sparse Chinese open-weights models offering frontier-style reasoning at low active-parameter cost.

560B/27B LongCat Flash
Liquid AI
New ModelsOpen weights

LFM 2.5

Liquid AI LFM 2.5: 1B on-device family with end-to-end audio

Liquid AI released LFM 2.5, a family of ~1.2B parameter on-device models spanning text, vision, and audio, announced at CES alongside AMD's Lisa Su. The models hit 239 tokens/sec on AMD CPU and 100 tokens/sec on iPhone 16 Pro Max, and include a revolutionary end-to-end audio model that skips the traditional ASR-LLM-TTS pipeline entirely, running in as little as 8GB of RAM.

MiroMind AI
New ModelsOpen weights

MiroThinker 1.5

MiroThinker 1.5: 30B search agent beats trillion-param models

MiroMind AI released MiroThinker 1.5, a 30B parameter open source search agent that achieves 56.1% on BrowseComp and 66.8% on BrowseComp Chinese, outperforming trillion-parameter models. It introduces 'interactive scaling' as a third scaling dimension beyond parameters and context, and is a fine-tune of Qwen 3 Thinking with 147K open training samples.

Nous Research
New ModelsOpen weights

NousCoder 14B

NousCoder 14B: 7% LiveCodeBench jump in 4 days of RL training

Nous Research released NousCoder 14B, an open source competitive programming model that achieved a 7% jump on LiveCodeBench accuracy in just four days of RL training on 48 NVIDIA B200 GPUs. Training used 24,000 verifiable problems, and the release ships under a full Apache 2 license with training code and a benchmark harness.

NVIDIA
New ModelsOpen weights

Alpha Mayo

NVIDIA Alpha Mayo: open source reasoning self-driving models

NVIDIA announced Alpha Mayo at CES, a family of open source reasoning-based self-driving AI models. The models perform end-to-end autonomous driving with explicit reasoning steps, like identifying jaywalkers and stopping accordingly, demoed in a Mercedes-Benz.

NVIDIA
New ModelsOpen weights

Nemotron Speech ASR

Nemotron Speech ASR: 600M streaming model with 24ms latency

NVIDIA released Nemotron Speech ASR, a 600M parameter open source streaming speech recognition model with 24ms median latency and support for 900 concurrent streams on a single H100. Kwindla Hultman Kramer of Daily.co demoed sub-500ms voice-to-voice latency using a three-model pipeline of Nemotron ASR, Nemotron Nano LLM, and Magpie TTS.

24ms Nemotron Speech latency
Upstage
New ModelsOpen weights

Solar Open 100B

Upstage Solar Open 100B: 102B MoE trained on 19.7T tokens

Upstage released Solar Open 100B, a 102B parameter MoE model with only 12B active parameters per token (129 experts, top-8 activation), trained on 19.7 trillion tokens including 4.5T synthetic via a 'data factory' approach. It outperforms GLM 4.5 Air on many benchmarks, features the SNAP PO reinforcement learning technique with a 50% training speedup, and delivers best-in-class Korean language performance.

102B Solar Open params

December 2025

Alibaba (Qwen)
New ModelsOpen weights

Qwen 3 Coder

Qwen 3 Coder posts insane scores in the race for the coding crown

Alibaba's Qwen 3 Coder landed in July with what the crew called insane benchmark scores for an open-weights coding model. Together with Kimi K2 and GLM 4.5 it made July the peak month for Chinese open source.

Alibaba (Qwen)
New ModelsOpen weights

Qwen speech-to-speech model

Qwen launches speech-to-speech model with emotion handling

Qwen released a speech-to-speech model in March with internal emotion handling, joining the wave of voice-native models. It was part of the Qwen team's relentless 2025 release cadence across modalities.

DeepSeek
New ModelsOpen weights

DeepSeek R1

DeepSeek R1: the open reasoning model that crashed NVIDIA's stock

DeepSeek's open-weights reasoning model dropped January 23rd and matched OpenAI's o1 at roughly 50x cheaper pricing, with an alleged training cost of just $5.5M. It crashed NVIDIA stock 17% — a $560B single-day loss, the largest single-company monetary loss in history — and made Chinese AI a household topic. The crew named it the earthquake that shattered assumptions about who leads AI.

$560B NVIDIA stock loss$5.5M DeepSeek R1 training cost
DeepSeek
New ModelsOpen weights

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus lands amid September's relentless pace

DeepSeek resurfaced in September with V3.1 Terminus, another strong open-weights release that arrived just as the crew was barely keeping up with the weekly firehose. Nisten noted that missing a single week in this period left you completely lost.

Hexgrad (Kokoro)
New ModelsOpen weights

Kokoro TTS

Kokoro TTS: 82M-param Apache 2 model hits #1 on TTS Arena

Kokoro, a tiny 82M parameter text-to-speech model, went viral in January after hitting #1 on TTS Arena. Released under Apache 2.0 and small enough to run in the browser, it showed that high-quality speech synthesis no longer required huge models.

MiniMax (Hailuo)
New Models

Hailuo 2.3

MiniMax drops Hailuo 2.3 in November

MiniMax released Hailuo 2.3 (referred to as 'Hailuo LLM 2.3' on the show) in November, cited as another strong release from the Chinese labs. It closed out a year in which MiniMax shipped everything from 4M-context LLMs to media models.

Moonshot AI (Kimi)
New ModelsOpen weights

Kimi K2

Kimi K2: the Chinese open model that earned mainstream respect

Moonshot AI's Kimi K2 dropped in July and earned serious mainstream recognition, marking peak Chinese-lab dominance of open source. It was named in the show's TL;DR as one of the defining open-weights releases of 2025.

Tencent (Hunyuan)
New ModelsOpen weights

Hunyuan open weights

Tencent enters the open weights race

In July, Tencent's Hunyuan team (rendered as 'HO One' in the episode) joined Huawei in entering the open-weights model race. It widened the field of Chinese labs shipping serious open models beyond DeepSeek, Qwen, and Moonshot.

Zhipu AI (GLM)
New ModelsOpen weights

GLM 4.5

GLM 4.5 runs on Cerebras fast enough to win hackathons

Zhipu's GLM 4.5 came out in July and was the first open model that ran on Cerebras hardware fast enough that hackathon competitors were winning with it. It set up GLM's quiet rise as a business workhorse later in the year.

Zhipu AI (GLM)
New ModelsOpen weights

GLM 4.6

GLM 4.6 quietly becomes the model businesses actually use

Zhipu's GLM 4.6 arrived in October and, per Nisten, quietly became a go-to model that many businesses still run today. It continued GLM's trajectory from hackathon favorite to production workhorse.

Google DeepMind
New ModelsOpen weights

FunctionGemma

FunctionGemma: Google's 270M function-calling model for edge agents

Google released FunctionGemma, a tiny 270M-parameter open model specialized for function calling on-device. With a roughly 500MB RAM footprint and strong gains after fine-tuning for mobile actions, it points toward privacy-first local agents on constrained hardware.

NVIDIA
New ModelsOpen weights

Nemotron 3 Nano

NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes

NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.

30B (3B active) Nemotron 3 Nano parameters
Resemble AI
New ModelsOpen weights

Chatterbox Turbo

Resemble AI open-sources Chatterbox Turbo, a 350M MIT-licensed TTS

Resemble AI released Chatterbox Turbo, an MIT-licensed 350M-parameter open text-to-speech model. The company claims it beats ElevenLabs in blind listening tests, pushing high-quality TTS into fully open, accessible territory.

Arcee AI
New ModelsOpen weights

Arcee Trinity

Arcee Trinity launches US-trained open MoE family

Arcee AI introduced Trinity, a family of US-trained open mixture-of-experts models built from scratch, starting with Trinity-Mini and Trinity-Nano-Preview. CTO Lukas Atkins joined the show to discuss the training approach and previewed Trinity-Large for January 2026. The release positions Arcee as a domestic alternative in an open-weights field dominated by Chinese labs.

DeepSeek
New ModelsOpen weights

DeepSeek V3.2 / V3.2-Speciale

DeepSeek V3.2 and V3.2-Speciale post gold-medal reasoning under MIT license

DeepSeek released V3.2 and the reasoning-first V3.2-Speciale, a 685B-parameter MoE under MIT license. Speciale posted gold-medal-level olympiad results and 96% on AIME (versus GPT-5 High at 94%), with V3.2 hitting 73.1% on SWE-Bench Verified. Aggressive pricing around 28 cents per 1M tokens on OpenRouter pushes open models closer to top closed-model capability.

96% AIME73.1% SWE-Bench Verified685B Total parameters (MoE)
Microsoft
New ModelsOpen weights

VibeVoice-Realtime-0.5B

Microsoft shares VibeVoice-Realtime-0.5B with ~300ms latency TTS

Microsoft published VibeVoice-Realtime-0.5B on Hugging Face, a small realtime text-to-speech model claiming roughly 300ms latency. The show framed it as more evidence that sub-second audio response is becoming table stakes for production voice agents.

~300ms Claimed TTS latency0.5B Parameters
Mistral AI
New ModelsOpen weights

Mistral 3 (Large 3 + Ministral 3)

Mistral returns to Apache 2.0 with Mistral Large 3 and Ministral 3

Mistral relaunched its model family under permissive Apache 2.0 licensing with Mistral Large 3 and the small Ministral 3 edge models. Large 3 ships a 256K context window and strong open-model coding positioning. The licensing shift reignited discussion around open model portability and deployability.

256K Mistral Large 3 context window

November 2025

Alibaba (Tongyi)
New ModelsOpen weights

Z-Image Turbo

Tongyi's Z-Image Turbo brings sub-second open image generation

Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.

6B Parameters
Black Forest Labs
New ModelsOpen weights

FLUX.2

Black Forest Labs releases FLUX.2, a 32B multi-reference image model

Black Forest Labs released FLUX.2, a 32B-parameter image model with open weights (FLUX.2-dev) that supports multi-reference image editing. It lets users combine multiple reference images and prompt edits with variables, a step up in controllable image editing.

32B Parameters
Microsoft
New ModelsOpen weights

Fara-7B

Microsoft ships Fara-7B, a 7B on-device computer use agent

Microsoft Research released Fara-7B, a best-in-class 7B-parameter vision-language model for computer use that runs on-device. It scores 73.5% on WebVoyager, beating OpenAI's computer-use preview while being small enough to run locally.

73.5% WebVoyager
Prime Intellect
New ModelsOpen weights

INTELLECT-3

Prime Intellect releases INTELLECT-3, a 106B open MoE model

Prime Intellect released INTELLECT-3, a 106B-parameter mixture-of-experts model with 12B active parameters that scores 90% on AIME 2024/2025. The lab fully open-sourced the training stack alongside the weights, showing a small lab can train frontier-scale models.

106B Total parameters (12B active)90% AIME 2024/2025
Tencent (Hunyuan)
New ModelsOpen weights

HunyuanOCR

Tencent's 1B HunyuanOCR beats 72B models on OCRBench

Tencent released HunyuanOCR, a 1B-parameter OCR model that scores 860 on OCRBench, beating models as large as Qwen3-VL-72B. It is a striking example of task-specialized small models outperforming generalist giants.

1B Parameters860 OCRBench score
Tencent (Hunyuan)
New ModelsOpen weights

HunyuanVideo 1.5

Tencent releases HunyuanVideo 1.5, a lightweight open video model

Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.

New ModelsOpen weights

OLMo 3

OLMo 3: Allen AI's fully open 32B model with complete recipe

Allen AI released OLMo 3, a fully open 32B dense model where the dataset, training recipe, and hyperparameters are all public — not just the weights. LDJ contrasted it with open-weights-only releases from Qwen and DeepSeek, which have never published a fully open recipe.

32B Dense parameters, fully open dataset and recipe
Meta AI
New ModelsOpen weights

SAM 3

Meta SAM 3: open-vocabulary segmentation and tracking in video

Meta's Segment Anything Model 3 adds open-vocabulary segmentation with text and exemplar prompts, letting you click or type to segment and track any object across images and video. The panel demoed it live on golden retriever videos, and it ships openly as part of Meta's open-source push.

Meta AI
New ModelsOpen weights

SAM 3D

SAM 3D turns single photos into 3D objects and human bodies

Released alongside SAM 3, SAM 3D reconstructs 3D objects and full human bodies from a single image with surprisingly high quality. It extends the Segment Anything family from 2D segmentation into single-image 3D reconstruction.

Baidu
New ModelsOpen weights

ERNIE-4.5-VL-28B-A3B-Thinking

Baidu open-sources ERNIE-4.5-VL-28B-A3B-Thinking visual reasoning model

Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an Apache 2.0 open-weights visual reasoning MoE with only 3B active parameters that claims to rival much larger models like GPT-5 High on vision tasks. It features image zooming, spatial grounding, and reasoning, with strong small-model performance attributed to GSPO training from the Qwen team.

3B Active Parameters
H Company
New ModelsOpen weights

Holo2

H Company open-sources Holo2 multimodal computer-use agent family

Dropped live during the show: H Company open-sourced Holo2, a next-generation multimodal agent family fine-tuned on Qwen3-VL for grounding, navigation, and reasoning across web, desktop, and mobile. It posts SOTA results on computer-use and web-navigation benchmarks like OSWorld-G and ships in 4B, 8B, and 30B variants under Apache 2.0.

Meta AI
New ModelsOpen weights

Omnilingual ASR

Meta releases Omnilingual ASR covering 1,600+ languages

Meta released Omnilingual ASR, an Apache 2.0 speech recognition family supporting over 1,600 languages, including 500+ never before served by any ASR system, with character error rate under 10% for 78 languages. The release includes an open corpus of 500k+ rows of transcribed audio, and the 1B model was praised as a near drop-in state-of-the-art replacement on Hugging Face.

1600+ Languages Supported
WeiboAI
New ModelsOpen weights

VibeThinker-1.5B

WeiboAI releases VibeThinker-1.5B open reasoning model

Weibo's AI team open-sourced VibeThinker-1.5B, a tiny reasoning model that reportedly outperforms much larger models like DeepSeek R1 on select reasoning benchmarks. Part of a week where small open-weights models from Chinese labs kept punching above their weight.

New ModelsOpen weights

OlmoEarth

Ai2 launches OlmoEarth foundation models and open Earth-intelligence platform

Ai2 launched OlmoEarth, a family of foundation models plus an open, end-to-end platform for fast, high-resolution Earth intelligence. It applies the lab's open-model approach to geospatial and remote-sensing data, making Earth observation workloads accessible without proprietary stacks.

Hugging Face
Also ReleasedOpen weights

Smol Training Playbook

Hugging Face publishes the Smol Training Playbook for LLM pretraining

Hugging Face published the Smol Training Playbook, a 200+ page end-to-end guide to reliably pretraining and operating LLMs. It distills the team's practical experience from the SmolLM line into an open resource for anyone training their own models.

Maya Research
New ModelsOpen weights

Maya-1

Maya-1 open-source voice generation model released

Maya-1 is a new open-source voice generation model that was demoed on the show as part of the week's voice AI wave. The panel highlighted how quickly open voice model quality is improving, with expressive output that holds up against commercial systems.

Meituan (LongCat)
New ModelsOpen weights

LongCat Flash Omni

Meituan releases LongCat Flash Omni, a 560B (27B active) omni model

Meituan's LongCat team released LongCat Flash Omni, a 560B-parameter mixture-of-experts model with roughly 27B active parameters that accepts text, audio, and video input. It extends the open LongCat Flash line into omni-modal territory from a lab better known for food delivery than frontier models.

Moonshot AI
New ModelsOpen weights

Kimi K2 Thinking

Moonshot AI releases Kimi K2 Thinking, an open 1T-param reasoning MoE

Moonshot AI released Kimi K2 Thinking, an open-source 1-trillion-parameter mixture-of-experts reasoning agent with 256K context and large-scale tool-calling capacity. The panel treated it as the open-source centerpiece of the week, focusing on its reasoning quality and coding utility rather than just benchmark screenshots, and as a sign open models keep closing the usability gap with frontier closed models.

October 2025

New ModelsOpen weights

Ming-flash-omni Preview

Ming-flash-omni Preview: sparse MoE omni-modal open model

Ant Group's InclusionAI team released Ming-flash-omni Preview, a sparse mixture-of-experts omni-modal model on Hugging Face. It handles multiple input and output modalities in a single open-weights model, adding to the wave of Chinese open omni-modal releases.

MiniMax
New ModelsOpen weights

MiniMax M2

MiniMax M2: open-source agentic model at 8% of Claude's price, 2x speed

MiniMax released M2, an open-source agentic model positioned at roughly 8% of Claude's price while running about twice as fast. Head of Engineering Skyler Miao joined the show for a deep dive, framing M2 as both a model story and a speed story, and the panel read it as part of a broader open-model pressure wave on frontier labs.

8% of Claude's price2x speed vs comparable frontier models
Moonshot AI (Kimi)
New ModelsOpen weights

Kimi Linear

Kimi Linear: 48B open model with linear attention and 1M context

Moonshot AI released Kimi Linear, a 48B parameter (A3B active) instruct model that uses linear attention to reach a 1M token context window. It is an open-weights bet on efficient long-context architectures from the Kimi team.

48B parameters (3B active)1M token context window
OpenAI
New ModelsOpen weights

GPT-OSS-Safeguard

OpenAI ships GPT-OSS-Safeguard, first open-weight safety reasoning models

OpenAI released GPT-OSS-Safeguard, its first open-weight safety reasoning models, built on the GPT-OSS family. The models let developers apply custom safety policies via reasoning rather than fixed classifiers, extending OpenAI's open-weights push into the trust-and-safety layer.

Alibaba (Qwen)
New ModelsOpen weights

Qwen3-VL 2B & 32B

Qwen3-VL adds compact 2B and 32B multimodal models

Alibaba's Qwen team extended the Qwen3-VL family with newly updated 2B and 32B checkpoints. The 2B is a generic VLM (OCR-capable) that holds up against its 4B and 8B siblings from prior weeks, while the 32B reportedly outperforms GPT-5 mini and Claude 4 Sonnet on benchmarks.

New ModelsOpen weights

olmOCR 2 7B

Ai2 releases olmOCR 2 7B open OCR model

The Allen Institute for AI updated its open OCR line with olmOCR 2 at 7B (released as an FP8 checkpoint), landing in the same week as DeepSeek-OCR, Qwen3-VL, and Liquid's LFM2-VL. Another sign that document understanding became this week's hottest open-model category.

DeepSeek
New ModelsOpen weights

DeepSeek-OCR

DeepSeek-OCR turns text into compressed vision tokens for massive contexts

DeepSeek open-sourced DeepSeek-OCR, a 3B model (~570M active parameters) that is less an OCR model and more a context-compression breakthrough: it renders text as images, compresses it up to 10x while retaining 97% decoding accuracy (60% even at 20x), and reads it back with a tiny vision decoder. The approach suggests text tokenization is far from optimal and points at vastly cheaper long-context processing; alphaXiv reportedly OCR'd all of arXiv for $1000 versus $7500 with MistralOCR, and a single H100 can process up to 200K pages.

97% decoding accuracy at 10x compression~570M active parameters (3B total)200K pages scannable on a single H100
Krea AI
New ModelsOpen weights

Krea Realtime Video

Krea open-sources a 14B real-time video generation model

Krea AI open-sourced a 14-billion-parameter real-time video model, with weights on Hugging Face. It joins the week's clear trend of generative video racing toward live, interactive experiences rather than offline rendering.

14B parameters
Lightricks
New ModelsOpen weights

LTX-2

LTX-2: native 4K audio+video generation engine from Lightricks

Lightricks announced LTX-2 as breaking news on the show: a video generation engine producing native 4K video (no upscaling) with synchronized audio, positioned as a fast, efficient open alternative to closed models like Sora. It is billed as open-source with weights coming this fall.

4K native generation resolution, no upscaling
Liquid AI
New ModelsOpen weights

LFM2-VL-3B

Liquid AI ships LFM2-VL-3B tiny multilingual vision-language model

Liquid AI released LFM2-VL-3B, a tiny multilingual vision-language model, part of a wave of OCR-and-VLM releases this week. It targets efficient on-device and edge vision-language workloads at the 3B scale.

Google DeepMind
New ModelsOpen weights

C2S-Scale 27B

Google's C2S-Scale 27B validates a cancer hypothesis in living cells

Google released C2S-Scale 27B, a Gemma-based single-cell biology model that generated a novel cancer therapy hypothesis later validated in living cells. The show called this a bombshell example of AI contributing to real scientific discovery rather than just benchmarks.

KAIST
New ModelsOpen weights

KORMo 10B

KAIST releases KORMo, a bilingual Korean/English 10B open model

KAIST published KORMo, a 10B parameter fully open bilingual model for Korean and English, with weights on Hugging Face and an accompanying paper. It continues the trend of strong national-language open models coming out of Korean labs.

September 2025

Alibaba (Qwen)
New ModelsOpen weights

Qwen3-Omni

Qwen3-Omni ships open-weights any-to-any audio, vision, and text

Alongside Qwen3-VL, Alibaba released Qwen3-Omni, an end-to-end omni-modal open-weights model that takes text, image, audio, and video input and can respond with streaming speech. The show treated it as direct evidence of how fast open multimodal systems are improving, with weights on Hugging Face, a GitHub repo, demos, and availability in Qwen Chat and the Model Studio API.

Alibaba (Qwen)
New ModelsOpen weights

Qwen3-VL

Alibaba releases Qwen3-VL open-weights vision-language flagship

Alibaba's Qwen team shipped Qwen3-VL, its new flagship open-weights vision-language family, headlining the episode's 'Qwen-mas' barrage. The panel discussed it as a practical workflow tool for visual understanding and agentic GUI tasks, not just another model card, with weights, a blog post, and a Hugging Face demo all available at launch.

Alibaba (Wan)
New ModelsOpen weights

Wan 2.2 Animate

Wan Animate brings open-weights character animation and replacement

Alibaba's Wan team released Wan 2.2 Animate, an open-weights model that animates a character image from a performance video, replicating motion and expressions, or swaps a character into existing footage. It landed in the episode's closing run of video releases showing multimodal product quality climbing across the board.

DeepSeek
New ModelsOpen weights

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus refines agents and bilingual output

DeepSeek released V3.1 Terminus, an update to V3.1 with cleaner bilingual output, stronger agentic tool use, and cheaper long-context handling. The open weights are available on Hugging Face, continuing DeepSeek's cadence of iterative open releases.

IBM
New ModelsOpen weights

Granite Docling 258M

IBM releases Granite Docling 258M compact document-parsing VLM

IBM published Granite Docling 258M, an ultra-compact open-source vision-language model for document understanding that converts documents into structured output. At just 258M parameters it reinforced the show's point that tiny specialized models are becoming genuinely useful workflow tools.

Liquid AI
New ModelsOpen weights

Liquid Nanos

Liquid AI ships Liquid Nanos, tiny task-specific on-device models

Liquid AI released Liquid Nanos, a family of very small task-specific models built for jobs like extraction, translation, RAG, and tool calling that can run on-device. The collection landed on Hugging Face, fitting the episode's theme of small-but-capable models powering real products.

Meta AI
New ModelsOpen weights

Code World Model (CWM)

Meta releases 32B Code World Model for agentic code reasoning

Meta released CWM, a 32B open-weights research model trained to internally model code execution, aimed at agentic code reasoning rather than plain code completion. The weights are on Hugging Face under facebook/cwm, giving the open-source community a new approach to code world modeling.

Moondream AI
New ModelsOpen weights

Moondream 3

Moondream 3 preview punches above its weight in the tiny-VLM race

Moondream released a preview of Moondream 3, a small open vision-language model that punches well above its size class. CTO and co-founder Vik Korrapati joined the show to explain why small, capable vision models matter for real product building, framing Moondream 3 as a practical tool rather than a benchmark flex.

Alibaba (Tongyi Lab)
New ModelsOpen weights

Tongyi DeepResearch 30B-A3B

Tongyi DeepResearch: open-source A3B web agent rivals OpenAI Deep Research

Alibaba's Tongyi Lab open-sourced Tongyi DeepResearch, a 30B mixture-of-experts web research agent with only 3B active parameters. The lab claims parity with OpenAI's Deep Research on agentic search and report-writing tasks, and the weights are available on Hugging Face.

ByteDance / Tsinghua
New ModelsOpen weights

HuMo

HuMo: human-centric multimodal video generation from ByteDance/Tsinghua

ByteDance research and Tsinghua released HuMo, a human-centric video generation model that conditions on multimodal inputs (text, image, and audio) to produce videos of people. The weights are available on Hugging Face.

Mistral AI
New ModelsOpen weights

Magistral-Small-2509

Mistral updates its open reasoning model with Magistral-Small-2509

Mistral published Magistral-Small-2509, an updated checkpoint of its small open-weights reasoning model. The refresh keeps Mistral's open reasoning line current as the open-model competitive baseline moves quickly.

Moondream
New ModelsOpen weights

Moondream 3 (Preview)

Moondream 3 Preview: 9B MoE VLM with 2B active parameters

Moondream released a preview of Moondream 3, a 9B mixture-of-experts vision-language model with only 2B active parameters. It targets frontier-level visual reasoning at small-model cost, continuing Moondream's run of efficient open vision models.

Perceptron AI
New ModelsOpen weights

Isaac 0.1

Perceptron AI introduces Isaac 0.1, a 2B perceptive-language model

Perceptron AI released Isaac 0.1, a 2B parameter perceptive-language model with open weights on Hugging Face. Despite its small size, the show notes highlight that it 'points better than GPT', excelling at visual grounding and pointing tasks relative to much larger models.

Alibaba (Tongyi Lab)
New ModelsOpen weights

WebWatcher-32B

Alibaba's Tongyi Lab open-sources WebWatcher vision-language research agent

Alibaba's Tongyi Lab open-sourced WebWatcher, a vision-language deep research agent that sets new state-of-the-art results on agentic browsing and research tasks. The 32B model combines visual understanding with web research capabilities and is available on Hugging Face.

Apple
New ModelsOpen weights

FastVLM-7B

Apple's FastVLM-7B lands with a speed-first vision encoder, 85x faster TTFT

Apple released FastVLM-7B, a vision-language model built around a speed-first vision encoder that delivers up to 85x faster time-to-first-token than peer VLMs. Quantized variants (7B-int4, 1.5B-int8) on Hugging Face make it practical for on-device and real-time vision use, anchoring the show's fast-VLM discussion.

Google DeepMind
New ModelsOpen weights

EmbeddingGemma

Google releases EmbeddingGemma, a 300M-param SOTA embedding model for RAG

Google released EmbeddingGemma, a 300M-parameter open embedding model that achieves state-of-the-art results for its size, aimed at RAG and on-device semantic search. It dropped as breaking news during the show, with browser-based demos like Semantic Galaxy showing it running fully client-side.

Nous Research
New ModelsOpen weights

Hermes 4 14B

Nous Research releases Hermes 4 14B compact hybrid reasoning model

Nous Research launched Hermes 4 at 14B, a compact hybrid reasoning model with tool calling designed for both local and cloud use. It extends the Hermes 4 family down to a size practical for local deployment while keeping reasoning and tool-use capabilities, with a full tech report published on arXiv.

Swiss AI Initiative
New ModelsOpen weights

Apertus-8B / Apertus-70B

Switzerland launches Apertus-8B and 70B, fully open multilingual LLMs

The Swiss AI Initiative launched Apertus-8B and Apertus-70B, fully open multilingual LLMs trained on 15T tokens covering more than 1,800 languages. The release stands out for full openness (weights, data recipe, and training transparency) and unusually broad language coverage from a national effort.

Tencent
New ModelsOpen weights

Hunyuan-MT-7B

Tencent open-sources Hunyuan-MT-7B translation model after sweeping WMT2025

Tencent open-sourced Hunyuan-MT-7B, a 7B-parameter machine translation model, after it swept the WMT2025 translation competition. It gives the open-weights community a small, focused translation model that punches well above its size class.

July 2025

Agentica
New ModelsOpen weights

DeepSWE-Preview

DeepSWE-Preview hits 59% SWE-Bench Verified with pure RL on Qwen3-32B

Agentica and collaborators (with guest Michael Luo of UC Berkeley) released DeepSWE-Preview, a fully open-sourced RL-trained coding agent built on Qwen3-32B that reached 59% on SWE-Bench Verified, a top open result in a benchmark dominated by closed systems. The team published training methodology and weights, emphasizing reproducible reward design and verification over sealed benchmark numbers.

59% SWE-Bench Verified
Baidu
New ModelsOpen weights

ERNIE 4.5

Baidu open-sources ERNIE 4.5, a 10-model multimodal family

Baidu open-sourced the ERNIE 4.5 series, a family of 10 models ranging from 424B down to 0.3B parameters with multimodal capabilities, reportedly beating o1 on DocVQA. The release marks a sharp reversal from Baidu's previous anti-open-source posture and another sign that Chinese labs are setting the pace in open source.

10 ERNIE 4.5 models
Huawei
New ModelsOpen weights

Pangu Pro MoE

Huawei's Pangu Pro MoE: 72B model trained entirely on Ascend NPUs

Huawei released Pangu Pro, a 72B-parameter MoE trained on its own Ascend NPUs rather than Nvidia or AMD hardware, hitting 1,528 tokens/sec and pretrained on 13T tokens. The panel framed it as the geopolitical open-model story of the week, showing how far Chinese compute stacks have advanced under sanctions.

Tencent
New ModelsOpen weights

Hunyuan-A13B-Instruct

Tencent ships Hunyuan-A13B: 80B MoE with only 13B active params

Tencent released Hunyuan-A13B-Instruct, an 80B-parameter MoE that activates only 13B parameters at inference while keeping a 256K context window. Built by the team with WizardLM lineage, it posts strong reasoning benchmarks and feels unusually practical for its class, though the panel flagged its license limits.

13B Hunyuan active params

May 2025

DeepSeek
New ModelsOpen weights

DeepSeek-R1-0528

DeepSeek drops R1-0528, an updated open reasoning model with big gains

DeepSeek released R1-0528 out of nowhere, an update to their open-weights reasoning model with serious performance jumps: AIME 91, LiveCodeBench 73, and SWE-bench Verified 57.6. They also shipped an 8B distilled version based on Qwen3 that can run on a laptop, keeping it among the best open-weight models available.

91 AIME score, beating previous R1 by a mile8B Distilled Qwen3-based version runnable on a laptop
A-M Team
New ModelsOpen weights

AM-Thinking v1

AM-Thinking v1: 32B dense reasoning model beats bigger MoEs at math and code

A 32B dense open-weights reasoning LLM from a new Chinese team that takes on much larger mixture-of-experts models and comes out on top for math and code, hitting 85.3% on AIME 2024, 70.3% on LiveCodeBench v5, and 92.5% on Arena-Hard. It supports a /think reasoning toggle, ships with a permissive license, is tooled for vLLM, LM Studio, and Ollama, and runs at 25 tokens/sec on a single 80GB GPU with INT4 quantization. A multilingual RLHF pass and 128k context window are in the works.

32B dense parameters85.3% AIME 202425 tokens/sec on a single 80GB GPU with INT4
Alibaba
New ModelsOpen weights

Wan 2.1

Alibaba's Wan 2.1: open-source diffusion-transformer text-to-video suite

Alibaba, the team behind the Qwen LLMs, released Wan 2.1, a full stack of open-source diffusion-transformer text-to-video foundation models. Amid the show's discussion of video-model fatigue, this was called out as a release that cuts through the noise, with weights on Hugging Face and code on GitHub.

Nous Research
Products & AppsOpen weights

Psyche

Nous Research launches Psyche, a decentralized cooperative-training network

Psyche is Nous Research's decentralized cooperative-training network that lets distributed participants jointly train large models over the internet. The launch includes open code on GitHub and a live dashboard tracking the first run, a 40B model called Consilience. COO Dillon Rolnick joined the show to explain the decentralized training push.

Stability AI
New ModelsOpen weights

Stable Audio Open Small

Stability AI and Arm release Stable Audio Open Small for on-device audio

Stability AI, together with Arm, released Stable Audio Open Small, a 341M-parameter open text-to-audio model built for real-world on-device deployment. The show framed it as part of a small comeback for Stability, with weights on Hugging Face and an accompanying paper.

StepFun
New ModelsOpen weights

Step1X-3D

StepFun's Step1X-3D: open two-stage framework for textured 3D assets

StepFun released Step1X-3D, an open two-stage framework for high-fidelity, controllable generation of textured 3D assets: it first synthesizes watertight geometry, then generates view-consistent textures. Trained on 2M curated meshes, the release also includes a curated dataset of 800K assets and a Hugging Face demo.

New ModelsOpen weights

Falcon-Edge

Falcon-Edge: ternary BitNet LLMs for edge deployment under 1GB VRAM

TII's Falcon-Edge project releases ternary BitNet LLMs (1B and 3B base models) that slash memory and compute requirements, enabling inference on less than 1GB of VRAM. Fine-tuners get pre-quantized checkpoints and a clear path to 1-bit LLMs.

Alibaba (Qwen)
New ModelsOpen weights

Qwen 3

Alibaba open-weights the full Qwen 3 family under Apache 2.0

Alibaba released the entire Qwen 3 stack: two MoE models (235B total/22B active and 30B/3B active) plus six dense siblings from 32B down to 0.6B, all Apache 2.0 with day-one support in LM Studio, Ollama, vLLM, MLX and llama.cpp. The headline feature is a runtime hybrid 'thinking' toggle (/think and /no_think) that trades latency for reasoning depth. Trained on ~36T tokens with 128K context and 119-language coverage, the 235B MoE rivals DeepSeek-R1, o1, o3-mini and Gemini 2.5 Pro on coding and math.

235 B Flagship MoE total parameters (22B active)30 B Qwen3-30B-A3B hit 57 tok/s on a Mac with speculative decoding36 Trillions of pre-training tokens (2x Qwen 2.5)
HiDream
New ModelsOpen weights

HiDream E1

HiDream E1: open-weights image model with standout Ghibli style

HiDream released E1, an open-weights image editing/generation model (Apache 2.0-style licensing) noted for beautiful Ghibli-style outputs. It ranks #4 on the Artificial Analysis image arena leaderboard, sitting among top contenders like Google Imagen and ReCraft.

Kyutai
New ModelsOpen weights

Helium-1

Kyutai releases Helium-1, a 2B European-language model plus dactory pipeline

Kyutai released Helium-1, a 2B-parameter model distilled from Gemma-2-9B and purpose-built for Europe's 24 official languages, under CC-BY 4.0. It sets a new state of the art for its size class on MMLU-EU, ARC-EU and FLORES translation while fitting in under 2GB VRAM for edge and phone deployment. They also open-sourced 'dactory' (MIT), their full Common Crawl data-processing pipeline that scores, dedups and tags webpages.

Microsoft
New ModelsOpen weights

Phi-4-reasoning

Microsoft ships Phi-4-reasoning and Phi-4-reasoning-plus (14B, MIT)

Microsoft fine-tuned the 14B Phi-4 on 1.4M curated chain-of-thought traces (SFT) and added a small RL stage (Plus variant) to create two MIT-licensed reasoning models. They punch far above their weight: Phi-4-reasoning-plus outperforms DeepSeek-R1-Distill-70B on AIME 25 (78% vs 51%) and sits within a few points of the full 671B DeepSeek-R1, while running on a single GPU with explicit <think> scaffolding.

OpenPipe
New ModelsOpen weights

ART·E

OpenPipe's ART·E: RL-trained open email agent that beats o3

OpenPipe released ART·E, an Apache 2.0 email research agent built on a 14B Qwen 2.5 backbone, trained on 500K Enron emails plus synthetic Q&A and refined with reinforcement learning. It tops o3 on accuracy (96% vs 90%) while running 5x faster (1.1s median) and 64x cheaper ($0.85 per 1,000 queries), using a simple three-tool loop.

Xiaomi
New ModelsOpen weights

MiMo-7B

Xiaomi enters open weights with MiMo-7B, MIT-licensed reasoning family

Xiaomi's first open-weights release is a 7B dense family (Base, SFT, RL, RL-Zero) trained from scratch on 25T tokens with a multi-token-prediction objective and rule-verifiable reinforcement learning. The RL variant matches OpenAI o1-mini on benchmark suites despite being far smaller, scoring 55.4% on AIME 2025 and 49.3% on LiveCodeBench v6, all under an MIT license with vLLM-ready weights.

April 2025

Daily (Pipecat)
New ModelsOpen weights

Smart-Turn VAD

Pipecat releases Smart-Turn, an open source semantic VAD model

The Pipecat team (from Daily) released Smart-Turn, an open source semantic voice activity detection model that understands when a speaker has actually finished their turn rather than just detecting silence. Kwindla Kramer joined the show to break down how semantic VAD makes voice agent conversations feel far more natural, with a community training effort at turn-training.pipecat.ai.

Google DeepMind
New ModelsOpen weights

Gemma 3 QAT

Google ships Quantization-Aware Trained Gemma 3 models for consumer GPUs

Google released Quantization-Aware Training (QAT) versions of the Gemma 3 family, dramatically cutting memory requirements while preserving quality. The 27B model drops from a hefty 54GB to just 14.1GB, and even the 1B model goes from 2GB to about half a gig, making state-of-the-art open models runnable on consumer GPUs. Wolfram took the 4B QAT model for a spin in LM Studio on the show.

27B Gemma 3 27B QAT: 54GB down to 14.1GB1B Gemma 3 1B QAT: 2GB down to ~0.5GB4B 4B QAT model tested in LM Studio
HumanLayer
Dev ToolsOpen weights

12-Factor Agents

Dex Horthy publishes 12-Factor Agents, a guide to production-ready agents

HumanLayer founder Dex Horthy published 12-Factor Agents, an open GitHub repo and essay distilling common patterns and pitfalls for building reliable, production-ready AI agents. Drawing on his experience building agent SDKs, it argues that serious teams end up writing large parts from scratch and lays out principles for robust agent design, discussed in depth on the show.

Lvmin Zhang (lllyasviel)
New ModelsOpen weights

FramePack

FramePack generates 120-second videos on just 6GB of VRAM

FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open source next-frame prediction approach for long video generation that runs on consumer hardware. It can generate videos up to 120 seconds long on as little as 6GB of VRAM by packing input frame context into a fixed length.

120s Max video length6GB Minimum VRAM
Nari Labs
New ModelsOpen weights

Dia-1.6B

Nari Labs' Dia: a wild 1.6B open source TTS model that blew up Twitter

Nari Labs released Dia, a 1.6B parameter open-weights text-to-speech model that absolutely blew up Twitter with its expressive, emotional dialogue generation, including laughs, coughs, and multi-speaker conversations. Built by a tiny team, it punches far above its weight against commercial TTS systems and supports voice cloning, with demos available on Fal.ai.

1.6B Parameters
NVIDIA
New ModelsOpen weights

Describe Anything (DAM-3B)

NVIDIA releases DAM-3B for region-based image and video captioning

NVIDIA dropped the Describe Anything Model (DAM-3B), a 3 billion parameter multimodal model for region-based image and video captioning. You can point it at a specific region of an image or video and it generates a detailed description of just that area. NVIDIA also published an accompanying DescribeAnything dataset and a Hugging Face demo.

3B Parameters
Sand AI
New ModelsOpen weights

MAGI-1

Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model

Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.

24B Parameters24 Frames per autoregressive chunk
Microsoft
New ModelsOpen weights

BitNet b1.58

Microsoft releases BitNet 1.58-bit model weights on Hugging Face

Microsoft published BitNet (listed in the show notes as BitNet v1.5), its native 1.58-bit quantized LLM, as open weights on Hugging Face. The ternary-weight approach targets extremely efficient CPU inference at a fraction of the memory of standard models.

OpenAI
Dev ToolsOpen weights

Codex CLI

OpenAI debuts Codex CLI, an open source terminal coding agent

OpenAI released Codex CLI, an open source coding tool for the terminal. It ships with hardened security, using Apple Seatbelt on macOS to limit execution to the current directory plus temp files.

Prime Intellect
New ModelsOpen weights

INTELLECT-2

Prime Intellect launches INTELLECT-2, a 32B globally-distributed RL run

Prime Intellect released INTELLECT-2, a 32B reasoning model trained with globally decentralized reinforcement learning, a follow-up to the INTELLECT-1 decentralized pretraining run covered on the show in December. The release includes open weights on Hugging Face, a tech report, and the PRIME-RL training code.

Deep Cogito
New ModelsOpen weights

Cogito v1 Preview (3B-70B)

Deep Cogito debuts Cogito v1 Preview models from 3B to 70B, beating DeepSeek 70B

New lab Deep Cogito released the Cogito v1 Preview family of open models ranging from 3B to 70B parameters, claiming SOTA results at each size and beating DeepSeek's 70B distill. The models are available on Hugging Face, giving local AI enthusiasts the small-to-mid sizes Llama 4 skipped.

3B-70B Model size range
Google
Also ReleasedOpen weights

Agent2Agent (A2A) protocol

Google announces A2A, an open agent-to-agent communication protocol

Google announced the Agent2Agent (A2A) protocol at Cloud Next, an open spec for agents from different vendors to discover and communicate with each other. The spec was published on GitHub with a long list of launch partners, including Weights & Biases.

Meta AI
New ModelsOpen weights

Llama 4 (Scout & Maverick)

Meta drops Llama 4 Scout (109B) and Maverick (400B) open-weights MoE models

Meta released the long-awaited Llama 4 family in a chaotic Saturday drop: Scout (17B active / ~109B total, 16 experts) and Maverick (17B active / ~400B total, 128 experts), with a 2T-parameter Behemoth still in training. The models are multimodal, multilingual MoE architectures trained on ~30T tokens with FP8 and interleaved attention (iRoPE), claiming 10M context for Scout and 1M for Maverick. The release was marred by drama: the LMArena version differed from the released model, and the community criticized the lack of small local-friendly sizes.

10M Stated context window for Llama 4 Scout288B Active parameters of unreleased Behemoth (2T total)17B Active parameters for both Scout and Maverick
Moonshot AI (Kimi)
New ModelsOpen weights

Kimi-VL & Kimi-VL-Thinking

Moonshot drops Kimi-VL and Kimi-VL-Thinking, tiny A3B open vision models

Moonshot AI released Kimi-VL and Kimi-VL-Thinking, compact vision-language models with only ~3B active parameters (A3B MoE). The thinking variant adds reasoning to a tiny VLM, and both are available openly on Hugging Face.

A3B ~3B active parameters (MoE)
NVIDIA
New ModelsOpen weights

Llama-3.1-Nemotron-Ultra-253B

NVIDIA ships Nemotron Ultra, a 253B pruned and distilled Llama 3.1-405B

NVIDIA released Nemotron Ultra, a pruned and distilled finetune of Llama 3.1-405B at roughly half the parameters (253B). Its benchmarks even included Llama 4 comparisons, showing the older finetuned Llama beating the new models on AIME, GPQA and more. It supports 128K context and fits on a single 8xH100 node for inference.

253B Parameters (pruned from Llama 3.1-405B)128K Context window
New ModelsOpen weights

DeepCoder-14B-Preview

DeepCoder-14B: open RL-finetuned coder beats DeepSeek R1 and o3-mini on coding

Together AI and Agentica (UC Berkeley Sky Computing Lab) released DeepCoder-14B-Preview, a reasoning model finetuned with RL that beats DeepSeek R1 and even o3-mini on several coding benchmarks. The project aims to democratize RL: the team open-sourced the model, the training dataset, the Weights & Biases logs, and the eval logs. Guest Michael Luo from Agentica joined the show to discuss the release.

14B Model parameters
All Hands AI
New ModelsOpen weights

OpenHands LM 32B

OpenHands LM 32B: MIT-licensed coding agent model hits 37.2% SWE-Bench

All Hands AI (formerly OpenDevin) released OpenHands LM 32B, an MIT-licensed Qwen finetune that scores 37.2% on SWE-Bench Verified, competing with much larger models on real-world repo tasks. The OpenHands agent also took the #2 spot on the new Live SWE-Bench leaderboard, and the 32B model runs locally on a single RTX 3090. A hosted OpenHands Cloud version is also available; guest Xingyao Wang joined the show to discuss it.

37.2% SWE-Bench Verified score#2 Live SWE-Bench leaderboard (OpenHands agent)
Nomic AI
New ModelsOpen weights

Nomic Embed Multimodal

Nomic Embed Multimodal: SOTA embeddings for visual documents

Nomic AI released Nomic Embed Multimodal, new 3B and 7B parameter embedding models built on Alibaba's Qwen2.5-VL. They achieve SOTA on visual document retrieval by embedding interleaved text-image sequences, ideal for PDFs and complex webpages. The 7B model ships under Apache 2.0 with open weights, code, and data; guest Zach Nussbaum discussed the release on the show.

3B parameters (smaller model)7B parameters (Apache 2.0 model)

March 2025

Alibaba (Qwen)
New ModelsOpen weights

Qwen2.5-Omni-7B

Qwen launches Omni 7B: sees, hears, reads, and talks back

Qwen released Qwen2.5-Omni-7B, an open-weights omni-modal model that perceives text, images, audio, and video, and generates both text and speech. It packs end-to-end multimodal perception and spoken output into a 7B parameter model available on Hugging Face.

7B parameters
DeepSeek
New ModelsOpen weights

DeepSeek-V3-0324

DeepSeek silently drops V3-0324, 685B params under MIT license

DeepSeek silently updated their V3 base model with DeepSeek-V3-0324, a 685B parameter MoE released on Hugging Face under the MIT license. This is not R1 (their reasoning model) but the powerful base model R1 was built on, and supposedly the base for a future R2.

685B parameters
Dev ToolsOpen weights

MLX-Audio v0.0.3

Prince Canuma releases MLX-Audio v0.0.3 for speech on Apple Silicon

Prince Canuma, creator of MLX-VLM, FastMLX, and MLX Embeddings, released MLX-Audio v0.0.3, an open-source library bringing speech and audio models to Apple Silicon via MLX. It makes powerful open-source TTS and audio models accessible locally on Mac hardware.

Canopy Labs
New ModelsOpen weights

Orpheus 3B

Canopy Labs drops Orpheus 3B natural-sounding speech model

Canopy Labs released Orpheus, an open speech language model that produces natural, human-sounding speech, headlined by a 3B model with smaller variants (1B, 500M, 150M) in the family. Weights are on Hugging Face with a Colab for trying it out, discussed on the show with Daily.co CEO Kwindla Kramer in the voice AI segment.

NVIDIA
New ModelsOpen weights

Canary 1B/180M Flash

NVIDIA Canary Flash: Apache 2 speech recognition and translation

NVIDIA released Canary 1B Flash and 180M Flash, Apache 2.0 licensed speech recognition and translation models built as Llama finetunes. The permissive license makes them freely usable for commercial ASR and translation workloads.

NVIDIA
New ModelsOpen weights

Llama-Nemotron (Super 49B, Nano 8B)

NVIDIA drops Llama-Nemotron reasoning models plus training dataset

NVIDIA released the Llama-Nemotron family, including Super 49B and Nano 8B reasoning models, announced around GTC. Alongside the open weights, NVIDIA published the Llama-Nemotron post-training dataset, giving the community both the models and the data recipe behind them.

Tencent
New ModelsOpen weights

Hunyuan3D 2.0 MV & Turbo

Tencent updates Hunyuan3D 2.0 with MultiView and Turbo variants

Tencent updated its Hunyuan3D 2.0 image-to-3D model with an MV (MultiView) version that conditions on multiple input views, plus a faster Turbo variant. The show highlighted it as new SOTA for 3D generation, available to try in a Hugging Face space.

New ModelsOpen weights

OLMo 2 32B

AllenAI ships OLMo 2 32B, a fully open GPT-4-class model

The Allen Institute for AI released OLMo 2 32B, its biggest fully open model yet, with weights, code, and dataset all published under Apache 2.0. Announced by Nathan Lambert as a last-second addition, it reportedly beats GPT-3.5 and GPT-4o mini as well as leading open-weight models like Qwen and Mistral at its size.

Cohere
New ModelsOpen weights

Command A

Cohere Command A: 111B enterprise model with 256K context on just 2 GPUs

Cohere announced Command A, a 111B parameter open-weights model with a 256K context window, presented on the show by Cohere's Sandra Kublik. It runs on only two GPUs where models of this size typically require around 32, and is built for enterprise use: agentic tasks, tool use, multilingual performance, and secure private deployments.

EuroBERT team
New ModelsOpen weights

EuroBERT

EuroBERT: multilingual encoder models from 210M to 2.1B parameters

EuroBERT is a new family of multilingual encoder models ranging from 210M to 2.1B parameters, trained on a 5 trillion-token dataset across 15 languages with 8K context support. It targets European and global language NLP tasks like retrieval and RAG, where properly encoding non-English character sets matters.

Google DeepMind
New ModelsOpen weights

Gemma 3

Google open sources Gemma 3, 1B-27B multimodal family with 128K context

Google released Gemma 3, an open-weights model family spanning 1B to 27B parameters with multimodal (text, image, video) capabilities, support for over 140 languages, and a 128K context window. The 27B model runs on a single GPU, with Sundar Pichai claiming competitors need roughly 10x the compute for similar performance. It shipped with day-one open source ecosystem support (Hugging Face, Ollama, Kaggle) plus ShieldGemma 2 for content moderation.

HPC-AI Tech
New ModelsOpen weights

Open-Sora 2.0

OpenSora 2.0: 11B open-source video model trained for $200K

OpenSora 2.0 is an 11B parameter open-source video generation model that claims state-of-the-art results while costing only about $200,000 to train. The team claims performance approaching OpenAI's Sora on some benchmarks, underscoring how fast open-source video generation is improving.

Nous Research
New ModelsOpen weights

DeepHermes 3 (24B / 3B)

Nous Research releases DeepHermes 24B and 3B hybrid reasoning models

Nous Research released DeepHermes hybrid reasoners at 24B (Mistral-based) and 3B sizes, models that can toggle between standard chat responses and long chain-of-thought reasoning. The 24B preview is available on Hugging Face as part of the week's wave of open-source reasoning model releases.

Reka AI
New ModelsOpen weights

Reka Flash 3

Reka Flash 3: 21B open-source reasoning model under Apache 2.0

Reka AI open sourced Reka Flash 3, a 21B parameter reasoning model released under an Apache 2.0 license and trained with the REINFORCE Leave One-Out (RLOO) reinforcement learning technique. It excels at chat, coding, instruction following, and function calling, with Nisten calling it possibly one of the best ~20B models available.

Remade AI
New ModelsOpen weights

Wan 2.1 14B I2V LoRA video effects

Remade AI releases 8 open LoRA video effects for Wan 2.1

Remade AI published eight LoRA video effects for Alibaba's Wan 2.1 14B image-to-video model, including effects like squish, inflate, deflate, and cakeify. The open release shows video effects becoming trainable and customizable via LoRAs on top of open video models.

Alibaba (Qwen)
New ModelsOpen weights

QwQ-32B

Qwen releases QwQ-32B reasoning model that matches R1 on some evals

Alibaba's Qwen team released QwQ-32B, an open-weights reasoning model that matches DeepSeek R1 on several evals despite being roughly 20x smaller at 32B parameters. Qwen tech lead Junyang Lin joined the show to announce it, and the episode dubbed it Alibaba's 'R1 killer' for bringing strong reasoning to a size that runs on consumer hardware.

February 2025

DeepSeek
Dev ToolsOpen weights

Open Source Week infra releases

DeepSeek open-sources its infra stack during Open Source Week

DeepSeek ran its Open Source Week, releasing a series of production infrastructure repos (including FlashMLA, DeepEP, and DeepGEMM) that power its training and inference stack. The drops gave the open-source community a rare look at the low-level kernels and communication libraries behind DeepSeek's efficient frontier models.

Microsoft
New ModelsOpen weights

Phi-4-multimodal

Microsoft releases Phi-4-multimodal and Phi-4-mini open weights

Microsoft expanded the Phi family with Phi-4-multimodal-instruct, a small open-weights model that handles text, vision, and audio in a single model, alongside a compact Phi-4-mini. The weights shipped on Hugging Face, continuing Microsoft's push for capable small models that can run on-device.

Arc Institute & NVIDIA
New ModelsOpen weights

Evo 2

Arc Institute and NVIDIA release Evo 2, a 40B state-of-the-art genomics model

Arc Institute and NVIDIA introduced Evo 2, a state-of-the-art genomics model with around 40 billion parameters trained on 9.3 trillion nucleotides. It uses the StripedHyena architecture to process genetic sequences up to 1 million nucleotides, enabling prediction of genetic mutation effects and even design of entire genomes. Fully open: two papers, weights, data, and training and inference codebases.

Haize Labs
Dev ToolsOpen weights

Verdict

Haize Labs open-sources Verdict, a framework for composing LLM judges

Haize Labs released Verdict, an open-source framework for composing LLM judges that tackles core LLM-as-a-judge problems: self-preference bias, prompt sensitivity, and meta-evaluation. Verdict combines simpler judging primitives into more robust and efficient evaluators ('judge-time compute scaling'), achieving near state-of-the-art results on benchmarks like ExpertQA at a fraction of the cost, fast enough to use as a real-time guardrail. Co-founders Leonard Tang and Nimit joined the show to discuss it.

Hao AI Lab
Dev ToolsOpen weights

FastVideo

Hao AI Lab's FastVideo makes HunyuanVideo 3x faster with no extra training

Hao AI Lab released FastVideo, a method that makes HunyuanVideo (HY-Video) three times faster with no additional training, using a technique called Sliding Tile Attention that outperforms even flash attention for this workload. Faster inference makes open-source video models far more practical, and it supports HY-Video LoRAs for fine-tuned applications.

Perplexity
New ModelsOpen weights

R1-1776

Perplexity releases R1-1776, a censorship-free DeepSeek R1 fine-tune

Perplexity open-sourced R1-1776, a fine-tuned version of DeepSeek R1 designed to remove Chinese government censorship on topics like Tiananmen Square and Taiwanese independence. They used human experts to identify around 300 sensitive topics and built a censorship classifier to train the bias out, claiming no significant impact on standard eval performance. The name 1776 is a nod to American independence.

StepFun
New ModelsOpen weights

Step-Video-T2V

StepFun open-sources Step-Video-T2V, a SOTA 30B text-to-video model

StepFun released Step-Video-T2V (plus a T2V Turbo variant), a 30 billion parameter state-of-the-art text-to-video model under an MIT license. Results impressed especially on text integration, such as rendering 'We will open source' on a scroll as a character unfurls it, marking one of the strongest open-source video drops of the week.

January 2025

Alibaba (Qwen)
New ModelsOpen weights

Qwen2.5-VL

Alibaba ships Qwen2.5-VL open vision-language model family

Alibaba's Qwen team released Qwen2.5-VL, open-weights vision-language models up to 72B that handle images, documents, video understanding, and on-screen agentic grounding. The 72B Instruct model was immediately available on Hugging Face and in Qwen Chat.

72B Largest variant
Browser Use
Dev ToolsOpen weights

Browser-use

Browser-use: open-source alternative to OpenAI's Operator

Browser-use is an open-source library that lets LLM agents control a real web browser, positioned on the show as the OSS counterpart to OpenAI's Operator. It enables anyone to build browsing agents with their model of choice instead of a closed hosted product.

New ModelsOpen weights

YuE 7B

YuE 7B: open-source Suno-style music generation model

The Multimodal Art Projection (M-A-P) team released YuE, a 7B open-source music generation model dubbed the 'open Suno' on the show, capable of generating full songs with vocals from lyrics. Weights are on Hugging Face with code on GitHub and a hosted demo on fal.ai.

7B Parameters
Mistral AI
New ModelsOpen weights

Mistral Small 2501

Mistral Small 2501: 24B open-weights model under Apache 2.0

Mistral AI released Mistral Small 2501, a 24B-parameter instruct model under the permissive Apache 2.0 license. Announced as breaking news during the show, it continues Mistral's tradition of strong small open models suitable for fine-tuning and local deployment.

24B Parameters
UC Berkeley
Papers & ResearchOpen weights

TinyZero & RAGEN

Berkeley TinyZero and RAGEN replicate DeepSeek R1-Zero

Berkeley researchers released TinyZero and RAGEN, open replications of DeepSeek's R1-Zero reinforcement-learning recipe on small models. The projects showed that R1-style emergent reasoning behavior can be reproduced cheaply, with training runs logged publicly on Weights & Biases.

DeepSeek
New ModelsOpen weights

DeepSeek R1

DeepSeek R1: MIT-licensed open source reasoning model rivals o1

DeepSeek released R1, a state-of-the-art open source reasoning model under a permissive MIT license. It matches or beats OpenAI's o1 on key reasoning benchmarks while being fully open weights, and DeepSeek also shipped a family of distilled smaller models. The show called this the hottest week open source AI has ever had.

Hugging Face
New ModelsOpen weights

SmolVLM (256M)

Hugging Face SmolVLM: tiny vision-language models run on WebGPU

Hugging Face released SmolVLM, a family of tiny vision-language models including a 256M-parameter version small enough to run entirely in the browser via WebGPU. It demonstrates how far efficient multimodal models have shrunk while remaining usable.

256M Parameters (smallest VLM)