Google drops Gemma 4 12B, an encoder-free multimodal local model
Google released Gemma 4 12B, an encoder-free multimodal model under Apache 2.0 that targets 16GB VRAM local setups. Instead of bolting separate vision or audio encoders onto a language model, it uses one unified network, which LDJ and Yam argued makes smaller multimodal models cheaper, cleaner, and easier to run locally.
H Company launches Holo 3.1 local computer-use agent models
H Company released Holo 3.1, a family of local computer-use agent models ranging from 0.8B to 35B parameters with new quantized checkpoints. The lineup targets running screen-driving agents on local hardware rather than in the cloud.
Ideogram 4.0 becomes the top open-weight text-to-image model
Ideogram released Ideogram 4.0, a 9.3B-parameter text-to-image model with open weights under a non-commercial license. It leads open-weight image models on typography and layout, with bounding-box/layout-style prompting that trades casual generation ease for precise structured control.
JetBrains open-sources Mellum 2, a 12B MoE coding model
JetBrains released Mellum 2, a 12B mixture-of-experts coding model with only 2.5B active parameters, trained from scratch by a small team using a three-stage curriculum over 10T tokens. The panel read it as IDE companies converting years of developer-workflow context into model advantage; it is also available on CoreWeave Inference.
Nous Research launches Hermes Desktop agent app for Mac/Win/Linux
Nous Research launched Hermes Desktop, packaging the Hermes Agent harness into a native desktop app for Mac, Windows, and Linux. Karan previewed chat, permissions, tool-call visibility, reasoning traces, and admin controls aimed at small teams, startups, and personal agent fleets.
NVIDIA ships Nemotron 3.5 ASR, a 600M streaming speech model
NVIDIA released Nemotron 3.5 ASR, a 600M-parameter open multilingual streaming speech-to-text model aimed at voice agents. It supports 40 languages and reportedly delivers 17x more throughput than Parakeet-style baselines at half the size, pushing the latency/accuracy frontier for open voice-agent infrastructure.
NVIDIA releases Nemotron 3 Ultra, a 550B open-weight MoE for agents
NVIDIA dropped Nemotron 3 Ultra the day of the show, a 550B-parameter sparse MoE with 55B active parameters built for long-running agentic harnesses like OpenCode, Hermes, and OpenClaw. Chris Alexiuk joined to explain the hybrid Mamba/Transformer architecture and the unusually complete open release: weights, training data, recipes, a GenRM reward model, and an NVFP4 quantized checkpoint.
550B Nemotron 3 Ultra parameters55B Active parameters
OpenBMB MiniCPM5-1B: new SOTA 1B open-weights model
OpenBMB released MiniCPM5-1B, a state-of-the-art 1B-parameter open-weights model for efficient local and on-device use that runs on a phone. It scores 17.9 on the Artificial Analysis Intelligence Index, 7.4 points ahead of its size class, while using roughly 31x fewer output tokens than Qwen3.5 2B.
MOSS-TTS-v1.5: open-source 8B TTS with 31 languages
OpenMOSS shipped MOSS-TTS-v1.5, an 8B open-source text-to-speech model supporting 31 languages with pause control, released under Apache 2.0. It is one of the larger fully open TTS models available.
PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB
PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.
Tencent open-sources Hy-MT2 translation models under Apache 2.0
Tencent released the Hy-MT2 family of translation models under Apache 2.0, including a tiny 1.8B model that beats paid translation APIs like Microsoft's Translator, plus a larger 30B-A3B MoE variant. A small, free, locally-runnable model outperforming commercial translation services was one of the open-source wins of the week.
Cohere releases Command A+, a 218B Apache 2.0 MoE with 25B active params
Cohere released Command A+, a 218B-parameter mixture-of-experts model with 25B active parameters, shipping open weights under Apache 2.0. It was the week's headline open-source release, available on Hugging Face in both W4A4 quantized and BF16 variants.
Nous Research publishes Lighthouse Attention for fast long-context pretraining
Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining that delivers major speedups. The release includes a blog post, an arXiv paper and an open-source GitHub implementation.
Fastino Labs GLiGuard: 300M open guardrail model matches SOTA safety models
Fastino Labs released GLiGuard, a 300M-parameter open source guardrail model that matches state-of-the-art safety models 23-90x its size while delivering 16x higher throughput. It ships under Apache 2.0, making small, fast, deployable guardrails available to everyone.
Meta Sapiens2: family of 6 human-centric vision models (0.1B-5B)
Meta released Sapiens2, a family of six ViT models ranging from 0.1B to 5B parameters trained on 1 billion human images. The models set SOTA on human-centric vision tasks including pose estimation, segmentation, surface normals, and pointmaps, with weights on Hugging Face.
DeepSeek V4: 1.6T MoE with CSA+HCA attention and 1M context
DeepSeek released the V4 paper and models (V4-Pro and V4-Flash on Hugging Face), a 1.6T-parameter MoE featuring CSA+HCA attention that fits 1M tokens of context in just 5.7GB of KV cache. It is possibly the first frontier model trained across multiple datacenters, and DeepSeek is offering API tokens at an 80% discount on already much cheaper pricing.
IBM Granite 4.1: dense non-thinking models with top tool calling
IBM released the Granite 4.1 family (3B/8B/30B), dense non-thinking models under Apache 2.0 with best-in-class tool calling, scoring 73 on BFCL with just 8B parameters. IBM claims 20x token efficiency over Qwen3.5 9B, and the models are live on W&B Inference at $0.05/$0.10 per million input/output tokens with 128K context.
Mistral Medium 3.5: 128B dense flagship with 256K context
Mistral launched Medium 3.5, a 128B dense flagship model with 256K context and configurable reasoning, released with weights on Hugging Face. Alongside it Mistral shipped a Vibe coding agent.
NVIDIA released Nemotron 3 Nano Omni, a 30B-total/3B-active hybrid Transformer-Mamba MoE with 256K context. It delivers 9x throughput on consumer hardware.
SenseTime open-sourced SenseNova U1, a unified multimodal MoE model with 8B total and 3B active parameters that handles understanding and generation with no separate encoder or VAE. The architecture builds on a paper the team presented at ICLR last year.
Talkie: 13B open-weight LLM trained only on pre-1930 text
Alec Radford and David Duvenaud released Talkie, a 13B open-weight LLM trained exclusively on pre-1930 text. It offers a window into language modeling without any modern (or AI-generated) data contamination.
Qwen3.6-27B: dense Apache-2.0 model beats Alibaba's own 400B flagship
Alibaba shipped Qwen3.6-27B, a dense 27B-parameter model under Apache 2.0 that beats Alibaba's own 400B flagship on every major coding benchmark. Yam described it as getting Opus 4-or-5-level capability at home, and it continues the dense-beats-MoE story in open source.
Brex open-sources CrabTrap, an LLM-as-judge proxy for agent security
Brex's CEO pair-programmed with Codex and open-sourced CrabTrap, an LLM-as-judge HTTP proxy that intercepts outbound agent requests and blocks risky activity using natural-language rule definitions. Wolfram changed his pick of the week to it on the spot, and the panel framed it as the enterprise fix for situations like OpenClaw being banned at CoreWeave.
Kimi K2.6: 1T MoE open-source SOTA on SWE-Bench Pro
Moonshot AI released Kimi K2.6, a 1-trillion-parameter MoE with 32B active parameters, 384 experts, MLA attention, and a 256K context window under a modified MIT license. It claims open-source state of the art on SWE-Bench Pro at 58.6, and Wolfram called it the best open-source model he has ever tested on his private wolf-bench.
OpenAI open-sources a 1.5B privacy/PII filter that runs in the browser
OpenAI open-sourced a tiny 1.5B MoE model with only 50M active parameters under Apache 2.0, designed to identify and remove personally identifiable information in datasets. It runs fully in the browser on WebGPU via Xenova's Transformers.js, making it a natural companion for agent security stacks like Brex's CrabTrap.
Community researcher 0xSero released Gemma 4 21B-A4B REAP, a 20% expert-pruned version of the Gemma 4 26B MoE created using Cerebras' REAP pruning technique. It shrinks the model for cheaper local inference while preserving most of its quality.
Qwen 3.6-35B-A3B: Apache 2.0 MoE with 3B active hits 73.4% SWE-Verified
Alibaba Qwen open-sourced Qwen 3.6-35B-A3B under Apache 2.0 the same morning Opus 4.7 dropped: a 35B MoE with only 3B active parameters that scores 73.4% on SWE-bench Verified, rivaling models 10x its size. It is natively multimodal with 262K context extensible to 1M, and the crew called it the strongest mid-size LLM on nearly all benchmarks, putting to rest doubts about Qwen's open-source commitment after Junyang Ling's departure.
Baidu ERNIE-Image: 8B DiT ranks #1 on GenEval among open models
Baidu released ERNIE-Image, an 8B diffusion transformer that ranks #1 on GenEval among open models and features precise multilingual text rendering. It is part of this week's wave of Chinese open releases in image and 3D generation.
Gradient Bang: first massively multiplayer fully LLM-driven voice game
Kwindla Kramer's 'side project that broke containment' is a fully LLM-driven multiplayer voice-based space game inspired by BBS-era Trade Wars, built on a new Pipecat Sub-Agents library with a class-based event bus that works locally and over the network. A Deepgram plus GPT-4.1 voice agent always responds in under 1.5 seconds while GPT-5.2 medium-thinking task agents do the work, and the React frontend is rendered from LLM-generated JSON as dynamic UI. The team also open-sourced GB Benchmarks for evaluating agent task execution.
Super Gemma 4 26B Uncensored v2 trends on HF with 0/100 refusals
Community fine-tuner @songjunkr released Super Gemma 4 26B Uncensored v2, which is trending on Hugging Face with 0/100 refusals and fixed tool calling. It ships in GGUF and MLX 4-bit variants for local inference.
Marimo released Marimo Pair, which embeds Claude Code, Codex, or OpenCode agents directly inside its reactive, dependency-graph-aware Python notebooks. Founding engineer Trevor Manz joined the show to explain why reactive notebooks are a natural verification surface for agent-written code; the launch trended on Hacker News this week and was featured as part of This Week's Buzz (Marimo is in the CoreWeave family).
NVIDIA Lyra 2.0: single image to explorable 3D worlds, Apache 2.0
NVIDIA released Lyra 2.0 under Apache 2.0, generating persistent, explorable 3D worlds from a single image. Together with Baidu ERNIE-Image and Tencent HYWorld 2.0, it rounds out a week of open releases in the 3D-world-from-single-image race.
Tencent HYWorld 2.0 turns a single image into editable 3D scenes
Tencent released HYWorld 2.0, which converts a single image into editable 3D Gaussian Splats and meshes that are ready for Unity, Unreal, and Isaac Sim. It is one of three single-image-to-3D-world releases this week, essentially an open-source equivalent of what Fei-Fei Li's World Labs is building.
Gemma 4 goes live on W&B Inference with LoRA inference support
Weights & Biases put Gemma 4 live on W&B Inference, running on CoreWeave infrastructure with LoRA inference support. Replying to the W&B announcement post on X with the code 'Gem Drop' gets $20 in free inference credits.
Arena releases 3 years of leaderboard data and prompts on Hugging Face
Arena (formerly LMArena) released three years of historical leaderboard data plus the actual user prompts as datasets on Hugging Face. Peter Gostev, who previously scraped the site by hand into Google Sheets for his charts, now builds his Compute Wars and model-trend analyses straight from the data.
MemPalace open-source AI memory system goes viral with 26K stars
MemPalace, the open-source AI memory system from Milla Jovovich and Ben Sigman, went viral with 26K GitHub stars in 2 days and claimed top memory-benchmark scores. The team then transparently walked back the overstated benchmark claims in a public correction thread, which the show called a refreshingly honest arc.
Nous Research ships Hermes 27B, paired with the Hermes harness
Nisten's pick of the week: Hermes 27B, an open model trained specifically to be paired with the Hermes harness and allegedly distilled from the Opus API. Model and harness ship together as a portable unit, a notable take on the harness-engineering trend Swyx discussed.
OpenClaw's biggest release since 4.0: /dreaming goes GA with Light/Deep/REM memory consolidation phases that defrag agent memory into a human-readable Dream Diary (DREAMS.md). The release also adds built-in video and music generation across 4 backends, GPT-5.4 as the new default model, prompt-cache reuse improvements, and Control UI plus docs in 12 new languages. Maintainer Vincent Koc says the ~1.5M-line codebase was refactored into a plugin architecture in nine days.
GLM-5.1 takes #1 open-source spot on SWE-Bench Pro at 58.4%
Z.ai released GLM-5.1, now the #1 open-source model on SWE-Bench Pro at 58.4%. It can run autonomously for 8 hours with 1,700+ agent steps, and is already live on W&B Inference. Open weights are up on Hugging Face alongside an arXiv paper.
Alibaba open-sources Qwen3.5-Omni, a 397B native omni-modal model
Qwen3.5-Omni is Alibaba's natively omni-modal open model handling text, image, audio, and video, with 397B total parameters and 17B active. It extends the Qwen family's open-source momentum into unified multimodal workloads.
Google releases Gemma 4 open-weights family under Apache 2.0
Google DeepMind's Gemma 4 launch crossed 10M+ downloads with over 1,000 Gemma-4-based fine-tunes on Hugging Face; the Gemma family totals 500M+ downloads. Omar Sanseviero says Gemma is the foundation for the next generation of Gemini Nano shipping on Pixel and Samsung, with the AI Edge gallery letting people run it locally on Android and iOS. It punched above its size on Arena's Pareto curve and is now live on W&B Inference.
Liquid AI ships LFM2.5-350M with agentic tool calling at 350M params
Liquid AI released LFM2.5-350M, a 350M-parameter open model that does agentic tool calling and fits under 500MB quantized. It targets edge and on-device agent workloads where tiny deployable models matter.
PrismML releases Bonsai 1-bit models, an 8B model in 1.15 GB
PrismML released Bonsai, a family of 1-bit quantized open models fitting an 8B model into 1.15 GB and claiming 10x intelligence density, built on decades of compression research. The panel discussed one-bit quantization as a cost/performance lever for cheap local inference.
Ryan Carson open-sources Claw Chief, an AI chief of staff
Co-host Ryan Carson open-sourced Claw Chief, an AI chief-of-staff setup with skills, crons, and scheduling. It packages his agent workflow patterns into a reusable open-source repo.
Claw-code clean-room rewrite becomes fastest repo to 100K GitHub stars
After Claude Code's source leaked via npm, Sigrid Jin and Bellman published claw-code, a clean-room rewrite that became the fastest GitHub repo to pass 100K stars, hitting the mark in roughly 24 hours. Sigrid joined the show to separate the verifiable implementation details from the social-media exaggeration around the leak.
Irodori-TTS-500M: open Japanese TTS with emoji emotion control
Irodori-TTS-500M is a 500M-parameter open-weights Japanese text-to-speech model released on Hugging Face, notable for controlling emotional delivery through emojis in the input text. It landed as part of the week's wave of voice and audio releases.
Cohere Transcribe: open-source 2B ASR tops Open ASR Leaderboard at 5.42% WER
Cohere entered the ASR game with Transcribe, a 2-billion-parameter Apache 2.0 speech recognition model that immediately took the number-one spot on Hugging Face's Open ASR Leaderboard with a 5.42% word error rate versus Whisper Large v3's 7.44%. It wins 61% of human evaluations on average and 64% head-to-head against Whisper, making it a credible local-inference Whisper replacement for regulated industries.
2B Cohere Transcribe ASR size5.42% Word error rate on Open ASR Leaderboard
MiniMax 2.7 open-source weights discussed as small-model momentum continues
The panel covered MiniMax 2.7 and its open-weights release in the context of small, efficient models becoming genuinely practical for local and specialized agent workflows. The segment focused on capability momentum and how open-weights expectations keep shaping adoption sentiment.
Mistral drops Voxtral TTS, a 3B open-weight text-to-speech model
Mistral released Voxtral TTS, its first text-to-speech model, as breaking news during the live show: 3 billion parameters, open weights, with emotion controls for neutral, happy, and frustrated voices. Mistral claims it beats ElevenLabs Flash v2.5 in human preference tests with a 58% win rate on flagship voices and 68% on zero-shot voice cloning, though Alex's live test found it decent rather than stunning.
Reka AI ships Edge, a 7B multimodal VLM for sub-second on-device inference
Reka AI launched Reka Edge, a 7B-parameter multimodal vision-language model built for sub-second latency on edge devices. Weights are on Hugging Face and the model is available through OpenRouter, with the panel highlighting it as a notable efficient multimodal release for real-world deployment.
H Company's Holotron-12B: hybrid SSM computer-use model at 8.9k tok/s
H Company released Holotron-12B, an open-source hybrid SSM model built for computer-use agents. It claims 8,900 tokens/sec generation speed and jumps the WebVoyager benchmark from 35.1% to 80.5%, continuing the trend of hybrid SSM architectures for long-context agent workloads.
Hugging Face report: China passes US in LLM count, Qwen tops 1B downloads
Hugging Face published its Spring 2026 State of Open Source report showing China surpassing the US in number of LLMs for the first time, with Chinese models taking 41% of all downloads. Alibaba's Qwen family crossed 1 billion total downloads (about 1 million per day), overtaking Llama as the most downloaded model family, on a platform now hosting 11M users and 2M+ models.
MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro
MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, built by one engineer over four days with zero lines of human code. It scores 56.22% on SWE-Bench Pro, within one point of Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. Not yet open weights, though rumors suggest a release is coming.
Mistral Small 4: 119B MoE with 6B active unifies vision, coding, reasoning
Mistral returned to open source with Small 4, a 119B-parameter MoE with 128 experts and only 6B active per token, released under Apache 2.0. It unifies the previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one model and can fit on a single H100 when compressed. Early WolfBench results are sobering at ~17% on OpenClaw agent tasks, roughly on par with similarly sized Nemotron.
Mamba-3 lands with three SSM innovations for inference-first linear models
Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.
Unsloth Studio: web UI for local fine-tuning with 2x speed, 70% less VRAM
Unsloth launched Studio, an open-source web UI for local LLM training and inference claiming 2x speed and 70% less VRAM, supporting 500+ models across text, vision, audio, and embeddings. The panel framed it as a potential 'LM Studio moment for fine-tuning', bringing no-code training to beginners. Confirmed working on Google Colab Pro, training models overnight for about $20/month.
Fish Audio S2 is a fully open-source TTS model with inline emotion control via free-text bracket tags like gasp, laughter, and long pause. Alex demoed it live with an OpenClaw skill that let his 5-year-old talk to a voice clone of 'Rocky' from Project Hail Mary; Wolfram called it 'ElevenLabs V3 for free.'
Lightricks ships open-source LTX Video 2.3, runs on an RTX 3090
Lightricks released LTX Video 2.3, an open-source video generation model with improved motion, audio, and quality that runs on a single RTX 3090. It is available on GitHub and Hugging Face.
MiroThinker-1.7 open-source research agent hits SOTA
MiroMind released MiroThinker-1.7, an open-source deep-research agent model that reaches state of the art on deep research benchmarks. It was covered alongside NVIDIA's Nemotron launch in the open-source segment.
NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet
NVIDIA launched Nemotron 3 Super, a 120B Hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B 5-year open-source commitment. It is available on W&B Inference at $0.20/M input and $0.80/M output.
120B Nemotron 3 Super total parameters12B Nemotron 3 Super active parameters (MoE)1M Nemotron 3 Super context window (tokens)
Paperclip.ing: open-source agent orchestration for zero-human companies
Anonymous builder DOTTA presented Paperclip.ing, an open-source agent orchestration framework for 'zero human companies' where an AI CEO recursively hires more agents. It hit 20K GitHub stars in its first week, with a heartbeat system driving agent autonomy and a Memento-style memory architecture keeping agents coherent across tasks.
Covenant-72B: a decentralized-trained open 72B LLM
Covenant-72B is a decentralized 72B-parameter open LLM, released and shared via Hugging Face. It was highlighted in the open-source segment as an example of decentralized model training.
Alibaba releases Qwen3.5 small models (2B, 4B, 9B) for local use
Alibaba released the Qwen3.5 small model series with 2B, 4B, and 9B variants, which the panel found highly usable on consumer hardware. The release landed alongside leadership turbulence as Junyang Lin and Binyuan Hui departed Qwen, though the panel expects Alibaba's open-source momentum to continue.
Yuan AI Lab releases Yuan 3.0 Ultra open-weights model
Yuan AI Lab (IEIT) released Yuan 3.0 Ultra, a new open-weights model published on Hugging Face under the IEITYuan org. It was covered in the open-source LLM roundup as part of a busy week for Chinese open model releases.
StepFun open-sources Step 3.5 Flash Base with its training stack
StepFun released Step 3.5 Flash Base and Midtrain checkpoints, an unusually open release that includes training artifacts and the SteptronOSS training stack alongside the weights. The panel praised the Apache-2 orientation and called the continuation-pretraining flexibility a major practical unlock for builders.
Qwen 3.5 lands: 35B/3B-active Medium outperforms the old 235B flagship
Alibaba released the Qwen 3.5 family of open-weight models, headlined by Qwen3.5-35B-A3B, a 35B model with only 3B active parameters that outperforms their previous 235B flagship. Variants include a 122B-A10B and a dense 27B, with the panel highlighting the hybrid state-space (Mamba-layer) architecture and strong practical coding and agent performance at a tiny active-parameter footprint.
Liquid AI releases LFM2-24B-A2B, a laptop-friendly 24B MoE
Liquid AI released LFM2-24B-A2B, a 24B mixture-of-experts model with only 2.3B active parameters that runs on consumer laptops. The panel highlighted its speed and surprisingly strong non-coding reasoning, reinforcing the trend of efficient low-active-parameter open models for local use.
Perplexity launches pplx-embed SOTA embedding models
Perplexity released pplx-embed, a family of state-of-the-art embedding models built for web-scale retrieval. The models are available on Hugging Face and through Perplexity's API with quickstart docs.
Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax 2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.
Alibaba opens Qwen 3.5: 397B-param multimodal MoE with only 17B active
Alibaba released Qwen3.5-397B-A17B, billed as the first open-weight native multimodal MoE model, with 397B total parameters, just 17B active, 512 experts, and 262K native context extendable to 1M. It delivers 8.6-19x faster inference than Qwen3-Max and continues Qwen's strength in multilingual and medical tasks, scoring 52.5% on Terminal Bench, third place among open-source models. Nisten found coding still trails GLM-5.
Cohere Labs releases Tiny Aya, a 3.35B multilingual model for 70+ languages
Cohere Labs released Tiny Aya, a 3.35B-parameter multilingual model family supporting 70+ languages that is small enough to run locally on phones. It extends Cohere's Aya line of open multilingual models, bringing broad language coverage to on-device deployments.
Weights & Biases launched Kimi K2.5 on its inference service, making Moonshot AI's model available to W&B users. In Wolfram's Terminal Bench deep dive for W&B, Kimi K2.5 achieved a 67.4% ceiling score across multiple runs, among the strongest open-model results he measured.
Zyphra opens ZUNA, a 380M-param EEG brain-computer interface model
Zyphra released ZUNA, a 380M-parameter open-source BCI foundation model that translates EEG brain signals into text, reconstructing clinical-grade brain signals from sparse, noisy data. Dubbed 'thought to text' by the community, it works with roughly $500 non-invasive EEG headsets, likely needs personalized training per user, and is small enough to run in real time on a consumer gaming GPU. It is Apache licensed.
MiniMax M-2.5 hits 80.2% SWE-Bench Verified with 10B active params
MiniMax dropped M-2.5 thirty minutes before the show: a 200B-total, 10B-active open-weights model scoring 80.2% on SWE-Bench Verified, approaching Opus 4.6 at roughly 1/20th the cost (~15 cents per task with a 57% win rate over Opus). Trained with MiniMax's decoupled Forge RL framework and optimized for end-to-end task time with fewer tool calls and thinking tokens. Senior researcher Olive Song joined live and revealed the model was still training — they cut a checkpoint for early release.
W&B Inference adds day-zero GLM-5 and Kimi K2.5 support
Weights & Biases launched day-zero GLM-5 support on its CoreWeave-powered W&B Inference service, alongside Kimi K2.5, with MiniMax 2.5 coming soon. Alex announced $50 in free credits for listeners to test the new open-weights models.
Z.ai launches GLM-5, the open-weights agentic coding crown
Z.ai released GLM-5, a 744B-parameter MoE model (40B active) trained on 28.5 trillion tokens that takes the #1 open-source ranking for agentic coding with 77.8% SWE-bench Verified. It introduces the SLIM asynchronous RL framework for post-training, adopts DeepSeek's sparse attention to cut deployment cost, and was trained on Huawei chips rather than NVIDIA. Lou from Z.ai joined the show live and summed it up as bigger, faster, better, and cheaper.
ACE-Step 1.5: open-source 'Suno at home' music generation under MIT
ACE-Step 1.5 is an MIT-licensed AI music generator that produces full songs in under 10 seconds on consumer GPUs and runs on a MacBook. The panel demoed it live via Pinocchio, generating a ThursdAI song on the spot, and it is available for one-click install.
Qwen3-Coder-Next hits 70.6% SWE-Bench Verified with 3B active params
Alibaba's Qwen3-Coder-Next is an 80B MoE coding agent model with only 3B active parameters that scores 70.6% on SWE-Bench Verified and 44% on the much harder SWE-Bench Pro. It was trained on 7.5T tokens with 20,000 parallel RL environments and runs under 48GB of RAM with GGUF quantization, making near-frontier agentic coding feasible on local hardware.
LingBot-World: open-source world model challenges Google Genie 3
Ant Group released LingBot-World, an open-source world model that generates 10-minute playable environments at 16fps. It positions open weights as a direct challenger to Google's closed Genie 3 in interactive world generation.
Intern-S1-Pro: 1 trillion parameter open MoE for scientific reasoning
InternLM released Intern-S1-Pro, a 1 trillion parameter open-source MoE model targeting SOTA scientific reasoning across chemistry, biology, materials, and earth sciences. The panel noted it beats frontier models on science benchmarks, a massive compute investment for an open release.
Mistral's Voxtral Transcribe 2 dethrones Whisper as SOTA transcription
Mistral AI launched Voxtral Transcribe 2, state-of-the-art speech-to-text with sub-200ms latency, native diarization support, and open weights under Apache 2.0. The panel called it the first model to dethrone Whisper after roughly three years, and Alex used it to transcribe this very episode.
MiniCPM-o 4.5: first open-source full-duplex omni model
OpenBMB released MiniCPM-o 4.5, the first open-source full-duplex omni-modal LLM that can see, listen, and speak simultaneously. It can listen while speaking and even interrupt the user, bringing real-time conversational behavior to open weights.
StepFun Step 3.5 Flash: frontier reasoning claims at 11B active params
StepFun released Step 3.5 Flash, a 196B sparse MoE model with only 11B active parameters, claiming frontier-level reasoning while generating at 100-350 tokens per second. It continues the trend of sparse Chinese MoE models delivering high speed at low active parameter counts.
Z.ai released GLM-OCR, a tiny 0.9B parameter document understanding model that achieves the #1 ranking on OmniDocBench V1.5. It shows that strong OCR and document parsing no longer require large models.
Alibaba's Tongyi Lab released Z-Image, a new image generation model, with support landing in the open-source DiffSynth-Studio toolkit on GitHub. Covered in the AI Art segment alongside HunyuanImage 3.0.
Arcee AI ships Trinity Large: 400B MOE trained in 33 days for $20M
Arcee AI's Trinity Large is a 400B-parameter MOE with 13B active parameters, trained on 17T tokens across 2000 B300 GPUs in 33 days for $20M. It has 512K native context (twice Kimi K2.5), is free on OpenRouter until February 2026, and the panel called it the largest Western open-source lab model.
Jan AI releases Jan v3, a 4B model built for fast local inference
Jan v3 is a 4B-parameter open model optimized for local inference, hitting 132 tokens/sec with a 262K context window and a 40% improvement on coding. The Jan desktop app it powers has reached 5M downloads.
Moonshot AI releases Kimi K2.5, the new open-source king
Moonshot AI's Kimi K2.5 takes the open-source crown, becoming the most-used model on OpenRouter and topping open-source leaderboards. The panel highlighted its strong agentic coding performance and tool use.
Qwen3-TTS: open-source TTS family with 97ms latency and voice cloning
Alibaba's Qwen team released Qwen3-TTS, a full open-source text-to-speech family under Apache 2 that dropped 30 minutes before the show. It spans 5 models from 0.6B to 1.7B parameters, with 97ms latency, voice cloning from just 3 seconds of audio, voice description prompting, and 10-language support.
FlashLabs Chroma 1.0: open-source real-time speech-to-speech under 150ms
FlashLabs released Chroma 1.0, billed as the world's first open-source end-to-end real-time speech-to-speech model with voice cloning under 150ms latency. The 4B parameter model is built on Qwen 2.5 Omni and released under Apache 2; its live demo with RAG and document upload impressed the whole panel.
Liquid AI's LFM2.5-1.2B-Thinking: on-device reasoning under 900MB
Liquid AI released LFM2.5-1.2B-Thinking, a 1.2B parameter reasoning model that runs entirely on-device with under 900MB of memory. Its hybrid architecture with gated convolutions delivers 239 tokens/sec on an AMD CPU and 82 tokens/sec on a mobile NPU, making it practical for edge devices, Raspberry Pi, and older iPhones.
Clawdbot: open-source self-improving personal AI assistant for macOS
Clawdbot, created by Peter Steinberger, is an open-source personal AI assistant that runs locally on your Mac and connects via WhatsApp, Telegram, or Discord. Its killer feature is self-improvement: ask it to learn something and it writes its own skill files, giving a single chat conversation control over multiple agents, persistent memory, voice messages, image generation, and browser automation on your actual computer.
GLM-4.7-Flash: 30B MoE local coding agent with only 3B active params
Z.AI released GLM-4.7-Flash, a 30B parameter MoE model with only 3B active parameters, designed as the ultimate local coding and agent assistant. It hits 59% on SWE-Bench Verified (approaching Sonnet 4's 64%) and runs at 120 tokens/sec on a stock Mac Studio M3 Ultra, fast enough to run RALF autonomous coding loops even on CPU.
59% SWE-Bench Verified120 tps Speed on Mac Studio M3 Ultra
Black Forest Labs drops Flux 2 Klein, fast open-weights image model
Wolfram broke the news mid-show: Black Forest Labs released Flux 2 Klein, a fast 4B/9B image generation model with open weights under Apache 2.0. It is designed for near-real-time editing and style iteration, and Alex used it minutes later in his live Claude Cowork demo.
M3: 235B open-source medical LLM claims to beat GPT 5.2 on HealthBench
Byte released M3, a 235B parameter medical LLM fine-tuned from Qwen3 and licensed Apache 2.0. With only 22B active parameters, it is runnable at usable speeds on an M3 Ultra, and it claims to beat GPT 5.2 on HealthBench. Nisten suggested pairing it with smaller imaging models like MedGemma rather than treating them as substitutes.
Chorus adds agent skills support for every LLM via OpenRouter
Alex used a Ralph loop with Claude Code to add full agent skills support to Chorus, the open-source app that compares answers across multiple LLMs, in about 3.5 hours. The work added a settings panel, filesystem skill discovery, front-matter parsing, and cross-model skill injection, letting the same Claude-style skills run on GPT 5.2 Codex, Gemini, and any OpenRouter model.
Google releases MedGemma 1.5 for offline medical imaging
Google released MedGemma 1.5, a small (4B-class) open model for medical use cases, compact enough to run offline for medical imaging. The panel stressed it is a different model class from Byte's giant M3 medical LLM and that the two pair well together rather than replacing each other.
Meituan's LongCat Flash Thinking: 560B MoE with 27B active, MIT licensed
Meituan released LongCat Flash Thinking, an open-source reasoning MoE with 560B total parameters and only 27B active, under an MIT license. It continued the run of large sparse Chinese open-weights models offering frontier-style reasoning at low active-parameter cost.
Lightricks open-sources LTX-2 synchronized audio-video model
Lightricks open-sourced LTX-2, billed as the first truly open audio-video generation model with synchronized audio and video output, releasing full training code alongside the weights. A distilled version is available to try on Replicate.
Liquid AI LFM 2.5: 1B on-device family with end-to-end audio
Liquid AI released LFM 2.5, a family of ~1.2B parameter on-device models spanning text, vision, and audio, announced at CES alongside AMD's Lisa Su. The models hit 239 tokens/sec on AMD CPU and 100 tokens/sec on iPhone 16 Pro Max, and include a revolutionary end-to-end audio model that skips the traditional ASR-LLM-TTS pipeline entirely, running in as little as 8GB of RAM.
MiroMind AI released MiroThinker 1.5, a 30B parameter open source search agent that achieves 56.1% on BrowseComp and 66.8% on BrowseComp Chinese, outperforming trillion-parameter models. It introduces 'interactive scaling' as a third scaling dimension beyond parameters and context, and is a fine-tune of Qwen 3 Thinking with 147K open training samples.
NousCoder 14B: 7% LiveCodeBench jump in 4 days of RL training
Nous Research released NousCoder 14B, an open source competitive programming model that achieved a 7% jump on LiveCodeBench accuracy in just four days of RL training on 48 NVIDIA B200 GPUs. Training used 24,000 verifiable problems, and the release ships under a full Apache 2 license with training code and a benchmark harness.
NVIDIA Alpha Mayo: open source reasoning self-driving models
NVIDIA announced Alpha Mayo at CES, a family of open source reasoning-based self-driving AI models. The models perform end-to-end autonomous driving with explicit reasoning steps, like identifying jaywalkers and stopping accordingly, demoed in a Mercedes-Benz.
Nemotron Speech ASR: 600M streaming model with 24ms latency
NVIDIA released Nemotron Speech ASR, a 600M parameter open source streaming speech recognition model with 24ms median latency and support for 900 concurrent streams on a single H100. Kwindla Hultman Kramer of Daily.co demoed sub-500ms voice-to-voice latency using a three-model pipeline of Nemotron ASR, Nemotron Nano LLM, and Magpie TTS.
Upstage Solar Open 100B: 102B MoE trained on 19.7T tokens
Upstage released Solar Open 100B, a 102B parameter MoE model with only 12B active parameters per token (129 experts, top-8 activation), trained on 19.7 trillion tokens including 4.5T synthetic via a 'data factory' approach. It outperforms GLM 4.5 Air on many benchmarks, features the SNAP PO reinforcement learning technique with a 50% training speedup, and delivers best-in-class Korean language performance.
Qwen 3 Coder posts insane scores in the race for the coding crown
Alibaba's Qwen 3 Coder landed in July with what the crew called insane benchmark scores for an open-weights coding model. Together with Kimi K2 and GLM 4.5 it made July the peak month for Chinese open source.
Qwen launches speech-to-speech model with emotion handling
Qwen released a speech-to-speech model in March with internal emotion handling, joining the wave of voice-native models. It was part of the Qwen team's relentless 2025 release cadence across modalities.
DeepSeek R1: the open reasoning model that crashed NVIDIA's stock
DeepSeek's open-weights reasoning model dropped January 23rd and matched OpenAI's o1 at roughly 50x cheaper pricing, with an alleged training cost of just $5.5M. It crashed NVIDIA stock 17% — a $560B single-day loss, the largest single-company monetary loss in history — and made Chinese AI a household topic. The crew named it the earthquake that shattered assumptions about who leads AI.
$560B NVIDIA stock loss$5.5M DeepSeek R1 training cost
DeepSeek V3.1 Terminus lands amid September's relentless pace
DeepSeek resurfaced in September with V3.1 Terminus, another strong open-weights release that arrived just as the crew was barely keeping up with the weekly firehose. Nisten noted that missing a single week in this period left you completely lost.
Kokoro TTS: 82M-param Apache 2 model hits #1 on TTS Arena
Kokoro, a tiny 82M parameter text-to-speech model, went viral in January after hitting #1 on TTS Arena. Released under Apache 2.0 and small enough to run in the browser, it showed that high-quality speech synthesis no longer required huge models.
MiniMax released Hailuo 2.3 (referred to as 'Hailuo LLM 2.3' on the show) in November, cited as another strong release from the Chinese labs. It closed out a year in which MiniMax shipped everything from 4M-context LLMs to media models.
MiniMax-01: open model with a 4M token context window
MiniMax (Hailuo) released MiniMax-01 in January with a 4 million token context window, by far the largest context of any open-weights model at the time. It was an early sign of the Chinese-lab open source dominance that defined 2025.
Kimi K2: the Chinese open model that earned mainstream respect
Moonshot AI's Kimi K2 dropped in July and earned serious mainstream recognition, marking peak Chinese-lab dominance of open source. It was named in the show's TL;DR as one of the defining open-weights releases of 2025.
In July, Tencent's Hunyuan team (rendered as 'HO One' in the episode) joined Huawei in entering the open-weights model race. It widened the field of Chinese labs shipping serious open models beyond DeepSeek, Qwen, and Moonshot.
GLM 4.5 runs on Cerebras fast enough to win hackathons
Zhipu's GLM 4.5 came out in July and was the first open model that ran on Cerebras hardware fast enough that hackathon competitors were winning with it. It set up GLM's quiet rise as a business workhorse later in the year.
GLM 4.6 quietly becomes the model businesses actually use
Zhipu's GLM 4.6 arrived in October and, per Nisten, quietly became a go-to model that many businesses still run today. It continued GLM's trajectory from hackathon favorite to production workhorse.
Allen AI's BOLMO reaches byte-level parity with tokenized models
Allen AI released BOLMO, described as the first byte-level language model to reach parity with regular tokenization-based models. The panel framed it as a research breakthrough that could eventually remove tokenizers from the LLM stack.
Allen AI adds video-input multimodal OLMO models in 4B/7B/8B sizes
Allen AI extended its OLMO family with multimodal models that accept video input, released in 4B, 7B, and 8B sizes. It continues Allen AI's fully open approach to model development alongside the BOLMO byte-level work.
FunctionGemma: Google's 270M function-calling model for edge agents
Google released FunctionGemma, a tiny 270M-parameter open model specialized for function calling on-device. With a roughly 500MB RAM footprint and strong gains after fine-tuning for mobile actions, it points toward privacy-first local agents on constrained hardware.
Meta SAM Audio brings promptable source separation to audio
Meta released SAM Audio, an audio source separation model that extends the Segment Anything concept to sound. It supports multimodal prompting via text, visual, and temporal cues to isolate sources from audio, with weights on Hugging Face and code on GitHub.
NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes
NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.
Resemble AI open-sources Chatterbox Turbo, a 350M MIT-licensed TTS
Resemble AI released Chatterbox Turbo, an MIT-licensed 350M-parameter open text-to-speech model. The company claims it beats ElevenLabs in blind listening tests, pushing high-quality TTS into fully open, accessible territory.
Arcee AI introduced Trinity, a family of US-trained open mixture-of-experts models built from scratch, starting with Trinity-Mini and Trinity-Nano-Preview. CTO Lukas Atkins joined the show to discuss the training approach and previewed Trinity-Large for January 2026. The release positions Arcee as a domestic alternative in an open-weights field dominated by Chinese labs.
DeepSeek V3.2 and V3.2-Speciale post gold-medal reasoning under MIT license
DeepSeek released V3.2 and the reasoning-first V3.2-Speciale, a 685B-parameter MoE under MIT license. Speciale posted gold-medal-level olympiad results and 96% on AIME (versus GPT-5 High at 94%), with V3.2 hitting 73.1% on SWE-Bench Verified. Aggressive pricing around 28 cents per 1M tokens on OpenRouter pushes open models closer to top closed-model capability.
96% AIME73.1% SWE-Bench Verified685B Total parameters (MoE)
Microsoft shares VibeVoice-Realtime-0.5B with ~300ms latency TTS
Microsoft published VibeVoice-Realtime-0.5B on Hugging Face, a small realtime text-to-speech model claiming roughly 300ms latency. The show framed it as more evidence that sub-second audio response is becoming table stakes for production voice agents.
Mistral returns to Apache 2.0 with Mistral Large 3 and Ministral 3
Mistral relaunched its model family under permissive Apache 2.0 licensing with Mistral Large 3 and the small Ministral 3 edge models. Large 3 ships a 256K context window and strong open-model coding positioning. The licensing shift reignited discussion around open model portability and deployability.
Nous Research ships Hermes 4.3 36B with decentralized training
Nous Research released Hermes 4.3-36B, highlighted on the show for being trained with decentralized infrastructure and for state-of-the-art RefusalBench performance. The release continues the Hermes line of open, steerable instruction-tuned models.
Tongyi's Z-Image Turbo brings sub-second open image generation
Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.
Black Forest Labs releases FLUX.2, a 32B multi-reference image model
Black Forest Labs released FLUX.2, a 32B-parameter image model with open weights (FLUX.2-dev) that supports multi-reference image editing. It lets users combine multiple reference images and prompt edits with variables, a step up in controllable image editing.
DeepSeek Math V2: 685B open-weights model with IMO gold-level math
DeepSeek surfaced DeepSeek Math V2, a 685B-parameter Apache-2.0 model that reaches IMO gold-level math reasoning. It is the first open-weights math champion at this level, dropped quietly on HuggingFace during the week.
Microsoft ships Fara-7B, a 7B on-device computer use agent
Microsoft Research released Fara-7B, a best-in-class 7B-parameter vision-language model for computer use that runs on-device. It scores 73.5% on WebVoyager, beating OpenAI's computer-use preview while being small enough to run locally.
Prime Intellect releases INTELLECT-3, a 106B open MoE model
Prime Intellect released INTELLECT-3, a 106B-parameter mixture-of-experts model with 12B active parameters that scores 90% on AIME 2024/2025. The lab fully open-sourced the training stack alongside the weights, showing a small lab can train frontier-scale models.
106B Total parameters (12B active)90% AIME 2024/2025
Tencent's 1B HunyuanOCR beats 72B models on OCRBench
Tencent released HunyuanOCR, a 1B-parameter OCR model that scores 860 on OCRBench, beating models as large as Qwen3-VL-72B. It is a striking example of task-specialized small models outperforming generalist giants.
Tencent releases HunyuanVideo 1.5, a lightweight open video model
Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.
OLMo 3: Allen AI's fully open 32B model with complete recipe
Allen AI released OLMo 3, a fully open 32B dense model where the dataset, training recipe, and hyperparameters are all public — not just the weights. LDJ contrasted it with open-weights-only releases from Qwen and DeepSeek, which have never published a fully open recipe.
32B Dense parameters, fully open dataset and recipe
Meta SAM 3: open-vocabulary segmentation and tracking in video
Meta's Segment Anything Model 3 adds open-vocabulary segmentation with text and exemplar prompts, letting you click or type to segment and track any object across images and video. The panel demoed it live on golden retriever videos, and it ships openly as part of Meta's open-source push.
SAM 3D turns single photos into 3D objects and human bodies
Released alongside SAM 3, SAM 3D reconstructs 3D objects and full human bodies from a single image with surprisingly high quality. It extends the Segment Anything family from 2D segmentation into single-image 3D reconstruction.
Baidu open-sources ERNIE-4.5-VL-28B-A3B-Thinking visual reasoning model
Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an Apache 2.0 open-weights visual reasoning MoE with only 3B active parameters that claims to rival much larger models like GPT-5 High on vision tasks. It features image zooming, spatial grounding, and reasoning, with strong small-model performance attributed to GSPO training from the Qwen team.
H Company open-sources Holo2 multimodal computer-use agent family
Dropped live during the show: H Company open-sourced Holo2, a next-generation multimodal agent family fine-tuned on Qwen3-VL for grounding, navigation, and reasoning across web, desktop, and mobile. It posts SOTA results on computer-use and web-navigation benchmarks like OSWorld-G and ships in 4B, 8B, and 30B variants under Apache 2.0.
Meta releases Omnilingual ASR covering 1,600+ languages
Meta released Omnilingual ASR, an Apache 2.0 speech recognition family supporting over 1,600 languages, including 500+ never before served by any ASR system, with character error rate under 10% for 78 languages. The release includes an open corpus of 500k+ rows of transcribed audio, and the 1B model was praised as a near drop-in state-of-the-art replacement on Hugging Face.
WeiboAI releases VibeThinker-1.5B open reasoning model
Weibo's AI team open-sourced VibeThinker-1.5B, a tiny reasoning model that reportedly outperforms much larger models like DeepSeek R1 on select reasoning benchmarks. Part of a week where small open-weights models from Chinese labs kept punching above their weight.
Ai2 launches OlmoEarth foundation models and open Earth-intelligence platform
Ai2 launched OlmoEarth, a family of foundation models plus an open, end-to-end platform for fast, high-resolution Earth intelligence. It applies the lab's open-model approach to geospatial and remote-sensing data, making Earth observation workloads accessible without proprietary stacks.
Hugging Face publishes the Smol Training Playbook for LLM pretraining
Hugging Face published the Smol Training Playbook, a 200+ page end-to-end guide to reliably pretraining and operating LLMs. It distills the team's practical experience from the SmolLM line into an open resource for anyone training their own models.
Maya-1 open-source voice generation model released
Maya-1 is a new open-source voice generation model that was demoed on the show as part of the week's voice AI wave. The panel highlighted how quickly open voice model quality is improving, with expressive output that holds up against commercial systems.
Meituan releases LongCat Flash Omni, a 560B (27B active) omni model
Meituan's LongCat team released LongCat Flash Omni, a 560B-parameter mixture-of-experts model with roughly 27B active parameters that accepts text, audio, and video input. It extends the open LongCat Flash line into omni-modal territory from a lab better known for food delivery than frontier models.
Moonshot AI releases Kimi K2 Thinking, an open 1T-param reasoning MoE
Moonshot AI released Kimi K2 Thinking, an open-source 1-trillion-parameter mixture-of-experts reasoning agent with 256K context and large-scale tool-calling capacity. The panel treated it as the open-source centerpiece of the week, focusing on its reasoning quality and coding utility rather than just benchmark screenshots, and as a sign open models keep closing the usability gap with frontier closed models.
IBM Granite 4.0 Nano: ultra-efficient tiny models for edge deployment
IBM released Granite 4.0 Nano, a set of ultra-efficient tiny open models aimed at edge deployment. The release continues the trend of capable sub-billion-to-few-billion parameter models that can run locally on constrained hardware.
Ming-flash-omni Preview: sparse MoE omni-modal open model
Ant Group's InclusionAI team released Ming-flash-omni Preview, a sparse mixture-of-experts omni-modal model on Hugging Face. It handles multiple input and output modalities in a single open-weights model, adding to the wave of Chinese open omni-modal releases.
MiniMax M2: open-source agentic model at 8% of Claude's price, 2x speed
MiniMax released M2, an open-source agentic model positioned at roughly 8% of Claude's price while running about twice as fast. Head of Engineering Skyler Miao joined the show for a deep dive, framing M2 as both a model story and a speed story, and the panel read it as part of a broader open-model pressure wave on frontier labs.
8% of Claude's price2x speed vs comparable frontier models
Kimi Linear: 48B open model with linear attention and 1M context
Moonshot AI released Kimi Linear, a 48B parameter (A3B active) instruct model that uses linear attention to reach a 1M token context window. It is an open-weights bet on efficient long-context architectures from the Kimi team.
OpenAI ships GPT-OSS-Safeguard, first open-weight safety reasoning models
OpenAI released GPT-OSS-Safeguard, its first open-weight safety reasoning models, built on the GPT-OSS family. The models let developers apply custom safety policies via reasoning rather than fixed classifiers, extending OpenAI's open-weights push into the trust-and-safety layer.
Qwen3-VL adds compact 2B and 32B multimodal models
Alibaba's Qwen team extended the Qwen3-VL family with newly updated 2B and 32B checkpoints. The 2B is a generic VLM (OCR-capable) that holds up against its 4B and 8B siblings from prior weeks, while the 32B reportedly outperforms GPT-5 mini and Claude 4 Sonnet on benchmarks.
The Allen Institute for AI updated its open OCR line with olmOCR 2 at 7B (released as an FP8 checkpoint), landing in the same week as DeepSeek-OCR, Qwen3-VL, and Liquid's LFM2-VL. Another sign that document understanding became this week's hottest open-model category.
DeepSeek-OCR turns text into compressed vision tokens for massive contexts
DeepSeek open-sourced DeepSeek-OCR, a 3B model (~570M active parameters) that is less an OCR model and more a context-compression breakthrough: it renders text as images, compresses it up to 10x while retaining 97% decoding accuracy (60% even at 20x), and reads it back with a tiny vision decoder. The approach suggests text tokenization is far from optimal and points at vastly cheaper long-context processing; alphaXiv reportedly OCR'd all of arXiv for $1000 versus $7500 with MistralOCR, and a single H100 can process up to 200K pages.
97% decoding accuracy at 10x compression~570M active parameters (3B total)200K pages scannable on a single H100
Krea open-sources a 14B real-time video generation model
Krea AI open-sourced a 14-billion-parameter real-time video model, with weights on Hugging Face. It joins the week's clear trend of generative video racing toward live, interactive experiences rather than offline rendering.
LTX-2: native 4K audio+video generation engine from Lightricks
Lightricks announced LTX-2 as breaking news on the show: a video generation engine producing native 4K video (no upscaling) with synchronized audio, positioned as a fast, efficient open alternative to closed models like Sora. It is billed as open-source with weights coming this fall.
Liquid AI ships LFM2-VL-3B tiny multilingual vision-language model
Liquid AI released LFM2-VL-3B, a tiny multilingual vision-language model, part of a wave of OCR-and-VLM releases this week. It targets efficient on-device and edge vision-language workloads at the 3B scale.
PokeeResearch-7B: open-source SOTA deep research agent model
Pokee AI released PokeeResearch-7B, an open-source 7B deep research agent model claiming state-of-the-art results for its size. Weights, code, a paper, and a hosted deep-research preview all shipped together.
Qwen3-VL adds compact 3B and 8B open vision-language models
Alibaba's Qwen team released smaller Qwen3-VL vision-language models in 3B and 8B sizes, bringing the flagship VL capabilities down to edge- and laptop-friendly scales. Weights are open on Hugging Face as part of the Qwen3-VL collection.
Google's C2S-Scale 27B validates a cancer hypothesis in living cells
Google released C2S-Scale 27B, a Gemma-based single-cell biology model that generated a novel cancer therapy hypothesis later validated in living cells. The show called this a bombshell example of AI contributing to real scientific discovery rather than just benchmarks.
KAIST releases KORMo, a bilingual Korean/English 10B open model
KAIST published KORMo, a 10B parameter fully open bilingual model for Korean and English, with weights on Hugging Face and an accompanying paper. It continues the trend of strong national-language open models coming out of Korean labs.
Qwen3-Omni ships open-weights any-to-any audio, vision, and text
Alongside Qwen3-VL, Alibaba released Qwen3-Omni, an end-to-end omni-modal open-weights model that takes text, image, audio, and video input and can respond with streaming speech. The show treated it as direct evidence of how fast open multimodal systems are improving, with weights on Hugging Face, a GitHub repo, demos, and availability in Qwen Chat and the Model Studio API.
Alibaba's Qwen team shipped Qwen3-VL, its new flagship open-weights vision-language family, headlining the episode's 'Qwen-mas' barrage. The panel discussed it as a practical workflow tool for visual understanding and agentic GUI tasks, not just another model card, with weights, a blog post, and a Hugging Face demo all available at launch.
Wan Animate brings open-weights character animation and replacement
Alibaba's Wan team released Wan 2.2 Animate, an open-weights model that animates a character image from a performance video, replicating motion and expressions, or swaps a character into existing footage. It landed in the episode's closing run of video releases showing multimodal product quality climbing across the board.
DeepSeek V3.1 Terminus refines agents and bilingual output
DeepSeek released V3.1 Terminus, an update to V3.1 with cleaner bilingual output, stronger agentic tool use, and cheaper long-context handling. The open weights are available on Hugging Face, continuing DeepSeek's cadence of iterative open releases.
IBM releases Granite Docling 258M compact document-parsing VLM
IBM published Granite Docling 258M, an ultra-compact open-source vision-language model for document understanding that converts documents into structured output. At just 258M parameters it reinforced the show's point that tiny specialized models are becoming genuinely useful workflow tools.
Liquid AI ships Liquid Nanos, tiny task-specific on-device models
Liquid AI released Liquid Nanos, a family of very small task-specific models built for jobs like extraction, translation, RAG, and tool calling that can run on-device. The collection landed on Hugging Face, fitting the episode's theme of small-but-capable models powering real products.
Meta releases 32B Code World Model for agentic code reasoning
Meta released CWM, a 32B open-weights research model trained to internally model code execution, aimed at agentic code reasoning rather than plain code completion. The weights are on Hugging Face under facebook/cwm, giving the open-source community a new approach to code world modeling.
Moondream 3 preview punches above its weight in the tiny-VLM race
Moondream released a preview of Moondream 3, a small open vision-language model that punches well above its size class. CTO and co-founder Vik Korrapati joined the show to explain why small, capable vision models matter for real product building, framing Moondream 3 as a practical tool rather than a benchmark flex.
Tongyi DeepResearch: open-source A3B web agent rivals OpenAI Deep Research
Alibaba's Tongyi Lab open-sourced Tongyi DeepResearch, a 30B mixture-of-experts web research agent with only 3B active parameters. The lab claims parity with OpenAI's Deep Research on agentic search and report-writing tasks, and the weights are available on Hugging Face.
HuMo: human-centric multimodal video generation from ByteDance/Tsinghua
ByteDance research and Tsinghua released HuMo, a human-centric video generation model that conditions on multimodal inputs (text, image, and audio) to produce videos of people. The weights are available on Hugging Face.
Mistral updates its open reasoning model with Magistral-Small-2509
Mistral published Magistral-Small-2509, an updated checkpoint of its small open-weights reasoning model. The refresh keeps Mistral's open reasoning line current as the open-model competitive baseline moves quickly.
Moondream 3 Preview: 9B MoE VLM with 2B active parameters
Moondream released a preview of Moondream 3, a 9B mixture-of-experts vision-language model with only 2B active parameters. It targets frontier-level visual reasoning at small-model cost, continuing Moondream's run of efficient open vision models.
Perceptron AI introduces Isaac 0.1, a 2B perceptive-language model
Perceptron AI released Isaac 0.1, a 2B parameter perceptive-language model with open weights on Hugging Face. Despite its small size, the show notes highlight that it 'points better than GPT', excelling at visual grounding and pointing tasks relative to much larger models.
Alibaba's Tongyi Lab open-sources WebWatcher vision-language research agent
Alibaba's Tongyi Lab open-sourced WebWatcher, a vision-language deep research agent that sets new state-of-the-art results on agentic browsing and research tasks. The 32B model combines visual understanding with web research capabilities and is available on Hugging Face.
Apple's FastVLM-7B lands with a speed-first vision encoder, 85x faster TTFT
Apple released FastVLM-7B, a vision-language model built around a speed-first vision encoder that delivers up to 85x faster time-to-first-token than peer VLMs. Quantized variants (7B-int4, 1.5B-int8) on Hugging Face make it practical for on-device and real-time vision use, anchoring the show's fast-VLM discussion.
Google releases EmbeddingGemma, a 300M-param SOTA embedding model for RAG
Google released EmbeddingGemma, a 300M-parameter open embedding model that achieves state-of-the-art results for its size, aimed at RAG and on-device semantic search. It dropped as breaking news during the show, with browser-based demos like Semantic Galaxy showing it running fully client-side.
Nous Research releases Hermes 4 14B compact hybrid reasoning model
Nous Research launched Hermes 4 at 14B, a compact hybrid reasoning model with tool calling designed for both local and cloud use. It extends the Hermes 4 family down to a size practical for local deployment while keeping reasoning and tool-use capabilities, with a full tech report published on arXiv.
Switzerland launches Apertus-8B and 70B, fully open multilingual LLMs
The Swiss AI Initiative launched Apertus-8B and Apertus-70B, fully open multilingual LLMs trained on 15T tokens covering more than 1,800 languages. The release stands out for full openness (weights, data recipe, and training transparency) and unusually broad language coverage from a national effort.
Tencent open-sources Hunyuan-MT-7B translation model after sweeping WMT2025
Tencent open-sourced Hunyuan-MT-7B, a 7B-parameter machine translation model, after it swept the WMT2025 translation competition. It gives the open-weights community a small, focused translation model that punches well above its size class.
DeepSWE-Preview hits 59% SWE-Bench Verified with pure RL on Qwen3-32B
Agentica and collaborators (with guest Michael Luo of UC Berkeley) released DeepSWE-Preview, a fully open-sourced RL-trained coding agent built on Qwen3-32B that reached 59% on SWE-Bench Verified, a top open result in a benchmark dominated by closed systems. The team published training methodology and weights, emphasizing reproducible reward design and verification over sealed benchmark numbers.
Baidu open-sources ERNIE 4.5, a 10-model multimodal family
Baidu open-sourced the ERNIE 4.5 series, a family of 10 models ranging from 424B down to 0.3B parameters with multimodal capabilities, reportedly beating o1 on DocVQA. The release marks a sharp reversal from Baidu's previous anti-open-source posture and another sign that Chinese labs are setting the pace in open source.
Huawei's Pangu Pro MoE: 72B model trained entirely on Ascend NPUs
Huawei released Pangu Pro, a 72B-parameter MoE trained on its own Ascend NPUs rather than Nvidia or AMD hardware, hitting 1,528 tokens/sec and pretrained on 13T tokens. The panel framed it as the geopolitical open-model story of the week, showing how far Chinese compute stacks have advanced under sanctions.
Kyutai releases open low-latency TTS for English and French
Kyutai Labs released an open 1.6B-parameter text-to-speech model with low latency and high voice similarity in English and French. It was one of two TTS launches closing out the episode, underscoring how quickly multimodal product quality is rising.
Tencent ships Hunyuan-A13B: 80B MoE with only 13B active params
Tencent released Hunyuan-A13B-Instruct, an 80B-parameter MoE that activates only 13B parameters at inference while keeping a 256K context window. Built by the team with WizardLM lineage, it posts strong reasoning benchmarks and feels unusually practical for its class, though the panel flagged its license limits.
DeepSeek drops R1-0528, an updated open reasoning model with big gains
DeepSeek released R1-0528 out of nowhere, an update to their open-weights reasoning model with serious performance jumps: AIME 91, LiveCodeBench 73, and SWE-bench Verified 57.6. They also shipped an 8B distilled version based on Qwen3 that can run on a laptop, keeping it among the best open-weight models available.
91 AIME score, beating previous R1 by a mile8B Distilled Qwen3-based version runnable on a laptop
Haize Labs releases j1-nano and j1-micro tiny reward models
Haize Labs shipped j1-nano (600M params) and j1-micro (1.7B params), tiny open reward models for judging LLM outputs. Despite their small size, j1-micro scores 80.7% on RewardBench, making capable reward modeling accessible on modest hardware.
Resemble AI open-sources Chatterbox voice cloning with emotion control
Resemble AI released Chatterbox, an open-source voice cloning model with emotion control. Weights and code are public on GitHub and Hugging Face, bringing controllable, expressive voice cloning to the open ecosystem.
AM-Thinking v1: 32B dense reasoning model beats bigger MoEs at math and code
A 32B dense open-weights reasoning LLM from a new Chinese team that takes on much larger mixture-of-experts models and comes out on top for math and code, hitting 85.3% on AIME 2024, 70.3% on LiveCodeBench v5, and 92.5% on Arena-Hard. It supports a /think reasoning toggle, ships with a permissive license, is tooled for vLLM, LM Studio, and Ollama, and runs at 25 tokens/sec on a single 80GB GPU with INT4 quantization. A multilingual RLHF pass and 128k context window are in the works.
32B dense parameters85.3% AIME 202425 tokens/sec on a single 80GB GPU with INT4
Alibaba's Wan 2.1: open-source diffusion-transformer text-to-video suite
Alibaba, the team behind the Qwen LLMs, released Wan 2.1, a full stack of open-source diffusion-transformer text-to-video foundation models. Amid the show's discussion of video-model fatigue, this was called out as a release that cuts through the noise, with weights on Hugging Face and code on GitHub.
Nous Research launches Psyche, a decentralized cooperative-training network
Psyche is Nous Research's decentralized cooperative-training network that lets distributed participants jointly train large models over the internet. The launch includes open code on GitHub and a live dashboard tracking the first run, a 40B model called Consilience. COO Dillon Rolnick joined the show to explain the decentralized training push.
Stability AI and Arm release Stable Audio Open Small for on-device audio
Stability AI, together with Arm, released Stable Audio Open Small, a 341M-parameter open text-to-audio model built for real-world on-device deployment. The show framed it as part of a small comeback for Stability, with weights on Hugging Face and an accompanying paper.
StepFun's Step1X-3D: open two-stage framework for textured 3D assets
StepFun released Step1X-3D, an open two-stage framework for high-fidelity, controllable generation of textured 3D assets: it first synthesizes watertight geometry, then generates view-consistent textures. Trained on 2M curated meshes, the release also includes a curated dataset of 800K assets and a Hugging Face demo.
Falcon-Edge: ternary BitNet LLMs for edge deployment under 1GB VRAM
TII's Falcon-Edge project releases ternary BitNet LLMs (1B and 3B base models) that slash memory and compute requirements, enabling inference on less than 1GB of VRAM. Fine-tuners get pre-quantized checkpoints and a clear path to 1-bit LLMs.
Alongside the Qwen 3 launch, Alibaba updated its Qwen 2.5 Omni multimodal model line. Mentioned briefly in the open-source roundup as part of the week's Qwen ecosystem push.
Alibaba open-weights the full Qwen 3 family under Apache 2.0
Alibaba released the entire Qwen 3 stack: two MoE models (235B total/22B active and 30B/3B active) plus six dense siblings from 32B down to 0.6B, all Apache 2.0 with day-one support in LM Studio, Ollama, vLLM, MLX and llama.cpp. The headline feature is a runtime hybrid 'thinking' toggle (/think and /no_think) that trades latency for reasoning depth. Trained on ~36T tokens with 128K context and 119-language coverage, the 235B MoE rivals DeepSeek-R1, o1, o3-mini and Gemini 2.5 Pro on coding and math.
235 B Flagship MoE total parameters (22B active)30 B Qwen3-30B-A3B hit 57 tok/s on a Mac with speculative decoding36 Trillions of pre-training tokens (2x Qwen 2.5)
HiDream E1: open-weights image model with standout Ghibli style
HiDream released E1, an open-weights image editing/generation model (Apache 2.0-style licensing) noted for beautiful Ghibli-style outputs. It ranks #4 on the Artificial Analysis image arena leaderboard, sitting among top contenders like Google Imagen and ReCraft.
JetBrains open-sources Mellum-4b, its code completion focal model
JetBrains published Mellum-4b-base on Hugging Face, a 4B-parameter model specialized for code completion that powers its IDE AI features. Listed in the episode's open-source links roundup.
Kyutai releases Helium-1, a 2B European-language model plus dactory pipeline
Kyutai released Helium-1, a 2B-parameter model distilled from Gemma-2-9B and purpose-built for Europe's 24 official languages, under CC-BY 4.0. It sets a new state of the art for its size class on MMLU-EU, ARC-EU and FLORES translation while fitting in under 2GB VRAM for edge and phone deployment. They also open-sourced 'dactory' (MIT), their full Common Crawl data-processing pipeline that scores, dedups and tags webpages.
Meta's LlamaCon security drop included Llama Guard 4 (text + image protection), Llama Firewall (stops prompt hacks and risky code), Prompt Guard 2 (faster jailbreak defense), CyberSecEval 4, and a new Defender Program for security researchers.
Microsoft ships Phi-4-reasoning and Phi-4-reasoning-plus (14B, MIT)
Microsoft fine-tuned the 14B Phi-4 on 1.4M curated chain-of-thought traces (SFT) and added a small RL stage (Plus variant) to create two MIT-licensed reasoning models. They punch far above their weight: Phi-4-reasoning-plus outperforms DeepSeek-R1-Distill-70B on AIME 25 (78% vs 51%) and sits within a few points of the full 671B DeepSeek-R1, while running on a single GPU with explicit <think> scaffolding.
OpenPipe's ART·E: RL-trained open email agent that beats o3
OpenPipe released ART·E, an Apache 2.0 email research agent built on a 14B Qwen 2.5 backbone, trained on 500K Enron emails plus synthetic Q&A and refined with reinforcement learning. It tops o3 on accuracy (96% vs 90%) while running 5x faster (1.1s median) and 64x cheaper ($0.85 per 1,000 queries), using a simple three-tool loop.
Xiaomi enters open weights with MiMo-7B, MIT-licensed reasoning family
Xiaomi's first open-weights release is a 7B dense family (Base, SFT, RL, RL-Zero) trained from scratch on 25T tokens with a multi-token-prediction objective and rule-verifiable reinforcement learning. The RL variant matches OpenAI o1-mini on benchmark suites despite being far smaller, scoring 55.4% on AIME 2025 and 49.3% on LiveCodeBench v6, all under an MIT license with vLLM-ready weights.
Pipecat releases Smart-Turn, an open source semantic VAD model
The Pipecat team (from Daily) released Smart-Turn, an open source semantic voice activity detection model that understands when a speaker has actually finished their turn rather than just detecting silence. Kwindla Kramer joined the show to break down how semantic VAD makes voice agent conversations feel far more natural, with a community training effort at turn-training.pipecat.ai.
Google ships Quantization-Aware Trained Gemma 3 models for consumer GPUs
Google released Quantization-Aware Training (QAT) versions of the Gemma 3 family, dramatically cutting memory requirements while preserving quality. The 27B model drops from a hefty 54GB to just 14.1GB, and even the 1B model goes from 2GB to about half a gig, making state-of-the-art open models runnable on consumer GPUs. Wolfram took the 4B QAT model for a spin in LM Studio on the show.
27B Gemma 3 27B QAT: 54GB down to 14.1GB1B Gemma 3 1B QAT: 2GB down to ~0.5GB4B 4B QAT model tested in LM Studio
Dex Horthy publishes 12-Factor Agents, a guide to production-ready agents
HumanLayer founder Dex Horthy published 12-Factor Agents, an open GitHub repo and essay distilling common patterns and pitfalls for building reliable, production-ready AI agents. Drawing on his experience building agent SDKs, it argues that serious teams end up writing large parts from scratch and lays out principles for robust agent design, discussed in depth on the show.
FramePack generates 120-second videos on just 6GB of VRAM
FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open source next-frame prediction approach for long video generation that runs on consumer hardware. It can generate videos up to 120 seconds long on as little as 6GB of VRAM by packing input frame context into a fixed length.
Nari Labs' Dia: a wild 1.6B open source TTS model that blew up Twitter
Nari Labs released Dia, a 1.6B parameter open-weights text-to-speech model that absolutely blew up Twitter with its expressive, emotional dialogue generation, including laughs, coughs, and multi-speaker conversations. Built by a tiny team, it punches far above its weight against commercial TTS systems and supports voice cloning, with demos available on Fal.ai.
NVIDIA releases DAM-3B for region-based image and video captioning
NVIDIA dropped the Describe Anything Model (DAM-3B), a 3 billion parameter multimodal model for region-based image and video captioning. You can point it at a specific region of an image or video and it generates a detailed description of just that area. NVIDIA also published an accompanying DescribeAnything dataset and a Hugging Face demo.
Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model
Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.
Microsoft releases BitNet 1.58-bit model weights on Hugging Face
Microsoft published BitNet (listed in the show notes as BitNet v1.5), its native 1.58-bit quantized LLM, as open weights on Hugging Face. The ternary-weight approach targets extremely efficient CPU inference at a fraction of the memory of standard models.
OpenAI debuts Codex CLI, an open source terminal coding agent
OpenAI released Codex CLI, an open source coding tool for the terminal. It ships with hardened security, using Apple Seatbelt on macOS to limit execution to the current directory plus temp files.
Prime Intellect launches INTELLECT-2, a 32B globally-distributed RL run
Prime Intellect released INTELLECT-2, a 32B reasoning model trained with globally decentralized reinforcement learning, a follow-up to the INTELLECT-1 decentralized pretraining run covered on the show in December. The release includes open weights on Hugging Face, a tech report, and the PRIME-RL training code.
Z.ai (formerly chatGLM) releases the GLM-4-0414 open-source family
Z.ai, the rebranded Zhipu AI / chatGLM team, released the GLM-4-0414 family of open-source models. The drop includes base, reasoning and rumination variants published on Hugging Face and GitHub.
Deep Cogito debuts Cogito v1 Preview models from 3B to 70B, beating DeepSeek 70B
New lab Deep Cogito released the Cogito v1 Preview family of open models ranging from 3B to 70B parameters, claiming SOTA results at each size and beating DeepSeek's 70B distill. The models are available on Hugging Face, giving local AI enthusiasts the small-to-mid sizes Llama 4 skipped.
GitMCP turns any GitHub repo into an MCP server instantly
Creators Liad Yosef and Ido Salomon launched GitMCP, a free tool that turns any GitHub repository into an MCP server by simply swapping the domain (gitmcp.io/user/repo). It lets AI assistants ground themselves in a repo's docs and code, and the creators joined the show to demo it.
Google announces A2A, an open agent-to-agent communication protocol
Google announced the Agent2Agent (A2A) protocol at Cloud Next, an open spec for agents from different vendors to discover and communicate with each other. The spec was published on GitHub with a long list of launch partners, including Weights & Biases.
HiDream-I1-Dev: 17B MIT-licensed image model surpasses Flux 1.1 [pro]
HiDream released HiDream-I1-Dev, a 17B parameter open-weights image generation model under an MIT license. It became the new leading open-weights image generator, surpassing Flux 1.1 [pro] on quality benchmarks.
Jina Reranker M0: SOTA multilingual, multimodal document reranker
Jina AI released Jina Reranker M0, a state-of-the-art multimodal and multilingual document reranker model. It reranks documents that include both text and images, targeting retrieval and RAG pipelines, with weights available on Hugging Face.
Meta drops Llama 4 Scout (109B) and Maverick (400B) open-weights MoE models
Meta released the long-awaited Llama 4 family in a chaotic Saturday drop: Scout (17B active / ~109B total, 16 experts) and Maverick (17B active / ~400B total, 128 experts), with a 2T-parameter Behemoth still in training. The models are multimodal, multilingual MoE architectures trained on ~30T tokens with FP8 and interleaved attention (iRoPE), claiming 10M context for Scout and 1M for Maverick. The release was marred by drama: the LMArena version differed from the released model, and the community criticized the lack of small local-friendly sizes.
10M Stated context window for Llama 4 Scout288B Active parameters of unreleased Behemoth (2T total)17B Active parameters for both Scout and Maverick
Moonshot drops Kimi-VL and Kimi-VL-Thinking, tiny A3B open vision models
Moonshot AI released Kimi-VL and Kimi-VL-Thinking, compact vision-language models with only ~3B active parameters (A3B MoE). The thinking variant adds reasoning to a tiny VLM, and both are available openly on Hugging Face.
NVIDIA ships Nemotron Ultra, a 253B pruned and distilled Llama 3.1-405B
NVIDIA released Nemotron Ultra, a pruned and distilled finetune of Llama 3.1-405B at roughly half the parameters (253B). Its benchmarks even included Llama 4 comparisons, showing the older finetuned Llama beating the new models on AIME, GPQA and more. It supports 128K context and fits on a single 8xH100 node for inference.
253B Parameters (pruned from Llama 3.1-405B)128K Context window
DeepCoder-14B: open RL-finetuned coder beats DeepSeek R1 and o3-mini on coding
Together AI and Agentica (UC Berkeley Sky Computing Lab) released DeepCoder-14B-Preview, a reasoning model finetuned with RL that beats DeepSeek R1 and even o3-mini on several coding benchmarks. The project aims to democratize RL: the team open-sourced the model, the training dataset, the Weights & Biases logs, and the eval logs. Guest Michael Luo from Agentica joined the show to discuss the release.
OpenHands LM 32B: MIT-licensed coding agent model hits 37.2% SWE-Bench
All Hands AI (formerly OpenDevin) released OpenHands LM 32B, an MIT-licensed Qwen finetune that scores 37.2% on SWE-Bench Verified, competing with much larger models on real-world repo tasks. The OpenHands agent also took the #2 spot on the new Live SWE-Bench leaderboard, and the 32B model runs locally on a single RTX 3090. A hosted OpenHands Cloud version is also available; guest Xingyao Wang joined the show to discuss it.
37.2% SWE-Bench Verified score#2 Live SWE-Bench leaderboard (OpenHands agent)
Nomic Embed Multimodal: SOTA embeddings for visual documents
Nomic AI released Nomic Embed Multimodal, new 3B and 7B parameter embedding models built on Alibaba's Qwen2.5-VL. They achieve SOTA on visual document retrieval by embedding interleaved text-image sequences, ideal for PDFs and complex webpages. The 7B model ships under Apache 2.0 with open weights, code, and data; guest Zach Nussbaum discussed the release on the show.
Qwen launches Omni 7B: sees, hears, reads, and talks back
Qwen released Qwen2.5-Omni-7B, an open-weights omni-modal model that perceives text, images, audio, and video, and generates both text and speech. It packs end-to-end multimodal perception and spoken output into a 7B parameter model available on Hugging Face.
DeepSeek silently drops V3-0324, 685B params under MIT license
DeepSeek silently updated their V3 base model with DeepSeek-V3-0324, a 685B parameter MoE released on Hugging Face under the MIT license. This is not R1 (their reasoning model) but the powerful base model R1 was built on, and supposedly the base for a future R2.
Prince Canuma releases MLX-Audio v0.0.3 for speech on Apple Silicon
Prince Canuma, creator of MLX-VLM, FastMLX, and MLX Embeddings, released MLX-Audio v0.0.3, an open-source library bringing speech and audio models to Apple Silicon via MLX. It makes powerful open-source TTS and audio models accessible locally on Mac hardware.
Canopy Labs drops Orpheus 3B natural-sounding speech model
Canopy Labs released Orpheus, an open speech language model that produces natural, human-sounding speech, headlined by a 3B model with smaller variants (1B, 500M, 150M) in the family. Weights are on Hugging Face with a Colab for trying it out, discussed on the show with Daily.co CEO Kwindla Kramer in the voice AI segment.
LG open sources EXAONE and EXAONE Deep 32B reasoning model
LG AI Research open sourced its EXAONE family, headlined by EXAONE Deep 32B, a thinking/reasoning model. The release puts a large Korean lab's reasoning model in open weights on Hugging Face, and Alex published a live reaction video to the launch.
Mistral Small 3.1 24B: open-weights multimodal model
Mistral released Mistral Small 3.1, a 24B-parameter open-weights model that adds multimodal (vision) capabilities to the Small line. Both instruct and base checkpoints were published on Hugging Face, making it a strong local multimodal option at the 24B size class.
NVIDIA Canary Flash: Apache 2 speech recognition and translation
NVIDIA released Canary 1B Flash and 180M Flash, Apache 2.0 licensed speech recognition and translation models built as Llama finetunes. The permissive license makes them freely usable for commercial ASR and translation workloads.
NVIDIA drops Llama-Nemotron reasoning models plus training dataset
NVIDIA released the Llama-Nemotron family, including Super 49B and Nano 8B reasoning models, announced around GTC. Alongside the open weights, NVIDIA published the Llama-Nemotron post-training dataset, giving the community both the models and the data recipe behind them.
Roboflow drops RF-DETR, a SOTA open-source object detection model
Roboflow released RF-DETR, a state-of-the-art real-time object detection model, announced as breaking news on the show by CEO Joseph Nelson. The model is fully open source on GitHub and targets practical, deployable computer vision workloads.
StepFun releases Step-Video-TI2V image-to-video model
Chinese lab StepFun dropped Step-Video-TI2V, an open text/image-to-video generation model. Weights are on Hugging Face with code on GitHub, adding another open-weights option to the fast-moving video generation space.
Tencent updates Hunyuan3D 2.0 with MultiView and Turbo variants
Tencent updated its Hunyuan3D 2.0 image-to-3D model with an MV (MultiView) version that conditions on multiple input views, plus a faster Turbo variant. The show highlighted it as new SOTA for 3D generation, available to try in a Hugging Face space.
AllenAI ships OLMo 2 32B, a fully open GPT-4-class model
The Allen Institute for AI released OLMo 2 32B, its biggest fully open model yet, with weights, code, and dataset all published under Apache 2.0. Announced by Nathan Lambert as a last-second addition, it reportedly beats GPT-3.5 and GPT-4o mini as well as leading open-weight models like Qwen and Mistral at its size.
Cohere Command A: 111B enterprise model with 256K context on just 2 GPUs
Cohere announced Command A, a 111B parameter open-weights model with a 256K context window, presented on the show by Cohere's Sandra Kublik. It runs on only two GPUs where models of this size typically require around 32, and is built for enterprise use: agentic tasks, tool use, multilingual performance, and secure private deployments.
EuroBERT: multilingual encoder models from 210M to 2.1B parameters
EuroBERT is a new family of multilingual encoder models ranging from 210M to 2.1B parameters, trained on a 5 trillion-token dataset across 15 languages with 8K context support. It targets European and global language NLP tasks like retrieval and RAG, where properly encoding non-English character sets matters.
Google open sources Gemma 3, 1B-27B multimodal family with 128K context
Google released Gemma 3, an open-weights model family spanning 1B to 27B parameters with multimodal (text, image, video) capabilities, support for over 140 languages, and a 128K context window. The 27B model runs on a single GPU, with Sundar Pichai claiming competitors need roughly 10x the compute for similar performance. It shipped with day-one open source ecosystem support (Hugging Face, Ollama, Kaggle) plus ShieldGemma 2 for content moderation.
OpenSora 2.0: 11B open-source video model trained for $200K
OpenSora 2.0 is an 11B parameter open-source video generation model that claims state-of-the-art results while costing only about $200,000 to train. The team claims performance approaching OpenAI's Sora on some benchmarks, underscoring how fast open-source video generation is improving.
Nous Research releases DeepHermes 24B and 3B hybrid reasoning models
Nous Research released DeepHermes hybrid reasoners at 24B (Mistral-based) and 3B sizes, models that can toggle between standard chat responses and long chain-of-thought reasoning. The 24B preview is available on Hugging Face as part of the week's wave of open-source reasoning model releases.
Reka Flash 3: 21B open-source reasoning model under Apache 2.0
Reka AI open sourced Reka Flash 3, a 21B parameter reasoning model released under an Apache 2.0 license and trained with the REINFORCE Leave One-Out (RLOO) reinforcement learning technique. It excels at chat, coding, instruction following, and function calling, with Nisten calling it possibly one of the best ~20B models available.
Remade AI releases 8 open LoRA video effects for Wan 2.1
Remade AI published eight LoRA video effects for Alibaba's Wan 2.1 14B image-to-video model, including effects like squish, inflate, deflate, and cakeify. The open release shows video effects becoming trainable and customizable via LoRAs on top of open video models.
AI21 releases Jamba 1.6 Large and Jamba 1.6 Mini open-weights models
AI21 Labs released Jamba 1.6 in Large and Mini sizes, updating its hybrid SSM-Transformer (Mamba-based) model family with open weights on Hugging Face. The Jamba architecture targets long-context efficiency compared to pure transformer models.
Qwen releases QwQ-32B reasoning model that matches R1 on some evals
Alibaba's Qwen team released QwQ-32B, an open-weights reasoning model that matches DeepSeek R1 on several evals despite being roughly 20x smaller at 32B parameters. Qwen tech lead Junyang Lin joined the show to announce it, and the episode dubbed it Alibaba's 'R1 killer' for bringing strong reasoning to a size that runs on consumer hardware.
Cohere For AI releases Aya Vision 8B and 32B open multilingual vision models
Cohere For AI released Aya Vision in 8B and 32B sizes, extending the multilingual Aya family with open-weights vision-language capabilities. The models target multilingual multimodal understanding across many languages.
NotaGen open symbolic music model generates classical sheet music
NotaGen is an open symbolic music generation model that produces high-quality classical sheet music rather than raw audio. The release includes code on GitHub, weights on Hugging Face, and a browser demo.
Tencent releases HunyuanVideo-I2V open image-to-video model
Tencent finally shipped the long-awaited image-to-video version of HunyuanVideo, with open weights on Hugging Face and a hosted try-it experience. It lets users animate still images using one of the strongest open video generation models.
Zhipu AI open-sources CogView 4, a 6B text-to-image model
Zhipu AI released CogView 4, a 6B-parameter open text-to-image model in the CogView family, with code available on GitHub. It is notable as an open-weights image generation option with strong Chinese and English prompt support.
DeepSeek open-sources its infra stack during Open Source Week
DeepSeek ran its Open Source Week, releasing a series of production infrastructure repos (including FlashMLA, DeepEP, and DeepGEMM) that power its training and inference stack. The drops gave the open-source community a rare look at the low-level kernels and communication libraries behind DeepSeek's efficient frontier models.
Microsoft releases Phi-4-multimodal and Phi-4-mini open weights
Microsoft expanded the Phi family with Phi-4-multimodal-instruct, a small open-weights model that handles text, vision, and audio in a single model, alongside a compact Phi-4-mini. The weights shipped on Hugging Face, continuing Microsoft's push for capable small models that can run on-device.
Arc Institute and NVIDIA release Evo 2, a 40B state-of-the-art genomics model
Arc Institute and NVIDIA introduced Evo 2, a state-of-the-art genomics model with around 40 billion parameters trained on 9.3 trillion nucleotides. It uses the StripedHyena architecture to process genetic sequences up to 1 million nucleotides, enabling prediction of genetic mutation effects and even design of entire genomes. Fully open: two papers, weights, data, and training and inference codebases.
Haize Labs open-sources Verdict, a framework for composing LLM judges
Haize Labs released Verdict, an open-source framework for composing LLM judges that tackles core LLM-as-a-judge problems: self-preference bias, prompt sensitivity, and meta-evaluation. Verdict combines simpler judging primitives into more robust and efficient evaluators ('judge-time compute scaling'), achieving near state-of-the-art results on benchmarks like ExpertQA at a fraction of the cost, fast enough to use as a real-time guardrail. Co-founders Leonard Tang and Nimit joined the show to discuss it.
Hao AI Lab's FastVideo makes HunyuanVideo 3x faster with no extra training
Hao AI Lab released FastVideo, a method that makes HunyuanVideo (HY-Video) three times faster with no additional training, using a technique called Sliding Tile Attention that outperforms even flash attention for this workload. Faster inference makes open-source video models far more practical, and it supports HY-Video LoRAs for fine-tuned applications.
Hugging Face publishes the Ultra Scale Playbook for training on GPU clusters
Hugging Face released the Ultra Scale Playbook, a guide to building and scaling AI models on large GPU clusters. The team ran 4,000 scaling experiments on up to 512 GPUs to distill practical guidance for labs training big models.
Perplexity releases R1-1776, a censorship-free DeepSeek R1 fine-tune
Perplexity open-sourced R1-1776, a fine-tuned version of DeepSeek R1 designed to remove Chinese government censorship on topics like Tiananmen Square and Taiwanese independence. They used human experts to identify around 300 sensitive topics and built a censorship classifier to train the bias out, claiming no significant impact on standard eval performance. The name 1776 is a nod to American independence.
StepFun open-sources Step-Video-T2V, a SOTA 30B text-to-video model
StepFun released Step-Video-T2V (plus a T2V Turbo variant), a 30 billion parameter state-of-the-art text-to-video model under an MIT license. Results impressed especially on text integration, such as rendering 'We will open source' on a scroll as a character unfurls it, marking one of the strongest open-source video drops of the week.
Alibaba ships Qwen2.5-VL open vision-language model family
Alibaba's Qwen team released Qwen2.5-VL, open-weights vision-language models up to 72B that handle images, documents, video understanding, and on-screen agentic grounding. The 72B Instruct model was immediately available on Hugging Face and in Qwen Chat.
Allen Institute releases Tulu 3 405B open post-trained model
The Allen Institute for AI scaled its fully open Tulu 3 post-training recipe to a 405B-parameter model based on Llama 3.1 405B. It demonstrates that Ai2's open RLVR post-training pipeline works at frontier scale, with weights and recipe released openly.
Block open-sources Goose, a local AI agent framework
Block (the company behind Square) released Goose, an open-source local agent framework that runs on your machine and can use any LLM to execute tasks with tools. It was a centerpiece of the show's agents discussion as an open alternative for building autonomous workflows locally.
Browser-use: open-source alternative to OpenAI's Operator
Browser-use is an open-source library that lets LLM agents control a real web browser, positioned on the show as the OSS counterpart to OpenAI's Operator. It enables anyone to build browsing agents with their model of choice instead of a closed hosted product.
DeepSeek Janus Pro: open multimodal models in 1.5B and 7B
Amid the R1 frenzy, DeepSeek also released Janus Pro, unified multimodal models at 1.5B and 7B parameters that handle both image understanding and image generation. The open release added to DeepSeek's week of dominating AI news headlines.
YuE 7B: open-source Suno-style music generation model
The Multimodal Art Projection (M-A-P) team released YuE, a 7B open-source music generation model dubbed the 'open Suno' on the show, capable of generating full songs with vocals from lyrics. Weights are on Hugging Face with code on GitHub and a hosted demo on fal.ai.
Mistral Small 2501: 24B open-weights model under Apache 2.0
Mistral AI released Mistral Small 2501, a 24B-parameter instruct model under the permissive Apache 2.0 license. Announced as breaking news during the show, it continues Mistral's tradition of strong small open models suitable for fine-tuning and local deployment.
NVIDIA releases Eagle 2 open vision-language models
NVIDIA published Eagle 2, a family of open vision-language models with an accompanying paper, model weights on Hugging Face, and a live demo. It is a fully transparent VLM release covering training data strategy and recipes, competitive with much larger vision models.
Open Thoughts releases OpenThoughts-114k reasoning dataset
An open reasoning dataset with 114k examples released by the Open Thoughts project to fuel open replication of reasoning models like DeepSeek R1. It gives the open-source community high-quality chain-of-thought training data for distilling and fine-tuning reasoning LLMs.
Berkeley TinyZero and RAGEN replicate DeepSeek R1-Zero
Berkeley researchers released TinyZero and RAGEN, open replications of DeepSeek's R1-Zero reinforcement-learning recipe on small models. The projects showed that R1-style emergent reasoning behavior can be reproduced cheaply, with training runs logged publicly on Weights & Biases.
ByteDance UI-TARS: open computer-use models that control your PC
ByteDance released UI-TARS, open computer-use models in 7B and 72B parameter sizes that can control a Mac or PC, with desktop apps for both platforms. ByteDance claims they beat GPT-4-class models on GUI/computer-control benchmarks.
DeepSeek R1: MIT-licensed open source reasoning model rivals o1
DeepSeek released R1, a state-of-the-art open source reasoning model under a permissive MIT license. It matches or beats OpenAI's o1 on key reasoning benchmarks while being fully open weights, and DeepSeek also shipped a family of distilled smaller models. The show called this the hottest week open source AI has ever had.
Hugging Face SmolVLM: tiny vision-language models run on WebGPU
Hugging Face released SmolVLM, a family of tiny vision-language models including a 256M-parameter version small enough to run entirely in the browser via WebGPU. It demonstrates how far efficient multimodal models have shrunk while remaining usable.
Tencent Hunyuan3D 2.0: SOTA open source 3D generation
Tencent released Hunyuan3D 2.0, a state-of-the-art open source 3D asset generation model on Hugging Face. It produces high-quality 3D shapes and textures and pushes open weights forward in the 3D generation category.