New Models | Open weights
Z-Image
Tongyi Lab releases Z-Image generation model
Alibaba's Tongyi Lab released Z-Image, a new image generation model, with support landing in the open-source DiffSynth-Studio toolkit on GitHub. Covered in the AI Art segment alongside HunyuanImage 3.0.
New Models | Open weights
Trinity Large
Arcee AI ships Trinity Large: 400B MoE trained in 33 days for $20M
Arcee AI's Trinity Large is a 400B-parameter MoE with 13B active parameters, trained on 17T tokens across 2000 B300 GPUs in 33 days for $20M. It has a 512K native context window (twice that of Kimi K2.5) and is free on OpenRouter until February 2026. The panel called it the largest open-source model from a Western lab.
400B Arcee Trinity Large | 512K Trinity native context
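The training figures quoted for Trinity Large can be turned into rough per-GPU numbers. The sketch below is a back-of-envelope calculation only: it assumes all 2000 GPUs ran for the full 33 days and that the whole $20M went to compute, neither of which the episode confirmed.

```python
# Back-of-envelope figures derived only from the numbers quoted above.
TOKENS = 17e12      # training tokens
GPUS = 2000         # B300 GPUs
DAYS = 33           # wall-clock training time
BUDGET_USD = 20e6   # assumption: treated as pure compute spend


def trinity_estimates():
    """Return rough GPU-hours, throughput, and cost per GPU-hour."""
    gpu_hours = GPUS * DAYS * 24
    wall_seconds = DAYS * 24 * 3600
    return {
        "gpu_hours": gpu_hours,
        "tokens_per_gpu_per_sec": TOKENS / (GPUS * wall_seconds),
        "usd_per_gpu_hour": BUDGET_USD / gpu_hours,
    }


if __name__ == "__main__":
    for name, value in trinity_estimates().items():
        print(f"{name}: {value:,.2f}")
```

Under those assumptions the run comes to about 1.58M GPU-hours, roughly 3,000 tokens per GPU per second, and a little under $13 per GPU-hour.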
New Models
Lucy 2.0
Lucy 2.0 real-time video generation model
Lucy 2.0, a real-time video generation model, was covered in the AI Art segment.
New Models
Genie 3 (Project Genie)
Google DeepMind launches Project Genie 3, real-time 24fps world model
Google DeepMind's Genie 3 generates interactive, controllable 3D worlds in real time at 24 frames per second, demoed live on the show with a spaceship exploration and paint persistence on walls. It ships alongside SIMA 2, a self-improving game-playing agent built on Genie 3, and is available to Gemini Ultra subscribers in the US with a one-minute session limit.
24 fps Genie 3 frame rate
New Models | Open weights
Jan v3
Jan AI releases Jan v3, a 4B model built for fast local inference
Jan v3 is a 4B-parameter open model optimized for local inference, hitting 132 tokens/sec with a 262K context window and a 40% improvement on coding. The Jan desktop app it powers has reached 5M downloads.
4B Jan v3 parameters
New Models | Open weights
Kimi K2.5
Moonshot AI releases Kimi K2.5, the new open-source king
Moonshot AI's Kimi K2.5 takes the open-source crown, becoming the most-used model on OpenRouter and topping open-source leaderboards. The panel highlighted its strong agentic coding performance and tool use.
New Models | Open weights
PersonaPlex-7B
NVIDIA releases PersonaPlex-7B voice model
NVIDIA released PersonaPlex-7B, an open voice/audio model published on Hugging Face with code on GitHub. Listed in the week's Voice & Audio releases.
New Models
HunyuanImage 3.0-Instruct
Tencent launches HunyuanImage 3.0-Instruct image model
Tencent's Hunyuan team launched HunyuanImage 3.0-Instruct, an instruction-tuned version of its image generation model. Covered briefly in the AI Art segment alongside other new image models this week.
New Models | Open weights
Qwen3-TTS
Qwen3-TTS: open-source TTS family with 97ms latency and voice cloning
Alibaba's Qwen team released Qwen3-TTS, a full open-source text-to-speech family under Apache 2 that dropped 30 minutes before the show. It spans 5 models from 0.6B to 1.7B parameters, with 97ms latency, voice cloning from just 3 seconds of audio, voice description prompting, and 10-language support.
97ms Latency
New Models | Open weights
Chroma 1.0
FlashLabs Chroma 1.0: open-source real-time speech-to-speech under 150ms
FlashLabs released Chroma 1.0, billed as the world's first open-source end-to-end real-time speech-to-speech model, with voice cloning and latency under 150ms. The 4B-parameter model is built on Qwen 2.5 Omni and released under Apache 2. Its live demo with RAG and document upload impressed the whole panel.
New Models
TTS-1.5
Inworld TTS-1.5 claims #1 TTS ranking at half a cent per minute
Inworld AI launched TTS-1.5, a closed-source text-to-speech model claiming the #1 ranking with sub-250ms latency. Its headline is price: roughly $5 per million characters (about half a cent per minute) versus ElevenLabs' $120 per million characters.
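The per-minute figure follows from the per-character price once a speaking rate is fixed. The sketch below assumes roughly 1,000 characters per minute of speech, which is the rate implied by the "half a cent per minute" claim rather than a number stated by either vendor.

```python
CHARS_PER_MINUTE = 1000  # assumption: ~1,000 characters per minute of speech


def usd_per_minute(usd_per_million_chars, chars_per_minute=CHARS_PER_MINUTE):
    """Convert a per-million-character price into a per-minute price."""
    return usd_per_million_chars * chars_per_minute / 1_000_000


inworld = usd_per_minute(5)       # Inworld TTS-1.5, as quoted
elevenlabs = usd_per_minute(120)  # ElevenLabs, as quoted
ratio = elevenlabs / inworld

if __name__ == "__main__":
    print(f"Inworld:    ${inworld:.3f} per minute")
    print(f"ElevenLabs: ${elevenlabs:.3f} per minute")
    print(f"Price ratio: {ratio:.0f}x")
```

At that rate the quoted prices work out to about $0.005 versus $0.12 per minute, a 24x gap. The ratio itself does not depend on the assumed speaking rate.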
New Models | Open weights
LFM2.5-1.2B-Thinking
Liquid AI's LFM2.5-1.2B-Thinking: on-device reasoning under 900MB
Liquid AI released LFM2.5-1.2B-Thinking, a 1.2B parameter reasoning model that runs entirely on-device with under 900MB of memory. Its hybrid architecture with gated convolutions delivers 239 tokens/sec on an AMD CPU and 82 tokens/sec on a mobile NPU, making it practical for edge devices, Raspberry Pi, and older iPhones.
1.2B Parameters, under 900MB memory
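The "1.2B parameters in under 900MB" claim implies an upper bound on how many bits each weight can occupy. The sketch below is only that arithmetic: it treats 900MB as 900 million bytes and ignores activations and cache memory, so the real weight budget would be somewhat tighter. The actual quantization scheme was not stated.

```python
PARAMS = 1.2e9         # parameter count, as quoted
MEMORY_BYTES = 900e6   # assumption: "900MB" read as 900 million bytes


def max_bits_per_param(memory_bytes, params):
    """Upper bound on bits per weight if weights alone filled the memory."""
    return memory_bytes * 8 / params


def fp16_footprint_bytes(params):
    """Size of the same weights stored at 16 bits each, for comparison."""
    return params * 2


if __name__ == "__main__":
    print(f"max bits per parameter: {max_bits_per_param(MEMORY_BYTES, PARAMS):.1f}")
    print(f"fp16 footprint: {fp16_footprint_bytes(PARAMS) / 1e9:.1f} GB")
```

The bound comes out at 6 bits per parameter, against 2.4GB for 16-bit weights, which suggests the on-device build uses weights quantized below 8 bits.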
New Models
Waypoint-1
Overworld's Waypoint-1: real-time AI world model at 60fps on consumer GPUs
Overworld released Waypoint-1, a real-time AI world model that runs at 60fps on consumer GPUs. It generates interactive environments live, bringing world-model tech out of research demos and onto hardware people actually own.
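A frame rate translates directly into a per-frame time budget: the model must produce each frame within that window to stay real-time. The sketch below computes that budget for Waypoint-1's 60fps and, for comparison, the 24fps quoted for Genie 3. It says nothing about how either model actually spends the time.

```python
def frame_budget_ms(fps):
    """Milliseconds available to generate one frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for name, fps in [("Waypoint-1", 60), ("Genie 3", 24)]:
        print(f"{name}: {fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

That is roughly 16.7ms per frame at 60fps versus 41.7ms at 24fps.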
New Models
Runway 4.5
Runway 4.5 launches with image-to-video and audio
Runway launched version 4.5 of its video generation model, adding image-to-video and audio support. It was mentioned in the week's news rundown as part of a busy week for vision and video releases.
New Models | Open weights
GLM-4.7-Flash
GLM-4.7-Flash: 30B MoE local coding agent with only 3B active params
Z.AI released GLM-4.7-Flash, a 30B parameter MoE model with only 3B active parameters, designed as the ultimate local coding and agent assistant. It hits 59% on SWE-Bench Verified (approaching Sonnet 4's 64%) and runs at 120 tokens/sec on a stock Mac Studio M3 Ultra, fast enough to run RALF autonomous coding loops even on CPU.
59% SWE-Bench Verified | 120 tps Speed on Mac Studio M3 Ultra
New Models | Open weights
Flux 2 Klein
Black Forest Labs drops Flux 2 Klein, fast open-weights image model
Wolfram broke the news mid-show: Black Forest Labs released Flux 2 Klein, a fast 4B/9B image generation model with open weights under Apache 2.0. It is designed for near-real-time editing and style iteration, and Alex used it minutes later in his live Claude Cowork demo.
New Models | Open weights
M3
M3: 235B open-source medical LLM claims to beat GPT 5.2 on HealthBench
Byte released M3, a 235B parameter medical LLM fine-tuned from Qwen3 and licensed Apache 2.0. With only 22B active parameters, it is runnable at usable speeds on an M3 Ultra, and it claims to beat GPT 5.2 on HealthBench. Nisten suggested pairing it with smaller imaging models like MedGemma rather than treating them as substitutes.
235B M3 Medical LLM
New Models | Open weights
MedGemma 1.5
Google releases MedGemma 1.5 for offline medical imaging
Google released MedGemma 1.5, a small (4B-class) open model for medical use cases, compact enough to run offline for medical imaging. The panel stressed it is a different model class from Byte's giant M3 medical LLM and that the two pair well together rather than replacing each other.
New Models | Open weights
LongCat Flash Thinking
Meituan's LongCat Flash Thinking: 560B MoE with 27B active, MIT licensed
Meituan released LongCat Flash Thinking, an open-source reasoning MoE with 560B total parameters and only 27B active, under an MIT license. It continued the run of large sparse Chinese open-weights models offering frontier-style reasoning at low active-parameter cost.
560B/27B LongCat Flash
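The "low active-parameter cost" of sparse MoE models is easiest to see as the fraction of weights used per token. The sketch below uses only the total and active parameter counts quoted in this rundown. The ratio is a rough proxy for per-token compute, not a measured speed.

```python
# (total params in billions, active params in billions), as quoted in this rundown
MOE_MODELS = {
    "Trinity Large": (400, 13),
    "GLM-4.7-Flash": (30, 3),
    "M3": (235, 22),
    "LongCat Flash Thinking": (560, 27),
    "Solar Open 100B": (102, 12),
}


def active_fraction(total_b, active_b):
    """Share of parameters activated for each token."""
    return active_b / total_b


def ranked_by_sparsity(models):
    """Model names sorted from sparsest (smallest active share) to densest."""
    return sorted(models, key=lambda name: active_fraction(*models[name]))


if __name__ == "__main__":
    for name in ranked_by_sparsity(MOE_MODELS):
        total_b, active_b = MOE_MODELS[name]
        print(f"{name}: {active_fraction(total_b, active_b):.1%} active")
```

On these figures every model activates under 12% of its weights per token, with Trinity Large and LongCat Flash Thinking both below 5%.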
New Models | Open weights
LTX-2
Lightricks open-sources LTX-2 synchronized audio-video model
Lightricks open-sourced LTX-2, billed as the first truly open audio-video generation model with synchronized audio and video output, releasing full training code alongside the weights. A distilled version is available to try on Replicate.
New Models | Open weights
LFM 2.5
Liquid AI LFM 2.5: 1B on-device family with end-to-end audio
Liquid AI released LFM 2.5, a family of ~1.2B-parameter on-device models spanning text, vision, and audio, announced at CES alongside AMD's Lisa Su. The models hit 239 tokens/sec on an AMD CPU and 100 tokens/sec on an iPhone 16 Pro Max. The family includes an end-to-end audio model that skips the traditional ASR-LLM-TTS pipeline entirely, running in as little as 8GB of RAM.
New Models | Open weights
MiroThinker 1.5
MiroThinker 1.5: 30B search agent beats trillion-param models
MiroMind AI released MiroThinker 1.5, a 30B-parameter open-source search agent that achieves 56.1% on BrowseComp and 66.8% on BrowseComp Chinese, outperforming trillion-parameter models. It introduces 'interactive scaling' as a third scaling dimension beyond parameters and context. The model is a fine-tune of Qwen 3 Thinking and ships with 147K open training samples.
New Models | Open weights
NousCoder 14B
NousCoder 14B: 7% LiveCodeBench jump in 4 days of RL training
Nous Research released NousCoder 14B, an open-source competitive programming model that gained 7% on LiveCodeBench accuracy in just four days of RL training on 48 NVIDIA B200 GPUs. Training used 24,000 verifiable problems, and the release ships under a full Apache 2 license with training code and a benchmark harness.
New Models | Open weights
Alpha Mayo
NVIDIA Alpha Mayo: open-source reasoning self-driving models
NVIDIA announced Alpha Mayo at CES, a family of open-source, reasoning-based self-driving AI models. The models drive end to end with explicit reasoning steps, such as identifying a jaywalker and stopping accordingly, and were demoed in a Mercedes-Benz.
New Models | Open weights
Nemotron Speech ASR
Nemotron Speech ASR: 600M streaming model with 24ms latency
NVIDIA released Nemotron Speech ASR, a 600M-parameter open-source streaming speech recognition model with 24ms median latency and support for 900 concurrent streams on a single H100. Kwindla Hultman Kramer of Daily.co demoed sub-500ms voice-to-voice latency using a three-model pipeline of Nemotron ASR, Nemotron Nano LLM, and Magpie TTS.
24ms Nemotron Speech latency
New Models
Qwen Edit 2512
Qwen Edit 2512 optimized by PrunaAI: high-res images in under 7s
PrunaAI released an optimized version of Qwen Edit 2512 that generates high-resolution realistic images in under 7 seconds. The optimized model is available to run on Replicate.
New Models | Open weights
Solar Open 100B
Upstage Solar Open 100B: 102B MoE trained on 19.7T tokens
Upstage released Solar Open 100B, a 102B-parameter MoE model with only 12B active parameters per token (129 experts, top-8 activation). It was trained on 19.7 trillion tokens, including 4.5T synthetic tokens produced via a 'data factory' approach. It outperforms GLM 4.5 Air on many benchmarks, features the SNAP PO reinforcement learning technique with a 50% training speedup, and delivers best-in-class Korean language performance.
102B Solar Open params
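"Top-8 activation" over 129 experts means each token is sent to only the eight highest-scoring experts. The sketch below is a generic top-k router in plain Python, not Upstage's implementation. The random router scores are placeholders, and it treats all 129 experts as routable, which may not match the real architecture.

```python
import math
import random

NUM_EXPERTS = 129  # as quoted for Solar Open 100B
TOP_K = 8          # experts activated per token, as quoted


def route_token(logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token; a real router computes these
# from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)

if __name__ == "__main__":
    for expert_id, weight in chosen:
        print(f"expert {expert_id}: weight {weight:.3f}")
    print(f"experts used per token: {TOP_K / NUM_EXPERTS:.1%}")
```

The token's output would then be the weighted sum of the eight selected experts' outputs, while the other 121 experts do no work for that token.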