New Models · Open weights
Irodori-TTS-500M
Irodori-TTS-500M: open Japanese TTS with emoji emotion control
Irodori-TTS-500M is a 500M-parameter open-weights Japanese text-to-speech model released on Hugging Face, notable for controlling emotional delivery through emojis in the input text. It landed as part of the week's wave of voice and audio releases.
New Models · Open weights
Cohere Transcribe
Cohere Transcribe: open-source 2B ASR tops Open ASR Leaderboard at 5.42% WER
Cohere entered the ASR game with Transcribe, a 2-billion-parameter Apache 2.0 speech recognition model that immediately took the number-one spot on Hugging Face's Open ASR Leaderboard with a 5.42% word error rate versus Whisper Large v3's 7.44%. It wins 61% of human evaluations on average and 64% head-to-head against Whisper, making it a credible local-inference Whisper replacement for regulated industries.
2B Cohere Transcribe ASR size · 5.42% Word error rate on Open ASR Leaderboard
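For readers who want to put the two WER figures in context, here is a minimal sketch of how word error rate is commonly computed (word-level edit distance divided by reference length) and what the gap between 7.44% and 5.42% means in relative terms. The function names are illustrative, and the sketch skips the text normalization that real leaderboards apply before scoring, so it is not a reproduction of the Open ASR Leaderboard's method.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` improves on `baseline` (both in the same units)."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Figures quoted in the item above: Whisper Large v3 at 7.44%, Transcribe at 5.42%.
    print(f"{relative_reduction(7.44, 5.42):.1%}")
```

On the quoted numbers, the absolute gap is about two points of WER, which works out to roughly a 27% relative reduction in errors.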
New Models
Gemini 3.1 Flash Live
Google drops Gemini 3.1 Flash Live: Gemini can see, hear, and talk to you
Google released Gemini 3.1 Flash Live, a realtime multimodal model that handles voice and vision interaction in a single model path instead of stitched pipelines. The panel framed it as a major upgrade for end-to-end voice and vision agents, with AI Studio and API availability as the immediate way to experiment.
New Models
Lyria 3 Pro
Google Lyria 3 Pro generates full 3-minute music tracks with structural control
Google DeepMind released Lyria 3 Pro, its most advanced music model, generating full 3-minute tracks with structural control over intros, verses, choruses, and bridges, and even composing music from images. The crew generated a drum-and-bass ThursdAI opener live with spot-on instruction following; output is SynthID watermarked and royalty-free, available to Gemini subscribers and via Producer AI.
New Models
Uni-1
Luma Labs Uni-1 thinks and generates pixels simultaneously, #1 preference Elo
Luma Labs released Uni-1, an LLM-based image model that thinks and generates pixels simultaneously and claims the number-one human preference Elo. Unlike traditional diffusion workflows, you converse with it and iterate together toward results, and it can also generate infographics. The panel saw it as a surprising pivot from Luma's video focus.
New Models · Open weights
MiniMax 2.7
MiniMax 2.7 open-source weights discussed as small-model momentum continues
The panel covered MiniMax 2.7 and its open-weights release in the context of small, efficient models becoming genuinely practical for local and specialized agent workflows. The segment focused on capability momentum and how open-weights expectations keep shaping adoption sentiment.
New Models · Open weights
Voxtral TTS
Mistral drops Voxtral TTS, a 3B open-weight text-to-speech model
Mistral released Voxtral TTS, its first text-to-speech model, as breaking news during the live show: 3 billion parameters, open weights, with emotion controls for neutral, happy, and frustrated voices. Mistral claims it beats ElevenLabs Flash v2.5 in human preference tests with a 58% win rate on flagship voices and 68% on zero-shot voice cloning, though Alex's live test found it decent rather than stunning.
3B Mistral Voxtral TTS size
New Models · Open weights
Reka Edge
Reka AI ships Edge, a 7B multimodal VLM for sub-second on-device inference
Reka AI launched Reka Edge, a 7B-parameter multimodal vision-language model built for sub-second latency on edge devices. Weights are on Hugging Face and the model is available through OpenRouter, with the panel highlighting it as a notable efficient multimodal release for real-world deployment.
New Models
Composer 2
Cursor Composer 2 beats Opus 4.6 on TerminalBench at a tenth of the price
Cursor launched Composer 2, its first proprietary model that genuinely competes with frontier labs. It scores 61% on TerminalBench (beating Opus 4.6) at $0.50/M input tokens, cheaper than GPT-5.4 Mini and 10x cheaper than Opus, running at 300+ tokens/sec. A fast variant costs 3x more for the same intelligence, kicking off a new 'fast mode' pricing trend where you pay a premium for speed rather than capability.
New Models · Open weights
Holotron-12B
H Company's Holotron-12B: hybrid SSM computer-use model at 8.9k tok/s
H Company released Holotron-12B, an open-source hybrid SSM model built for computer-use agents. It claims 8,900 tokens/sec generation speed and a WebVoyager score that jumps from 35.1% to 80.5%, continuing the trend of hybrid SSM architectures for long-context agent workloads.
8,900 tok/s H Company Holotron-12B generation speed
New Models
MiniMax M2.7
MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro
MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, a setup built by one engineer over four days with zero lines of human-written code. It scores 56.22% on SWE-Bench Pro, about one point behind Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. Not yet open weights, though rumors suggest a release is coming.
56% MiniMax M2.7 SWE-Bench Pro
New Models · Open weights
Mistral Small 4
Mistral Small 4: 119B MoE with 6B active unifies vision, coding, reasoning
Mistral returned to open source with Small 4, a 119B-parameter MoE with 128 experts and only 6B active per token, released under Apache 2.0. It unifies the previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one model and can fit on a single H100 when compressed. Early WolfBench results are sobering at ~17% on OpenClaw agent tasks, roughly on par with similarly sized Nemotron.
119B Mistral Small 4 total params
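The "6B active out of 119B" and "fits on a single H100 when compressed" claims are easy to sanity-check with back-of-the-envelope arithmetic. The sketch below is a rough estimate only: it assumes the 80 GB H100 variant, counts weight storage alone (no KV cache, activations, or runtime overhead), and the bit widths are illustrative, since the item does not say which compression scheme is meant.

```python
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in an MoE model (both in billions)."""
    return active_params_b / total_params_b


def weight_memory_gb(total_params_b: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    # (params_b * 1e9 params) * (bits / 8 bytes per param) / 1e9 bytes per GB
    return total_params_b * bits_per_weight / 8


if __name__ == "__main__":
    print(f"active share: {active_fraction(119, 6):.1%}")
    for bits in (16, 8, 4):  # illustrative precisions, not from the source
        gb = weight_memory_gb(119, bits)
        verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
        print(f"{bits:>2}-bit weights: {gb:.1f} GB, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions only about 5% of the parameters are active per token, and the weights drop below 80 GB at roughly 4 bits per weight, which is consistent with the "when compressed" caveat.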
New Models
GPT-5.4 Mini & Nano
OpenAI ships GPT-5.4 Mini and Nano for coding, computer use, and subagents
OpenAI released GPT-5.4 Mini ($0.75/M input) and Nano, smaller variants optimized for coding and computer use at a fraction of flagship cost. Mini hits 72% on OSWorld-Verified, matching the human baseline and nearly reaching the full GPT-5.4's 75%, while beating Sonnet 4.5 on most benchmarks. They are designed as cheap parallel subagent workers under a GPT-5.4 orchestrator in Codex, and Mini is 2x faster than the previous GPT-5 Mini.
New Models
MiMo
Xiaomi MiMo revealed as the 1T-param stealth model topping OpenRouter
Xiaomi revealed MiMo, a 1-trillion-parameter family with omni-modal and language-only variants, unmasked as the stealth model that had been sitting at #1 on OpenRouter. The reveal surprised the panel, marking Xiaomi's entry into the frontier-model conversation.
New Models · Open weights
Fish Audio S2
Fish Audio S2 open TTS hits sub-150ms latency
Fish Audio S2 is a fully open-source TTS model with inline emotion control via free-text bracket tags like gasp, laughter, and long pause. Alex demoed it live with an OpenClaw skill that let his 5-year-old talk to a voice clone of 'Rocky' from Project Hail Mary; Wolfram called it 'ElevenLabs V3 for free.'
<150ms Fish Audio S2 TTS latency
New Models
Gemini Embedding 2
Google launches Gemini Embedding 2, a natively multimodal embedder
Google launched Gemini Embedding 2, a natively multimodal embedding model that supports text, image, video, and audio in a single unified embedding space. It is available through the Gemini Embeddings API.
New Models · Open weights
LTX Video 2.3
Lightricks ships open-source LTX Video 2.3, runs on an RTX 3090
Lightricks released LTX Video 2.3, an open-source video generation model with improved motion, audio, and quality that runs on a single RTX 3090. It is available on GitHub and Hugging Face.
New Models · Open weights
MiroThinker-1.7
MiroThinker-1.7 open-source research agent hits SOTA
MiroMind released MiroThinker-1.7, an open-source deep-research agent model that reaches state of the art on deep research benchmarks. It was covered alongside NVIDIA's Nemotron launch in the open-source segment.
New Models
embed-large-v3
Mixedbread embed-large-v3 beats Gemini Embedding 2
Mixedbread dropped embed-large-v3, an embedding model that beats Gemini Embedding 2 on nearly every benchmark, including a jaw-dropping 98% vs 6.9% on structured-data tasks. Benjamin Clavie announced it live during the show.
98% Mixedbread embed-large-v3 structured data benchmark score (vs 6.9% for Gemini Embedding 2)
New Models · Open weights
Nemotron 3 Super 120B
NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet
NVIDIA launched Nemotron 3 Super, a 120B Hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B 5-year open-source commitment. It is available on W&B Inference at $0.20/M input and $0.80/M output.
120B Nemotron 3 Super total parameters · 12B Nemotron 3 Super active parameters (MoE) · 1M Nemotron 3 Super context window (tokens)
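To turn the per-million-token prices quoted above into a per-request figure, a small calculator is enough. This is a sketch under stated assumptions: the defaults are the $0.20/M input and $0.80/M output prices from the item, the function name and example token counts are hypothetical, and real bills may differ (caching, minimums, or price changes are not modeled).

```python
INPUT_USD_PER_M = 0.20   # price per 1M input tokens, as quoted in the item
OUTPUT_USD_PER_M = 0.80  # price per 1M output tokens, as quoted in the item


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float = INPUT_USD_PER_M,
    output_price: float = OUTPUT_USD_PER_M,
) -> float:
    """Estimated cost of one request given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 100k-token prompt with a 10k-token answer.
    print(f"${request_cost_usd(100_000, 10_000):.4f}")
    # Filling the full 1M-token context once, with a short 2k-token reply.
    print(f"${request_cost_usd(1_000_000, 2_000):.4f}")
```

The same function works for any other per-million pricing in this roundup by passing different `input_price` and `output_price` values.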
New Models · Open weights
Covenant-72B
Covenant-72B: a decentralized-trained open 72B LLM
Covenant-72B is a decentralized 72B-parameter open LLM, released and shared via Hugging Face. It was highlighted in the open-source segment as an example of decentralized model training.
New Models · Open weights
Qwen3.5 Small Series
Alibaba releases Qwen3.5 small models (2B, 4B, 9B) for local use
Alibaba released the Qwen3.5 small model series with 2B, 4B, and 9B variants, which the panel found highly usable on consumer hardware. The release landed alongside leadership turbulence as Junyang Lin and Binyuan Hui departed Qwen, though the panel expects Alibaba's open-source momentum to continue.
New Models
SWE-1.6
Cognition previews SWE-1.6, hitting 51% on SWE-Bench Pro
Cognition previewed SWE-1.6, the next iteration of its software-engineering model line, citing 51% on SWE-Bench Pro. It was covered in the TL;DR tools segment as part of the week's agentic coding model releases.
51% SWE-Bench Pro (SWE-1.6)
New Models
Gemini 3.1 Flash-Lite
Google launches Gemini 3.1 Flash-Lite with 1M context at 360 tok/s
Google launched Gemini 3.1 Flash-Lite, a fast and cheap model with 1M token context aimed at the instant/fast tier, running around 360 tokens per second. The panel flagged a material pricing jump versus the prior Flash-Lite generation but saw it as well suited for judge, guardrail, and orchestration workloads in agent systems.
360 tokens/sec Gemini 3.1 Flash-Lite speed
New Models · Open weights
Yuan 3.0 Ultra
Yuan AI Lab releases Yuan 3.0 Ultra open-weights model
Yuan AI Lab (IEIT) released Yuan 3.0 Ultra, a new open-weights model published on Hugging Face under the IEITYuan org. It was covered in the open-source LLM roundup as part of a busy week for Chinese open model releases.
New Models
GPT-5.3 Instant
OpenAI rolls out GPT-5.3 Instant as the free-tier fast model
OpenAI rolled out GPT-5.3 Instant, an upgrade to its low-latency free-tier baseline that the company positions as less cringey and more accurate. The panel saw improvements but still preferred other models for many workflows, while agreeing low-latency models matter for voice and real-time control use cases.
New Models
GPT-5.4
OpenAI drops GPT-5.4 Thinking and GPT-5.4 Pro live during the show
OpenAI released GPT-5.4 Thinking and GPT-5.4 Pro mid-show, two variants of a frontier general model that folds Codex-level coding into unified reasoning. GPT-5.4 ships with a 1M token context window, a /fast mode, and mid-reasoning steering, posting 83.3% on ARC-AGI 2 (Pro) and roughly 75% on OSWorld computer use. The panel tested it live in Codex and called it a major general-model jump, while noting input pricing rose about 50% versus GPT-5.2.
83.3% ARC-AGI 2 (GPT-5.4 Pro) · 75% OSWorld / computer-use score · 1M Context window
New Models · Open weights
Step 3.5 Flash Base
StepFun open-sources Step 3.5 Flash Base with its training stack
StepFun released Step 3.5 Flash Base and Midtrain checkpoints, an unusually open release that includes training artifacts and the SteptronOSS training stack alongside the weights. The panel praised the Apache-2 orientation and called the continuation-pretraining flexibility a major practical unlock for builders.