New Models · Open weights
Gemma 4 12B
Google drops Gemma 4 12B, an encoder-free multimodal local model
Google released Gemma 4 12B, an encoder-free multimodal model under Apache 2.0 that targets local machines with 16GB of VRAM. Instead of bolting separate vision or audio encoders onto a language model, it uses one unified network, which LDJ and Yam argued makes smaller multimodal models cheaper, cleaner, and easier to run locally.
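The 16GB target is easier to judge with some weights-only arithmetic. The sketch below assumes 12e9 parameters (taken from the "12B" in the name) and standard bytes-per-parameter figures for common precisions. It ignores KV cache, activations, quantization scale overhead, and runtime overhead, all of which add to the real footprint, and it uses decimal gigabytes (1 GB = 1e9 bytes).

```python
# Rough weights-only memory estimate for a 12B-parameter model.
# Assumptions: 12e9 parameters (from the "12B" name); KV cache, activations,
# quantization scales, and runtime overhead are all ignored.

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16-bit weights
    "int8": 1.0,  # 8-bit quantized weights
    "int4": 0.5,  # 4-bit quantized weights
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 12e9
    budget_gb = 16.0
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, precision)
        verdict = "fits" if gb < budget_gb else "does not fit"
        print(f"{precision}: {gb:.1f} GB of weights, {verdict} in {budget_gb:.0f} GB")
```

Under these assumptions, 16-bit weights alone (24 GB) overflow a 16GB card, while 8-bit (12 GB) or 4-bit (6 GB) weights fit and leave some room for the KV cache.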
New Models · Open weights
Holo 3.1
H Company launches Holo 3.1 local computer-use agent models
H Company released Holo 3.1, a family of local computer-use agent models ranging from 0.8B to 35B parameters, released alongside new quantized checkpoints. The lineup targets running screen-driving agents on local hardware rather than in the cloud.
New Models · Open weights
Ideogram 4.0
Ideogram 4.0 becomes the top open-weight text-to-image model
Ideogram released Ideogram 4.0, a 9.3B-parameter text-to-image model with open weights under a non-commercial license. It leads open-weight image models on typography and layout, and its bounding-box, layout-style prompting trades ease of casual generation for precise, structured control.
9.3B: Ideogram 4.0 parameters
New Models · Open weights
Mellum 2
JetBrains open-sources Mellum 2, a 12B MoE coding model
JetBrains released Mellum 2, a 12B mixture-of-experts coding model with only 2.5B active parameters, trained from scratch by a small team using a three-stage curriculum over 10T tokens. The panel read it as a sign that IDE companies are turning years of developer-workflow context into a model advantage. Mellum 2 is also available on CoreWeave Inference.
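What "12B total, 2.5B active" buys can be shown with a little arithmetic. The sketch below uses the two figures from the announcement and the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass; that rule is an approximation, not a measured number for Mellum 2, and it ignores attention and routing costs.

```python
# Active-parameter arithmetic for a mixture-of-experts (MoE) model.
# Figures from the Mellum 2 announcement: 12B total, 2.5B active.
# Assumption: ~2 FLOPs per active parameter per token (forward-pass rule of thumb).


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters that take part in each token's forward pass."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token (~2 * active parameters)."""
    return 2.0 * active_params


if __name__ == "__main__":
    total, active = 12e9, 2.5e9
    print(f"active fraction: {active_fraction(total, active):.1%}")         # 20.8%
    print(f"~{forward_flops_per_token(active):.1e} FLOPs/token (MoE)")      # 5.0e+09
    print(f"~{forward_flops_per_token(total):.1e} FLOPs/token (if dense)")  # 2.4e+10
```

The per-token compute looks like a ~2.5B model, but memory does not shrink the same way: all 12B parameters, including every expert, still have to be loaded, because the router can pick any expert for any token.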
New Models
MAI-Code-1-Flash
Microsoft ships MAI-Code-1-Flash into GitHub Copilot
Part of the seven-model MAI launch at Build 2026, MAI-Code-1-Flash is Microsoft AI's fast coding model and ships directly into GitHub Copilot. The panel saw it as a sign that Microsoft intends to serve its own models inside its developer surfaces instead of relying solely on OpenAI.
New Models
MAI-Thinking-1
Microsoft launches MAI-Thinking-1, a 1T MoE trained from scratch
Microsoft AI used Build 2026 to launch seven MAI models, headlined by MAI-Thinking-1, an MoE reasoning model with 1T total and 35B active parameters, trained from scratch on 33T tokens without distillation. The panel read the launch as Microsoft becoming a frontier model lab in its own right rather than only an OpenAI distribution channel.
1T: MAI-Thinking-1 total parameters
33T: MAI-Thinking-1 training tokens
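For a sense of scale, the announced figures can be plugged into the widely used C ≈ 6·N·D estimate of training compute, where N is the number of parameters used per token and D is the number of training tokens. For an MoE, the sketch assumes N is the 35B active parameters, since only those run on each token. This is a back-of-envelope approximation that ignores attention FLOPs, routing, and any overhead; it is not a compute figure Microsoft reported.

```python
# Back-of-envelope training compute via C ~= 6 * N * D.
# Assumptions: N = active parameters per token (35B for this MoE),
# D = training tokens (33T); attention, routing, and overhead are ignored.


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate total training FLOPs (~6 FLOPs per active param per token)."""
    return 6.0 * active_params * tokens


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per parameter."""
    return tokens / params


if __name__ == "__main__":
    active, total, tokens = 35e9, 1e12, 33e12
    print(f"~{training_flops(active, tokens):.2e} training FLOPs")          # 6.93e+24
    print(f"{tokens_per_param(tokens, active):.0f} tokens per active param")  # 943
    print(f"{tokens_per_param(tokens, total):.0f} tokens per total param")    # 33
```

Under this approximation, the 1T total only enters through memory and capacity, while training cost tracks the 35B active parameters.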
New Models
MiniMax M3
MiniMax announces M3 coding/agentic model with 1M context
MiniMax announced M3, a natively multimodal coding and agentic model that claims a one-million-token context window via sparse attention, with open weights promised soon. Reported results include a score of 59 on SWE-bench Pro, and the panel noted that MiniMax already has a following for cheap agentic tool calling, even though its pure coding quality is still debated.
New Models · Open weights
Nemotron 3.5 ASR
NVIDIA ships Nemotron 3.5 ASR, a 600M streaming speech model
NVIDIA released Nemotron 3.5 ASR, a 600M-parameter open multilingual streaming speech-to-text model aimed at voice agents. It supports 40 languages and reportedly delivers 17x more throughput than Parakeet-style baselines at half the size, pushing the latency/accuracy frontier for open voice-agent infrastructure.
17x: Nemotron 3.5 ASR throughput vs. Parakeet-style baselines
New Models · Open weights
Nemotron 3 Ultra
NVIDIA releases Nemotron 3 Ultra, a 550B open-weight MoE for agents
NVIDIA dropped Nemotron 3 Ultra on the day of the show: a 550B-parameter sparse MoE with 55B active parameters, built for long-running agentic harnesses such as OpenCode, Hermes, and OpenClaw. Chris Alexiuk joined to explain the hybrid Mamba/Transformer architecture and the unusually complete open release: weights, training data, recipes, a GenRM reward model, and an NVFP4 quantized checkpoint.
550B: Nemotron 3 Ultra total parameters
55B: Nemotron 3 Ultra active parameters
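The NVFP4 checkpoint matters mostly for storage and memory. The sketch below estimates weights-only size for 550B parameters, assuming every weight is stored in NVFP4, modeled here as a 4-bit (E2M1) value plus one 8-bit (E4M3) scale shared by each block of 16 values, or about 4.5 bits per weight. Per-tensor scales and any layers NVIDIA may keep at higher precision are ignored, so treat the result as a lower bound, not the actual checkpoint size.

```python
# Weights-only storage estimate for a 550B-parameter checkpoint.
# Assumption: all weights in NVFP4 = 4-bit values + one 8-bit scale per
# 16-value block; per-tensor scales and higher-precision layers are ignored.

FP4_BITS = 4      # bits per quantized value (E2M1)
BLOCK_SIZE = 16   # values sharing one scale
SCALE_BITS = 8    # bits per block scale (E4M3)


def nvfp4_bits_per_weight(block_size: int = BLOCK_SIZE, scale_bits: int = SCALE_BITS) -> float:
    """Effective bits per weight once the shared block scale is amortized."""
    return FP4_BITS + scale_bits / block_size


def storage_gb(num_params: float, bits_per_weight: float) -> float:
    """Weights-only storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 550e9
    print(f"BF16:  {storage_gb(params, 16):.1f} GB")                       # 1100.0 GB
    print(f"NVFP4: {storage_gb(params, nvfp4_bits_per_weight()):.1f} GB")  # 309.4 GB
```

Under these assumptions the NVFP4 weights are roughly 3.6x smaller than a BF16 copy, though a model of this size still needs multiple GPUs to hold in memory.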
New Models
Reve 2.0
Reve 2.0 hits #2 on Text-to-Image Arena with layout-first editing
Reve 2.0 jumped to second place on Text-to-Image Arena (around 1200 Elo) with native 4K output, code-like layout control, and precise editing. Alex's live tests found that it did not keep portrait identity consistent, but its layout-first editor is the real differentiator for graphic-design and image-iteration workflows.
New Models
Grok Imagine Video 1.5 Preview
xAI releases Grok Imagine Video 1.5 Preview with synced audio
xAI released a preview of Grok Imagine Video 1.5, an image-to-video model that generates clips with synchronized audio. It adds xAI to the week's crowded race of media-generation model updates.