Google drops Gemma 4 12B, an encoder-free multimodal local model
Google released Gemma 4 12B, an encoder-free multimodal model under Apache 2.0 that targets 16GB VRAM local setups. Instead of bolting separate vision or audio encoders onto a language model, it uses one unified network, which LDJ and Yam argued makes smaller multimodal models cheaper, cleaner, and easier to run locally.
H Company launches Holo 3.1 local computer-use agent models
H Company released Holo 3.1, a family of local computer-use agent models ranging from 0.8B to 35B parameters with new quantized checkpoints. The lineup targets running screen-driving agents on local hardware rather than in the cloud.
Ideogram 4.0 becomes the top open-weight text-to-image model
Ideogram released Ideogram 4.0, a 9.3B-parameter text-to-image model with open weights under a non-commercial license. It leads open-weight image models on typography and layout, with bounding-box/layout-style prompting that trades casual generation ease for precise structured control.
JetBrains open-sources Mellum 2, a 12B MoE coding model
JetBrains released Mellum 2, a 12B mixture-of-experts coding model with only 2.5B active parameters, trained from scratch by a small team using a three-stage curriculum over 10T tokens. The panel read it as IDE companies converting years of developer-workflow context into model advantage; it is also available on CoreWeave Inference.
Microsoft ships MAI-Code-1-Flash into GitHub Copilot
Part of the seven-model MAI launch at Build 2026, MAI-Code-1-Flash is Microsoft AI's fast coding model and ships directly into GitHub Copilot. The panel saw it as a sign Microsoft intends to serve its own models inside its developer surfaces instead of relying solely on OpenAI.
Microsoft launches MAI-Thinking-1, a 1T MoE trained from scratch
Microsoft AI used Build 2026 to launch seven MAI models, headlined by MAI-Thinking-1, an MoE reasoning model with 1T total and 35B active parameters, trained from scratch on 33T tokens without distillation. The panel read the launch as Microsoft becoming a frontier model lab in its own right rather than only an OpenAI distribution channel.
1T MAI-Thinking-1 total parameters; 33T MAI training tokens
MiniMax announces M3 coding/agentic model with 1M context
MiniMax announced M3, a natively multimodal coding and agentic model with a claimed one-million-token context window built on sparse attention, and with open weights promised soon. Reported numbers include a score of 59 on SWE-bench Pro, and the panel noted MiniMax already has a following for cheap agentic tool calling even as its pure coding quality is debated.
NVIDIA ships Nemotron 3.5 ASR, a 600M streaming speech model
NVIDIA released Nemotron 3.5 ASR, a 600M-parameter open multilingual streaming speech-to-text model aimed at voice agents. It supports 40 languages and reportedly delivers 17x more throughput than Parakeet-style baselines at half the size, pushing the latency/accuracy frontier for open voice-agent infrastructure.
NVIDIA releases Nemotron 3 Ultra, a 550B open-weight MoE for agents
NVIDIA released Nemotron 3 Ultra on the day of the show, a 550B-parameter sparse MoE with 55B active parameters built for long-running agentic harnesses like OpenCode, Hermes, and OpenClaw. Chris Alexiuk joined to explain the hybrid Mamba/Transformer architecture and the unusually complete open release: weights, training data, recipes, a GenRM reward model, and an NVFP4 quantized checkpoint.
550B Nemotron 3 Ultra parameters; 55B active parameters
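To put the MoE figures quoted in this digest side by side, here is a minimal Python sketch that computes what share of each model's parameters is active per token. The parameter counts are the ones reported above; the dictionary and function names are illustrative, and the ratio says nothing about quality or actual serving cost.

```python
# Total and active parameter counts (in billions) as quoted in this digest.
MOE_MODELS = {
    "Mellum 2": (12.0, 2.5),
    "MAI-Thinking-1": (1000.0, 35.0),
    "Nemotron 3 Ultra": (550.0, 55.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token in a sparse MoE."""
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

All of the weights still have to be stored, so the active share mainly describes per-token compute rather than memory footprint.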
Reve 2.0 hits #2 on Text-to-Image Arena with layout-first editing
Reve 2.0 jumped to second place on Text-to-Image Arena (around 1200 Elo) with native 4K output, code-like layout control, and precise editing. Alex's live tests found inconsistent portrait identity, but the layout-first editor is the real differentiator for graphic and image iteration workflows.
xAI releases Grok Imagine Video 1.5 Preview with synced audio
xAI released a preview of Grok Imagine Video 1.5, an image-to-video model that generates clips with synchronized audio. It adds xAI to the week's crowded race of media-generation model updates.
Cognition rebrands Windsurf into Devin Desktop multi-agent hub
Cognition rebranded Windsurf into Devin Desktop, a multi-agent command center with Agent Client Protocol (ACP) support. The move consolidates Cognition's IDE acquisition into its Devin agent brand as a desktop control surface for running multiple coding agents.
Nous Research launches Hermes Desktop agent app for Mac/Win/Linux
Nous Research launched Hermes Desktop, packaging the Hermes Agent harness into a native desktop app for Mac, Windows, and Linux. Karan previewed chat, permissions, tool-call visibility, reasoning traces, and admin controls aimed at small teams, startups, and personal agent fleets.
NVIDIA announces RTX Spark Arm + Blackwell platform for local AI PCs
At Computex, NVIDIA unveiled RTX Spark, an Arm CPU plus Blackwell GPU PC platform with 128GB unified memory targeting local AI agents and 120B-class local inference. A wave of thin laptops with RTX 5070-class GPUs and roughly one petaflop of local AI compute raises the question of what agents should run locally versus in the cloud.
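As a rough back-of-the-envelope check on the local-inference claims in this digest (a 12B model on 16GB of VRAM, 120B-class models on 128GB of unified memory), the sketch below estimates the size of the weights alone at a few precisions. It is an assumption-laden estimate: it ignores KV cache, activations, runtime overhead, and the per-block scale factors that real quantization formats add, and the function name is illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# (label, parameters in billions, memory budget in GB), taken from the items above.
SETUPS = [
    ("12B model on a 16GB GPU", 12, 16),
    ("120B model on 128GB unified memory", 120, 128),
]

for label, params_b, budget_gb in SETUPS:
    for bits in (16, 8, 4):
        size = weight_memory_gb(params_b, bits)
        # "fits" only means the weights are under budget, not that there is
        # enough headroom left for context and runtime overhead.
        verdict = "fits" if size < budget_gb else "does not fit"
        print(f"{label} at {bits}-bit: ~{size:.0f} GB of weights, {verdict}")
```

Under these assumptions, both setups depend on quantization: 16-bit weights exceed the stated memory budget in each case.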
WolfBench adds 3D token-depth bars to show model efficiency
Wolfram Ravenwolf shipped a WolfBench feature that visualizes token usage alongside benchmark score as 3D token-depth bars. Two models can look close on a leaderboard while one burns dramatically more tokens, which changes the real cost and latency story; Gemini 3.5 Flash and GPT 5.5 were compared as examples.
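The efficiency point can be illustrated with a small Python sketch that normalizes token usage by score. This is not WolfBench's actual method or data: the `Run` type, the `tokens_per_point` metric, and all numbers below are hypothetical placeholders chosen only to show how two near-identical scores can hide very different token costs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    score: float  # benchmark score, higher is better
    tokens: int   # total tokens consumed across the benchmark


def tokens_per_point(run: Run) -> float:
    """Tokens spent per point of benchmark score (lower is more efficient)."""
    return run.tokens / run.score


# Hypothetical placeholder values, not measurements of any real model.
runs = [
    Run("model_a", score=70.0, tokens=2_000_000),
    Run("model_b", score=69.0, tokens=9_000_000),
]

for run in sorted(runs, key=tokens_per_point):
    print(f"{run.name}: score {run.score:.1f}, "
          f"{tokens_per_point(run):,.0f} tokens per point")
```

A leaderboard that reports score alone would rank these two placeholder runs as near-equals, while the per-point figure separates them clearly.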
Arena launches Agent Arena for real-world agent workflow evals
Arena (LMArena) launched Agent Arena during the episode, moving beyond one-turn chatbot preference battles to evaluate models on real agent workflows with web search, files, terminals, user corrections, and objective recovery signals. Peter Gostev joined live to explain why long-running, harder tasks need a different benchmark.