Model Architecture

Architecture innovations: MoE, attention variants, SSMs, diffusion LMs, and long context. — 59 releases covered on the show.

June 2026

MiniMax
New Models

MiniMax M3

MiniMax announces M3 coding/agentic model with 1M context

MiniMax announced M3, a natively multimodal coding and agentic model with a one-million-token sparse attention context claim and open weights promised soon. Reported numbers include 59 on SWE-bench Pro, and the panel noted MiniMax already has a following for cheap agentic tool calling even as pure coding quality is debated.

May 2026

Anthropic
New Models

Claude Opus 4.8

Anthropic ships Claude Opus 4.8 live mid-show

Anthropic released Claude Opus 4.8 during the episode, hitting 69.2% on SWE-bench Pro (up from 64.3% on 4.7 and ahead of GPT-5.5 at 58.6%), a new-best 57.9% on Humanity's Last Exam with tools, and 83.4% on OSWorld-Verified. It also shows a real long-context jump past the usual 200K cliff (85.9% GraphWalks BFS at 256K), with new thinking modes in the UI. Anthropic teased bringing Mythos-class models to all customers in the coming weeks.

69.2% SWE-bench Pro
Cohere
New ModelsOpen weights

Command A+

Cohere releases Command A+, a 218B Apache 2.0 MoE with 25B active params

Cohere released Command A+, a 218B-parameter mixture-of-experts model with 25B active parameters, shipping open weights under Apache 2.0. It was the week's headline open-source release, available on Hugging Face in both W4A4 quantized and BF16 variants.

218B Command A+ parameters25B active parameters
Krea AI
New Models

Krea 2

Krea 2: Krea's first from-scratch foundation image model

Krea released Krea 2, its first foundation image model trained from scratch, built over six to seven months by nearly half the company. It focuses on aesthetic diversity, style control with up to 4 reference images, and moodboard-driven workflows, generating images in roughly 15 seconds. Co-founder and CEO Victor Perez joined the show to walk through it.

April 2026

DeepSeek
New ModelsOpen weights

DeepSeek V4

DeepSeek V4: 1.6T MoE with CSA+HCA attention and 1M context

DeepSeek released the V4 paper and models (V4-Pro and V4-Flash on Hugging Face), a 1.6T-parameter MoE featuring CSA+HCA attention that fits 1M tokens of context in just 5.7GB of KV cache. It is possibly the first frontier model trained across multiple datacenters, and DeepSeek is offering API tokens at an 80% discount on already much cheaper pricing.

1M context window5.7GB KV cache at 1M context
SenseTime
New ModelsOpen weights

SenseNova U1

SenseTime open-sources SenseNova U1 unified multimodal MoE

SenseTime open-sourced SenseNova U1, a unified multimodal MoE model with 8B total and 3B active parameters that handles understanding and generation with no separate encoder or VAE. The architecture builds on a paper the team presented at ICLR last year.

8B total parameters (3B active MoE)
Alibaba (Qwen)
New ModelsOpen weights

Qwen 3.6-35B-A3B

Qwen 3.6-35B-A3B: Apache 2.0 MoE with 3B active hits 73.4% SWE-Verified

Alibaba Qwen open-sourced Qwen 3.6-35B-A3B under Apache 2.0 the same morning Opus 4.7 dropped: a 35B MoE with only 3B active parameters that scores 73.4% on SWE-bench Verified, rivaling models 10x its size. It is natively multimodal with 262K context extensible to 1M, and the crew called it the strongest mid-size LLM on nearly all benchmarks, putting to rest doubts about Qwen's open-source commitment after Junyang Ling's departure.

73.4% SWE-bench Verified

March 2026

Anthropic
Major Features & Updates

Claude Opus 4.6 (1M context)

Anthropic makes Opus 4.6 1M context the default in Claude Code, same price

Anthropic made 1M token context the default for Opus 4.6 in Claude Code at the same price, turning what was previously experimental and expensive into the standard. MRCR benchmark performance holds at 93% at 256K and 76% at 1M. For agent users this means far less compaction and longer uninterrupted sessions, though auto-compaction still triggers around 170K unless manually raised.

1M Opus 4.6 context default
MiniMax
New Models

MiniMax M2.7

MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro

MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, built by one engineer over four days with zero lines of human code. It scores 56.22% on SWE-Bench Pro, within one point of Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. Not yet open weights, though rumors suggest a release is coming.

56% MiniMax 2.7 SWE-bench Pro
Mistral AI
New ModelsOpen weights

Mistral Small 4

Mistral Small 4: 119B MoE with 6B active unifies vision, coding, reasoning

Mistral returned to open source with Small 4, a 119B-parameter MoE with 128 experts and only 6B active per token, released under Apache 2.0. It unifies the previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one model and can fit on a single H100 when compressed. Early WolfBench results are sobering at ~17% on OpenClaw agent tasks, roughly on par with similarly sized Nemotron.

119B Mistral Small 4 total params
Papers & ResearchOpen weights

Mamba-3

Mamba-3 lands with three SSM innovations for inference-first linear models

Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.

NVIDIA
New ModelsOpen weights

Nemotron 3 Super 120B

NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet

NVIDIA launched Nemotron 3 Super, a 120B Hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B 5-year open-source commitment. It is available on W&B Inference at $0.20/M input and $0.80/M output.

120B Nemotron 3 Super total parameters12B Nemotron 3 Super active parameters (MoE)1M Nemotron 3 Super context window (tokens)
Google DeepMind
New Models

Gemini 3.1 Flash-Lite

Google launches Gemini 3.1 Flash-Lite with 1M context at 360 tok/s

Google launched Gemini 3.1 Flash-Lite, a fast and cheap model with 1M token context aimed at the instant/fast tier, running around 360 tokens per second. The panel flagged a material pricing jump versus the prior Flash-Lite generation but saw it as well suited for judge, guardrail, and orchestration workloads in agent systems.

360 tokens/sec Gemini 3.1 Flash-Lite speed
OpenAI
New Models

GPT-5.4

OpenAI drops GPT-5.4 Thinking and GPT-5.4 Pro live during the show

OpenAI released GPT-5.4 Thinking and GPT-5.4 Pro mid-show, a frontier general model that folds Codex-level coding into a unified reasoning model. It ships with a 1M token context window, a /fast mode, and mid-reasoning steering, posting 83.3% on ARC-AGI 2 (Pro) and roughly 75% on OS World computer use. The panel tested it live in Codex and called it a major general-model jump, while noting input pricing rose about 50% versus 5.2.

83.3% ARC-AGI 2 (GPT-5.4 Pro)75% OS World / computer-use score1M Context window

February 2026

Alibaba (Qwen)
New ModelsOpen weights

Qwen 3.5

Qwen 3.5 lands: 35B/3B-active Medium outperforms the old 235B flagship

Alibaba released the Qwen 3.5 family of open-weight models, headlined by Qwen3.5-35B-A3B, a 35B model with only 3B active parameters that outperforms their previous 235B flagship. Variants include a 122B-A10B and a dense 27B, with the panel highlighting the hybrid state-space (Mamba-layer) architecture and strong practical coding and agent performance at a tiny active-parameter footprint.

35B / 3B active Qwen 3.5 Medium
Liquid AI
New ModelsOpen weights

LFM2-24B-A2B

Liquid AI releases LFM2-24B-A2B, a laptop-friendly 24B MoE

Liquid AI released LFM2-24B-A2B, a 24B mixture-of-experts model with only 2.3B active parameters that runs on consumer laptops. The panel highlighted its speed and surprisingly strong non-coding reasoning, reinforcing the trend of efficient low-active-parameter open models for local use.

Alibaba (Qwen)
New ModelsOpen weights

Qwen3.5-397B-A17B

Alibaba opens Qwen 3.5: 397B-param multimodal MoE with only 17B active

Alibaba released Qwen3.5-397B-A17B, billed as the first open-weight native multimodal MoE model, with 397B total parameters, just 17B active, 512 experts, and 262K native context extendable to 1M. It delivers 8.6-19x faster inference than Qwen3-Max and continues Qwen's strength in multilingual and medical tasks, scoring 52.5% on Terminal Bench, third place among open-source models. Nisten found coding still trails GLM-5.

397B Qwen 3.5 Parameters
Anthropic
New Models

Claude Sonnet 4.6

Anthropic ships Claude Sonnet 4.6 with 79.6% SWE-Bench and 1M context

Anthropic launched Claude Sonnet 4.6, its most capable Sonnet ever, scoring 79.6% on SWE-Bench Verified, nearly matching Opus 4.6 at Sonnet pricing of $3/$15 per million tokens. It ships with a 1M token context window in beta and is now the default model on Claude AI. In blind Claude Code testing, users preferred Sonnet 4.6 over the previous Opus 4.5 59% of the time, and it beats the previous Gemini 3 Pro on most benchmarks.

79.6% SWE-Bench Verified

January 2026

Arcee AI
New ModelsOpen weights

Trinity Large

Arcee AI ships Trinity Large: 400B MOE trained in 33 days for $20M

Arcee AI's Trinity Large is a 400B-parameter MOE with 13B active parameters, trained on 17T tokens across 2000 B300 GPUs in 33 days for $20M. It has 512K native context (twice Kimi K2.5), is free on OpenRouter until February 2026, and the panel called it the largest Western open-source lab model.

400B Arcee Trinity Large512K Trinity native context
Upstage
New ModelsOpen weights

Solar Open 100B

Upstage Solar Open 100B: 102B MoE trained on 19.7T tokens

Upstage released Solar Open 100B, a 102B parameter MoE model with only 12B active parameters per token (129 experts, top-8 activation), trained on 19.7 trillion tokens including 4.5T synthetic via a 'data factory' approach. It outperforms GLM 4.5 Air on many benchmarks, features the SNAP PO reinforcement learning technique with a 50% training speedup, and delivers best-in-class Korean language performance.

102B Solar Open params

December 2025

NVIDIA
New ModelsOpen weights

Nemotron 3 Nano

NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes

NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.

30B (3B active) Nemotron 3 Nano parameters
Arcee AI
New ModelsOpen weights

Arcee Trinity

Arcee Trinity launches US-trained open MoE family

Arcee AI introduced Trinity, a family of US-trained open mixture-of-experts models built from scratch, starting with Trinity-Mini and Trinity-Nano-Preview. CTO Lukas Atkins joined the show to discuss the training approach and previewed Trinity-Large for January 2026. The release positions Arcee as a domestic alternative in an open-weights field dominated by Chinese labs.

November 2025

Alibaba (Tongyi)
New ModelsOpen weights

Z-Image Turbo

Tongyi's Z-Image Turbo brings sub-second open image generation

Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.

6B Parameters
Prime Intellect
New ModelsOpen weights

INTELLECT-3

Prime Intellect releases INTELLECT-3, a 106B open MoE model

Prime Intellect released INTELLECT-3, a 106B-parameter mixture-of-experts model with 12B active parameters that scores 90% on AIME 2024/2025. The lab fully open-sourced the training stack alongside the weights, showing a small lab can train frontier-scale models.

106B Total parameters (12B active)90% AIME 2024/2025
Tencent (Hunyuan)
New ModelsOpen weights

HunyuanVideo 1.5

Tencent releases HunyuanVideo 1.5, a lightweight open video model

Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.

Alibaba (Qwen)
New ModelsOpen weights

Qwen Image Edit Multi-Angle LoRA

Qwen Image Edit gains Multi-Angle LoRA for camera control

A Multi-Angle LoRA for Qwen Image Edit landed, enabling camera-control style edits that re-render a scene from new angles. Available as a Hugging Face space and on fal, it shows the fast-moving open ecosystem building on Qwen's image editing models.

October 2025

New ModelsOpen weights

Ming-flash-omni Preview

Ming-flash-omni Preview: sparse MoE omni-modal open model

Ant Group's InclusionAI team released Ming-flash-omni Preview, a sparse mixture-of-experts omni-modal model on Hugging Face. It handles multiple input and output modalities in a single open-weights model, adding to the wave of Chinese open omni-modal releases.

Moonshot AI (Kimi)
New ModelsOpen weights

Kimi Linear

Kimi Linear: 48B open model with linear attention and 1M context

Moonshot AI released Kimi Linear, a 48B parameter (A3B active) instruct model that uses linear attention to reach a 1M token context window. It is an open-weights bet on efficient long-context architectures from the Kimi team.

48B parameters (3B active)1M token context window

September 2025

xAI
New Models

Grok 4 Fast

xAI ships Grok 4 Fast with 2M context at a fraction of the cost

xAI released Grok 4 Fast, a cost-efficient model with a 2M token context window that unifies reasoning and non-reasoning behavior in one set of weights and prices far below Grok 4. The panel treated it as part of the larger competitive pressure cycle on price and speed among frontier labs.

Moondream
New ModelsOpen weights

Moondream 3 (Preview)

Moondream 3 Preview: 9B MoE VLM with 2B active parameters

Moondream released a preview of Moondream 3, a 9B mixture-of-experts vision-language model with only 2B active parameters. It targets frontier-level visual reasoning at small-model cost, continuing Moondream's run of efficient open vision models.

Tencent Hunyuan
Papers & ResearchOpen weights

Hunyuan SRPO

Hunyuan SRPO: preference optimization that supercharges diffusion models

Tencent Hunyuan published SRPO (Semantic Relative Preference Optimization), a post-training technique that significantly improves the output quality of diffusion image models. The team released weights on Hugging Face along with a project page and striking before/after comparisons.

July 2025

Huawei
New ModelsOpen weights

Pangu Pro MoE

Huawei's Pangu Pro MoE: 72B model trained entirely on Ascend NPUs

Huawei released Pangu Pro, a 72B-parameter MoE trained on its own Ascend NPUs rather than Nvidia or AMD hardware, hitting 1,528 tokens/sec and pretrained on 13T tokens. The panel framed it as the geopolitical open-model story of the week, showing how far Chinese compute stacks have advanced under sanctions.

OpenRouter
New Models

Cypher Alpha

Mystery 1M-context model 'Cypher Alpha' appears free on OpenRouter

A stealth model called Cypher Alpha showed up on OpenRouter with a free 1M-token context window, with the panel speculating it could be Amazon Titan. Alex used it as an example of how model releases increasingly arrive as anonymous market probes rather than tidy launches.

Tencent
New ModelsOpen weights

Hunyuan-A13B-Instruct

Tencent ships Hunyuan-A13B: 80B MoE with only 13B active params

Tencent released Hunyuan-A13B-Instruct, an 80B-parameter MoE that activates only 13B parameters at inference while keeping a 256K context window. Built by the team with WizardLM lineage, it posts strong reasoning benchmarks and feels unusually practical for its class, though the panel flagged its license limits.

13B Hunyuan active params

May 2025

Alibaba
New ModelsOpen weights

Wan 2.1

Alibaba's Wan 2.1: open-source diffusion-transformer text-to-video suite

Alibaba, the team behind the Qwen LLMs, released Wan 2.1, a full stack of open-source diffusion-transformer text-to-video foundation models. Amid the show's discussion of video-model fatigue, this was called out as a release that cuts through the noise, with weights on Hugging Face and code on GitHub.

Alibaba (Qwen)
New ModelsOpen weights

Qwen 3

Alibaba open-weights the full Qwen 3 family under Apache 2.0

Alibaba released the entire Qwen 3 stack: two MoE models (235B total/22B active and 30B/3B active) plus six dense siblings from 32B down to 0.6B, all Apache 2.0 with day-one support in LM Studio, Ollama, vLLM, MLX and llama.cpp. The headline feature is a runtime hybrid 'thinking' toggle (/think and /no_think) that trades latency for reasoning depth. Trained on ~36T tokens with 128K context and 119-language coverage, the 235B MoE rivals DeepSeek-R1, o1, o3-mini and Gemini 2.5 Pro on coding and math.

235 B Flagship MoE total parameters (22B active)30 B Qwen3-30B-A3B hit 57 tok/s on a Mac with speculative decoding36 Trillions of pre-training tokens (2x Qwen 2.5)

April 2025

Sand AI
New ModelsOpen weights

MAGI-1

Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model

Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.

24B Parameters24 Frames per autoregressive chunk
Meta AI
New ModelsOpen weights

Llama 4 (Scout & Maverick)

Meta drops Llama 4 Scout (109B) and Maverick (400B) open-weights MoE models

Meta released the long-awaited Llama 4 family in a chaotic Saturday drop: Scout (17B active / ~109B total, 16 experts) and Maverick (17B active / ~400B total, 128 experts), with a 2T-parameter Behemoth still in training. The models are multimodal, multilingual MoE architectures trained on ~30T tokens with FP8 and interleaved attention (iRoPE), claiming 10M context for Scout and 1M for Maverick. The release was marred by drama: the LMArena version differed from the released model, and the community criticized the lack of small local-friendly sizes.

10M Stated context window for Llama 4 Scout288B Active parameters of unreleased Behemoth (2T total)17B Active parameters for both Scout and Maverick
New Models

Dream 7B

Dream 7B: a diffusion language model challenger unveiled

Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.

March 2025

Google DeepMind
New Models

Gemini 2.5 Pro

Google reclaims #1 with Gemini 2.5 Pro thinking model

Google dropped Gemini 2.5 Pro, a thinking model that took the #1 spot as the best all-around LLM available, with massive jumps on benchmarks like AIME (up nearly 20 points) and GPQA. It inherits native multimodality and a 1M token context window, maintaining high accuracy even at 120k+ tokens on needle-in-a-haystack tests, with surprisingly low latency (~13 seconds on hard reasoning questions vs 45+ for others). Tulsee Doshi, head of product for Gemini models, joined the show to give the inside scoop.

20 point jump on AIME benchmark1M token context window13 seconds latency on hard reasoning questions (vs 45+ for others)

February 2025

Inception Labs
New Models

Mercury

Inception Labs debuts Mercury, a commercial diffusion LLM

Inception Labs announced Mercury, billed as the first commercial-scale diffusion large language model, generating text via diffusion rather than autoregressive decoding. The approach promises dramatically faster token throughput, demoed first with the Mercury Coder playground.

Arc Institute & NVIDIA
New ModelsOpen weights

Evo 2

Arc Institute and NVIDIA release Evo 2, a 40B state-of-the-art genomics model

Arc Institute and NVIDIA introduced Evo 2, a state-of-the-art genomics model with around 40 billion parameters trained on 9.3 trillion nucleotides. It uses the StripedHyena architecture to process genetic sequences up to 1 million nucleotides, enabling prediction of genetic mutation effects and even design of entire genomes. Fully open: two papers, weights, data, and training and inference codebases.

January 2025

Google DeepMind
New Models

Gemini 2.0 Flash Thinking 01-21

Google ships updated Gemini Flash Thinking with 1M context

Google released an updated Gemini Flash Thinking model (01-21) with a 1 million token context window, built-in code execution, and improved evals over the previous Thinking release. It pushes Google's reasoning-model line forward in the same week DeepSeek R1 landed.

1M Context window (tokens)