Model Architecture

#open-source #coding #agents

Z.ai (Zhipu AI) Jun 18, 2026

New ModelsOpen weights

GLM-5.2

Z.ai releases GLM-5.2, a 753B open MoE with 1M context

Z.ai released GLM-5.2 as a major open-source coding and agentic model: a 753B-parameter MoE, MIT-licensed, with a one-million-token context window. The episode treated it as the open-source model that arrived exactly as Fable access disappeared, with strong coding and agentic performance close to the frontier.

753B parameters1M context windowMIT license

Z.ai announcement on X ↗GLM-5.2 blog ↗GLM-5.2 on Hugging Face ↗GLM-5.2 docs ↗

#open-source #coding #agents

MiniMax Jun 4, 2026

New Models

MiniMax M3

MiniMax announces M3 coding/agentic model with 1M context

MiniMax announced M3, a natively multimodal coding and agentic model with a one-million-token sparse attention context claim and open weights promised soon. Reported numbers include 59 on SWE-bench Pro, and the panel noted MiniMax already has a following for cheap agentic tool calling even as pure coding quality is debated.

X announcement ↗API ↗MiniMax Code ↗

#coding #agents #architecture

May 2026

Anthropic May 28, 2026

New Models

Claude Opus 4.8

Anthropic ships Claude Opus 4.8 live mid-show

Anthropic released Claude Opus 4.8 during the episode, hitting 69.2% on SWE-bench Pro (up from 64.3% on 4.7 and ahead of GPT-5.5 at 58.6%), a new-best 57.9% on Humanity's Last Exam with tools, and 83.4% on OSWorld-Verified. It also shows a real long-context jump past the usual 200K cliff (85.9% GraphWalks BFS at 256K), with new thinking modes in the UI. Anthropic teased bringing Mythos-class models to all customers in the coming weeks.

69.2% SWE-bench Pro

Claude Opus 4.8 — blog ↗Claude Opus 4.8 — system card ↗

#frontier-models #coding #reasoning

Cohere May 21, 2026

New ModelsOpen weights

Command A+

Cohere releases Command A+, a 218B Apache 2.0 MoE with 25B active params

Cohere released Command A+, a 218B-parameter mixture-of-experts model with 25B active parameters, shipping open weights under Apache 2.0. It was the week's headline open-source release, available on Hugging Face in both W4A4 quantized and BF16 variants.

218B Command A+ parameters25B active parameters

Cohere blog ↗Nick Frosst ↗HF W4A4 ↗HF BF16 ↗

Blog ↗Nous Research on X ↗arXiv ↗GitHub ↗

Nous Research May 21, 2026

Papers & ResearchOpen weights

Lighthouse Attention

Nous Research publishes Lighthouse Attention for fast long-context pretraining

Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining that delivers major speedups. The release includes a blog post, an arXiv paper and an open-source GitHub implementation.

#research #architecture #open-source

Krea AI May 14, 2026

New Models

Krea 2

Krea 2: Krea's first from-scratch foundation image model

Krea released Krea 2, its first foundation image model trained from scratch, built over six to seven months by nearly half the company. It focuses on aesthetic diversity, style control with up to 4 reference images, and moodboard-driven workflows, generating images in roughly 15 seconds. Co-founder and CEO Victor Perez joined the show to walk through it.

X announcement ↗Blog ↗

DeepSeek announcement on X ↗Arxiv paper ↗DeepSeek-V4-Pro on Hugging Face ↗DeepSeek-V4-Flash on Hugging Face ↗

April 2026

DeepSeek Apr 30, 2026

New ModelsOpen weights

DeepSeek V4

DeepSeek V4: 1.6T MoE with CSA+HCA attention and 1M context

DeepSeek released the V4 paper and models (V4-Pro and V4-Flash on Hugging Face), a 1.6T-parameter MoE featuring CSA+HCA attention that fits 1M tokens of context in just 5.7GB of KV cache. It is possibly the first frontier model trained across multiple datacenters, and DeepSeek is offering API tokens at an 80% discount on already much cheaper pricing.

1M context window5.7GB KV cache at 1M context

#open-source #architecture #training

NVIDIA Apr 30, 2026

New ModelsOpen weights

Nemotron 3 Nano Omni

NVIDIA Nemotron 3 Nano Omni: hybrid Transformer-Mamba MoE

NVIDIA released Nemotron 3 Nano Omni, a 30B-total/3B-active hybrid Transformer-Mamba MoE with 256K context. It delivers 9x throughput on consumer hardware.

NVIDIA blog ↗

#open-source #multimodal #architecture

SenseTime Apr 30, 2026

New ModelsOpen weights

SenseNova U1

SenseTime open-sources SenseNova U1 unified multimodal MoE

SenseTime open-sourced SenseNova U1, a unified multimodal MoE model with 8B total and 3B active parameters that handles understanding and generation with no separate encoder or VAE. The architecture builds on a paper the team presented at ICLR last year.

8B total parameters (3B active MoE)

SenseTime announcement on X ↗Hugging Face collection ↗GitHub ↗Try it ↗

#open-source #multimodal #architecture

0 0xSero Apr 16, 2026

New ModelsOpen weights

Gemma 4 21B REAP

Gemma 4 21B REAP: 20% expert-pruned Gemma 4 26B MoE

Community researcher 0xSero released Gemma 4 21B-A4B REAP, a 20% expert-pruned version of the Gemma 4 26B MoE created using Cerebras' REAP pruning technique. It shrinks the model for cheaper local inference while preserving most of its quality.

gemma-4-21b-a4b-it-REAP on Hugging Face ↗

#open-source #architecture #on-device

Alibaba (Qwen) Apr 16, 2026

New ModelsOpen weights

Qwen 3.6-35B-A3B

Qwen 3.6-35B-A3B: Apache 2.0 MoE with 3B active hits 73.4% SWE-Verified

Alibaba Qwen open-sourced Qwen 3.6-35B-A3B under Apache 2.0 the same morning Opus 4.7 dropped: a 35B MoE with only 3B active parameters that scores 73.4% on SWE-bench Verified, rivaling models 10x its size. It is natively multimodal with 262K context extensible to 1M, and the crew called it the strongest mid-size LLM on nearly all benchmarks, putting to rest doubts about Qwen's open-source commitment after Junyang Ling's departure.

73.4% SWE-bench Verified

Qwen 3.6 announcement (X) ↗Qwen3.6-35B-A3B on Hugging Face ↗Qwen blog: Qwen 3.6-35B-A3B ↗

#open-source #architecture #coding

Baidu Apr 16, 2026

New ModelsOpen weights

ERNIE-Image

Baidu ERNIE-Image: 8B DiT ranks #1 on GenEval among open models

Baidu released ERNIE-Image, an 8B diffusion transformer that ranks #1 on GenEval among open models and features precise multilingual text rendering. It is part of this week's wave of Chinese open releases in image and 3D generation.

ERNIE-Image on Hugging Face ↗

#image-gen #architecture #open-source

Alibaba (Qwen) Apr 2, 2026

New Models

Qwen3.6-Plus

Alibaba ships Qwen3.6-Plus with near-Opus agentic coding and 1M context

Alibaba released Qwen3.6-Plus, an API model with agentic coding performance near Opus 4.5 and a 1M-token context window. The panel noted continued strong momentum for the Qwen family in practical coding and agent workloads.

Announcement (X) ↗Qwen blog ↗

#coding #agents #architecture

March 2026

Anthropic Mar 19, 2026

Major Features & Updates

Claude Opus 4.6 (1M context)

Anthropic makes Opus 4.6 1M context the default in Claude Code, same price

Anthropic made 1M token context the default for Opus 4.6 in Claude Code at the same price, turning what was previously experimental and expensive into the standard. MRCR benchmark performance holds at 93% at 256K and 76% at 1M. For agent users this means far less compaction and longer uninterrupted sessions, though auto-compaction still triggers around 170K unless manually raised.

1M Opus 4.6 context default

#architecture #agents #coding

MiniMax Mar 19, 2026

New Models

MiniMax M2.7

MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro

MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, built by one engineer over four days with zero lines of human code. It scores 56.22% on SWE-Bench Pro, within one point of Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. Not yet open weights, though rumors suggest a release is coming.

56% MiniMax 2.7 SWE-bench Pro

MiniMax announcement ↗MiniMax on X ↗TestingCatalog on X ↗MiniMax M2.7 announcement (X) ↗

🎙️ Hear our coverage (+1 follow-up) →

#coding #agents #reasoning

Mistral AI Mar 19, 2026

New ModelsOpen weights

Mistral Small 4

Mistral Small 4: 119B MoE with 6B active unifies vision, coding, reasoning

Mistral returned to open source with Small 4, a 119B-parameter MoE with 128 experts and only 6B active per token, released under Apache 2.0. It unifies the previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one model and can fit on a single H100 when compressed. Early WolfBench results are sobering at ~17% on OpenClaw agent tasks, roughly on par with similarly sized Nemotron.

119B Mistral Small 4 total params

Mistral blog ↗Hugging Face ↗X announcement ↗

#open-source #architecture #multimodal

S State Spaces (Albert Gu et al.) Mar 19, 2026

Papers & ResearchOpen weights

Mamba-3

Mamba-3 lands with three SSM innovations for inference-first linear models

Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.

Arxiv paper ↗GitHub ↗Albert Gu on X ↗

#research #architecture #open-source

NVIDIA Mar 13, 2026

New ModelsOpen weights

Nemotron 3 Super 120B

NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet

NVIDIA launched Nemotron 3 Super, a 120B Hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B 5-year open-source commitment. It is available on W&B Inference at $0.20/M input and $0.80/M output.

120B Nemotron 3 Super total parameters12B Nemotron 3 Super active parameters (MoE)1M Nemotron 3 Super context window (tokens)

NVIDIA on X ↗Nemotron 3 Super blog post ↗Nemotron 3 Super on HuggingFace ↗W&B Inference (Nemotron) ↗

#open-source #architecture #reasoning

Black Forest Labs Mar 5, 2026

Papers & Research

Self-Flow

Black Forest Labs introduces Self-Flow

Black Forest Labs published Self-Flow, new research from the FLUX makers in the AI art and diffusion space. It was included in the week's AI Art & Diffusion roundup.

BFL Self-Flow announcement ↗Self-Flow research page ↗

#image-gen #architecture #research

Google DeepMind Mar 5, 2026

New Models

Gemini 3.1 Flash-Lite

Google launches Gemini 3.1 Flash-Lite with 1M context at 360 tok/s

Google launched Gemini 3.1 Flash-Lite, a fast and cheap model with 1M token context aimed at the instant/fast tier, running around 360 tokens per second. The panel flagged a material pricing jump versus the prior Flash-Lite generation but saw it as well suited for judge, guardrail, and orchestration workloads in agent systems.

360 tokens/sec Gemini 3.1 Flash-Lite speed

Logan Kilpatrick announcement ↗Gemini Flash-Lite page ↗

#frontier-models #architecture #infrastructure

OpenAI Mar 5, 2026

New Models

GPT-5.4

OpenAI drops GPT-5.4 Thinking and GPT-5.4 Pro live during the show

OpenAI released GPT-5.4 Thinking and GPT-5.4 Pro mid-show, a frontier general model that folds Codex-level coding into a unified reasoning model. It ships with a 1M token context window, a /fast mode, and mid-reasoning steering, posting 83.3% on ARC-AGI 2 (Pro) and roughly 75% on OS World computer use. The panel tested it live in Codex and called it a major general-model jump, while noting input pricing rose about 50% versus 5.2.

83.3% ARC-AGI 2 (GPT-5.4 Pro)75% OS World / computer-use score1M Context window

OpenAI GPT-5.4 announcement ↗ARC Prize on GPT-5.4 ↗Alex Volkov's live reaction thread ↗Benchmark breakdown by @nasqret ↗

#frontier-models #reasoning #coding

February 2026

Alibaba (Qwen) Feb 26, 2026

New ModelsOpen weights

Qwen 3.5

Qwen 3.5 lands: 35B/3B-active Medium outperforms the old 235B flagship

Alibaba released the Qwen 3.5 family of open-weight models, headlined by Qwen3.5-35B-A3B, a 35B model with only 3B active parameters that outperforms their previous 235B flagship. Variants include a 122B-A10B and a dense 27B, with the panel highlighting the hybrid state-space (Mamba-layer) architecture and strong practical coding and agent performance at a tiny active-parameter footprint.

35B / 3B active Qwen 3.5 Medium

Qwen announcement on X ↗Qwen3.5-35B-A3B on Hugging Face ↗Qwen3.5-122B-A10B on Hugging Face ↗Qwen 3.5 blog post ↗

#open-source #architecture #coding

Liquid AI Feb 26, 2026

New ModelsOpen weights

LFM2-24B-A2B

Liquid AI releases LFM2-24B-A2B, a laptop-friendly 24B MoE

Liquid AI released LFM2-24B-A2B, a 24B mixture-of-experts model with only 2.3B active parameters that runs on consumer laptops. The panel highlighted its speed and surprisingly strong non-coding reasoning, reinforcing the trend of efficient low-active-parameter open models for local use.

Liquid AI announcement on X ↗LFM2-24B-A2B on Hugging Face ↗Liquid AI blog post ↗

#open-source #architecture #on-device

Alibaba (Qwen) Feb 19, 2026

New ModelsOpen weights

Qwen3.5-397B-A17B

Alibaba opens Qwen 3.5: 397B-param multimodal MoE with only 17B active

Alibaba released Qwen3.5-397B-A17B, billed as the first open-weight native multimodal MoE model, with 397B total parameters, just 17B active, 512 experts, and 262K native context extendable to 1M. It delivers 8.6-19x faster inference than Qwen3-Max and continues Qwen's strength in multilingual and medical tasks, scoring 52.5% on Terminal Bench, third place among open-source models. Nisten found coding still trails GLM-5.

397B Qwen 3.5 Parameters

Qwen 3.5 announcement (X) ↗Qwen3.5-397B-A17B on Hugging Face ↗

#open-source #architecture #multilingual

Anthropic Feb 19, 2026

New Models

Claude Sonnet 4.6

Anthropic ships Claude Sonnet 4.6 with 79.6% SWE-Bench and 1M context

Anthropic launched Claude Sonnet 4.6, its most capable Sonnet ever, scoring 79.6% on SWE-Bench Verified, nearly matching Opus 4.6 at Sonnet pricing of $3/$15 per million tokens. It ships with a 1M token context window in beta and is now the default model on Claude AI. In blind Claude Code testing, users preferred Sonnet 4.6 over the previous Opus 4.5 59% of the time, and it beats the previous Gemini 3 Pro on most benchmarks.

79.6% SWE-Bench Verified

Claude Sonnet 4.6 announcement (X) ↗Anthropic blog: Claude Sonnet 4.6 ↗Claude Sonnet page ↗

#coding #agents #architecture

January 2026

Arcee AI Jan 29, 2026

New ModelsOpen weights

Trinity Large

Arcee AI ships Trinity Large: 400B MOE trained in 33 days for $20M

Arcee AI's Trinity Large is a 400B-parameter MOE with 13B active parameters, trained on 17T tokens across 2000 B300 GPUs in 33 days for $20M. It has 512K native context (twice Kimi K2.5), is free on OpenRouter until February 2026, and the panel called it the largest Western open-source lab model.

400B Arcee Trinity Large512K Trinity native context

Announcement (X) ↗Blog ↗Hugging Face (Preview) ↗Hugging Face (Base) ↗

Sakana AI RePo announcement (X) ↗RePo paper (arXiv) ↗RePo project page ↗

Sakana AI Jan 22, 2026

Papers & Research

RePo

Sakana AI's RePo lets LLMs dynamically reorganize their context

Sakana AI introduced RePo, a research technique that lets language models dynamically reorganize their context for better attention. The paper proposes a new way to manage what a model focuses on, aimed at improving performance on long-context tasks.

Solar Open 100B on X ↗Solar Open 100B on Hugging Face ↗Solar Open Tech Report ↗

#research #architecture

Upstage Jan 8, 2026

New ModelsOpen weights

Solar Open 100B

Upstage Solar Open 100B: 102B MoE trained on 19.7T tokens

Upstage released Solar Open 100B, a 102B parameter MoE model with only 12B active parameters per token (129 experts, top-8 activation), trained on 19.7 trillion tokens including 4.5T synthetic via a 'data factory' approach. It outperforms GLM 4.5 Air on many benchmarks, features the SNAP PO reinforcement learning technique with a 50% training speedup, and delivers best-in-class Korean language performance.

102B Solar Open params

#open-source #architecture #multilingual

December 2025

MiniMax (Hailuo) Dec 25, 2025

New ModelsOpen weights

MiniMax-01

MiniMax-01: open model with a 4M token context window

MiniMax (Hailuo) released MiniMax-01 in January with a 4 million token context window, by far the largest context of any open-weights model at the time. It was an early sign of the Chinese-lab open source dominance that defined 2025.

Jan 17 Episode ↗

Allen AI Dec 18, 2025

New ModelsOpen weights

BOLMO

Allen AI's BOLMO reaches byte-level parity with tokenized models

Allen AI released BOLMO, described as the first byte-level language model to reach parity with regular tokenization-based models. The panel framed it as a research breakthrough that could eventually remove tokenizers from the LLM stack.

BOLMO announcement ↗

#open-source #research #architecture

NVIDIA Dec 18, 2025

New ModelsOpen weights

Nemotron 3 Nano

NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes

NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.

30B (3B active) Nemotron 3 Nano parameters

NVIDIA Nemotron 3 Nano announcement ↗NVIDIA Nemotron 3 Nano (HF BF16) ↗NVIDIA Nemotron 3 Nano (HF FP8) ↗

#open-source #architecture #infrastructure

Arcee AI Dec 4, 2025

New ModelsOpen weights

Arcee Trinity

Arcee Trinity launches US-trained open MoE family

Arcee AI introduced Trinity, a family of US-trained open mixture-of-experts models built from scratch, starting with Trinity-Mini and Trinity-Nano-Preview. CTO Lukas Atkins joined the show to discuss the training approach and previewed Trinity-Large for January 2026. The release positions Arcee as a domestic alternative in an open-weights field dominated by Chinese labs.

Arcee Trinity Manifesto ↗Trinity-Mini (Hugging Face) ↗Trinity-Nano-Preview (Hugging Face) ↗Lukas Atkins announcement on X ↗

Z-Image Turbo on HuggingFace ↗Z-Image on GitHub ↗

November 2025

Alibaba (Tongyi) Nov 27, 2025

New ModelsOpen weights

Z-Image Turbo

Tongyi's Z-Image Turbo brings sub-second open image generation

Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.

6B Parameters

#image-gen #open-source #architecture

Prime Intellect Nov 27, 2025

New ModelsOpen weights

INTELLECT-3

Prime Intellect releases INTELLECT-3, a 106B open MoE model

Prime Intellect released INTELLECT-3, a 106B-parameter mixture-of-experts model with 12B active parameters that scores 90% on AIME 2024/2025. The lab fully open-sourced the training stack alongside the weights, showing a small lab can train frontier-scale models.

106B Total parameters (12B active)90% AIME 2024/2025

INTELLECT-3 on HuggingFace ↗INTELLECT-3 Blog ↗INTELLECT-3 Announcement on X ↗Try INTELLECT-3 ↗

#open-source #reasoning #architecture

Tencent (Hunyuan) Nov 27, 2025

New ModelsOpen weights

HunyuanVideo 1.5

Tencent releases HunyuanVideo 1.5, a lightweight open video model

Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.

HunyuanVideo on HuggingFace ↗HunyuanVideo on GitHub ↗HunyuanVideo 1.5 Announcement on X ↗

#video-gen #open-source #architecture

Alibaba (Qwen) Nov 13, 2025

New ModelsOpen weights

Qwen Image Edit Multi-Angle LoRA

Qwen Image Edit gains Multi-Angle LoRA for camera control

A Multi-Angle LoRA for Qwen Image Edit landed, enabling camera-control style edits that re-render a scene from new angles. Available as a Hugging Face space and on fal, it shows the fast-moving open ecosystem building on Qwen's image editing models.

Linoy Tsaban demo on X ↗Qwen-Image-Edit-Angles space on Hugging Face ↗fal on X ↗

Announcement on X ↗Hugging Face model page ↗Diffusers ChronoEdit docs ↗

NVIDIA Nov 13, 2025

New ModelsOpen weights

ChronoEdit-14B Upscaler LoRA

NVIDIA releases ChronoEdit-14B Upscaler LoRA

NVIDIA released an Upscaler LoRA for its ChronoEdit-14B image editing model, available on Hugging Face with Diffusers pipeline support. It adds high-quality upscaling to the ChronoEdit physics-aware editing stack.

Grok 4 Fast 2M context on X ↗Grok update thread on X ↗

xAI Nov 13, 2025

Major Features & Updates

Grok 4 Fast

Grok 4 Fast expands to a 2 million token context window

xAI's Grok 4 Fast now supports a 2 million token context window, one of the largest of any frontier model. The crew called the jump 'crazy' and discussed what such long context unlocks for agentic and document-heavy workloads.

2M Context Window

#architecture #frontier-models

October 2025

InclusionAI (Ant Group) Oct 30, 2025

New ModelsOpen weights

Ming-flash-omni Preview

Ming-flash-omni Preview: sparse MoE omni-modal open model

Ant Group's InclusionAI team released Ming-flash-omni Preview, a sparse mixture-of-experts omni-modal model on Hugging Face. It handles multiple input and output modalities in a single open-weights model, adding to the wave of Chinese open omni-modal releases.

X announcement ↗Hugging Face ↗

#open-source #multimodal #architecture

Moonshot AI (Kimi) Oct 30, 2025

New ModelsOpen weights

Kimi Linear

Kimi Linear: 48B open model with linear attention and 1M context

Moonshot AI released Kimi Linear, a 48B parameter (A3B active) instruct model that uses linear attention to reach a 1M token context window. It is an open-weights bet on efficient long-context architectures from the Kimi team.

48B parameters (3B active)1M token context window

Hugging Face ↗

September 2025

xAI Sep 25, 2025

New Models

Grok 4 Fast

xAI ships Grok 4 Fast with 2M context at a fraction of the cost

xAI released Grok 4 Fast, a cost-efficient model with a 2M token context window that unifies reasoning and non-reasoning behavior in one set of weights and prices far below Grok 4. The panel treated it as part of the larger competitive pressure cycle on price and speed among frontier labs.

X ↗Blog ↗

#reasoning #architecture #frontier-models

Moondream Sep 18, 2025

New ModelsOpen weights

Moondream 3 (Preview)

Moondream 3 Preview: 9B MoE VLM with 2B active parameters

Moondream released a preview of Moondream 3, a 9B mixture-of-experts vision-language model with only 2B active parameters. It targets frontier-level visual reasoning at small-model cost, continuing Moondream's run of efficient open vision models.

X ↗HF ↗

#vision #open-source #architecture

Tencent Hunyuan Sep 18, 2025

Papers & ResearchOpen weights

Hunyuan SRPO

Hunyuan SRPO: preference optimization that supercharges diffusion models

Tencent Hunyuan published SRPO (Semantic Relative Preference Optimization), a post-training technique that significantly improves the output quality of diffusion image models. The team released weights on Hugging Face along with a project page and striking before/after comparisons.

X ↗HF ↗Project ↗Comparison X ↗

X coverage ↗Hugging Face ↗

July 2025

Huawei Jul 3, 2025

New ModelsOpen weights

Pangu Pro MoE

Huawei's Pangu Pro MoE: 72B model trained entirely on Ascend NPUs

Huawei released Pangu Pro, a 72B-parameter MoE trained on its own Ascend NPUs rather than Nvidia or AMD hardware, hitting 1,528 tokens/sec and pretrained on 13T tokens. The panel framed it as the geopolitical open-model story of the week, showing how far Chinese compute stacks have advanced under sanctions.

#open-source #architecture #infrastructure

OpenRouter Jul 3, 2025

New Models

Cypher Alpha

Mystery 1M-context model 'Cypher Alpha' appears free on OpenRouter

A stealth model called Cypher Alpha showed up on OpenRouter with a free 1M-token context window, with the panel speculating it could be Amazon Titan. Alex used it as an example of how model releases increasingly arrive as anonymous market probes rather than tidy launches.

OpenRouter listing ↗

#architecture #frontier-models

Tencent Jul 3, 2025

New ModelsOpen weights

Hunyuan-A13B-Instruct

Tencent ships Hunyuan-A13B: 80B MoE with only 13B active params

Tencent released Hunyuan-A13B-Instruct, an 80B-parameter MoE that activates only 13B parameters at inference while keeping a 256K context window. Built by the team with WizardLM lineage, it posts strong reasoning benchmarks and feels unusually practical for its class, though the panel flagged its license limits.

13B Hunyuan active params

X announcement ↗Hugging Face ↗Try it ↗

#open-source #architecture #reasoning

May 2025

Alibaba May 15, 2025

New ModelsOpen weights

Wan 2.1

Alibaba's Wan 2.1: open-source diffusion-transformer text-to-video suite

Alibaba, the team behind the Qwen LLMs, released Wan 2.1, a full stack of open-source diffusion-transformer text-to-video foundation models. Amid the show's discussion of video-model fatigue, this was called out as a release that cuts through the noise, with weights on Hugging Face and code on GitHub.

Hugging Face ↗GitHub ↗Announcement tweet ↗Try it ↗

#video-gen #open-source #architecture

Alibaba (Qwen) May 1, 2025

New ModelsOpen weights

Qwen 3

Alibaba open-weights the full Qwen 3 family under Apache 2.0

Alibaba released the entire Qwen 3 stack: two MoE models (235B total/22B active and 30B/3B active) plus six dense siblings from 32B down to 0.6B, all Apache 2.0 with day-one support in LM Studio, Ollama, vLLM, MLX and llama.cpp. The headline feature is a runtime hybrid 'thinking' toggle (/think and /no_think) that trades latency for reasoning depth. Trained on ~36T tokens with 128K context and 119-language coverage, the 235B MoE rivals DeepSeek-R1, o1, o3-mini and Gemini 2.5 Pro on coding and math.

235 B Flagship MoE total parameters (22B active)30 B Qwen3-30B-A3B hit 57 tok/s on a Mac with speculative decoding36 Trillions of pre-training tokens (2x Qwen 2.5)

Qwen 3 blog post ↗GitHub ↗Hugging Face collection ↗HF demo ↗

#open-source #reasoning #architecture

April 2025

Sand AI Apr 24, 2025

New ModelsOpen weights

MAGI-1

Sand AI surprises with MAGI-1, a 24B streaming autoregressive video model

Sand AI released MAGI-1, a 24B autoregressive diffusion model for long-form, streaming video generation with remarkable character consistency, often the Achilles' heel of AI video. It predicts video in 24-frame chunks with causal attention between them, enabling real-time streaming generation where compute doesn't scale with length. Nisten speculated it could be a major step toward usable AI-generated movies by solving the face/character consistency problem.

24B Parameters24 Frames per autoregressive chunk

X Post ↗GitHub ↗PDF Report ↗HF Repo ↗

#video-gen #open-source #architecture

ByteDance Apr 17, 2025

New Models

Seedream 3.0

ByteDance Seedream 3.0: bilingual 2K text-to-image model

ByteDance's Seed team announced Seedream 3.0, a powerful bilingual (Chinese/English) text-to-image model that generates native 2048x2048 images with fast inference of around 3 seconds for a 1K image on an A100. It challenges the top closed image generation models.

Tech post ↗arXiv ↗AIbase news ↗

Our Coverage ↗Prompting guide ↗

OpenAI Apr 17, 2025

New Models

GPT-4.1, 4.1-mini, 4.1-nano

OpenAI launches GPT-4.1 family (4.1, mini, nano) in the API

OpenAI released the GPT-4.1 family of models, available via API only, in three sizes: 4.1, 4.1-mini and 4.1-nano. The family features a 1M token context window, in contrast to o3's 200k, and is aimed at developers building on long-context and coding workloads.

#frontier-models #architecture #coding

OpenAI Apr 17, 2025

Benchmarks & EvalsOpen weights

MRCR

OpenAI open sources the MRCR long-context benchmark dataset

OpenAI open sourced MRCR, a benchmark dataset for evaluating long-context, complex retrieval tasks, building on Gemini research from Google and publishing the dataset on Hugging Face.

Hugging Face ↗

#benchmarks #architecture

Meta AI Apr 10, 2025

New ModelsOpen weights

Llama 4 (Scout & Maverick)

Meta drops Llama 4 Scout (109B) and Maverick (400B) open-weights MoE models

Meta released the long-awaited Llama 4 family in a chaotic Saturday drop: Scout (17B active / ~109B total, 16 experts) and Maverick (17B active / ~400B total, 128 experts), with a 2T-parameter Behemoth still in training. The models are multimodal, multilingual MoE architectures trained on ~30T tokens with FP8 and interleaved attention (iRoPE), claiming 10M context for Scout and 1M for Maverick. The release was marred by drama: the LMArena version differed from the released model, and the community criticized the lack of small local-friendly sizes.

10M Stated context window for Llama 4 Scout288B Active parameters of unreleased Behemoth (2T total)17B Active parameters for both Scout and Maverick

Meta blog: Llama 4 multimodal intelligence ↗Hugging Face: meta-llama ↗Try it at meta.ai ↗

#open-source #architecture #multimodal

H HKU NLP (University of Hong Kong) Apr 3, 2025

New Models

Dream 7B

Dream 7B: a diffusion language model challenger unveiled

Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.

Dream 7B blog post ↗Benchmark results thread (Sudoku) ↗

#architecture #research #reasoning

March 2025

Google DeepMind Mar 27, 2025

New Models

Gemini 2.5 Pro

Google reclaims #1 with Gemini 2.5 Pro thinking model

Google dropped Gemini 2.5 Pro, a thinking model that took the #1 spot as the best all-around LLM available, with massive jumps on benchmarks like AIME (up nearly 20 points) and GPQA. It inherits native multimodality and a 1M token context window, maintaining high accuracy even at 120k+ tokens on needle-in-a-haystack tests, with surprisingly low latency (~13 seconds on hard reasoning questions vs 45+ for others). Tulsee Doshi, head of product for Gemini models, joined the show to give the inside scoop.

20 point jump on AIME benchmark1M token context window13 seconds latency on hard reasoning questions (vs 45+ for others)

X announcement (Jeff Dean) ↗Official blog post ↗Try it at ai.dev ↗

#reasoning #architecture #frontier-models

Reve Mar 27, 2025

New Models

Reve Image

Reve emerges with SOTA diffusion image generation claims

Reve launched a new diffusion image generation model claiming state-of-the-art quality, reportedly beating heavyweights like Midjourney and Flux at roughly a penny per image. The previously low-profile lab made a splash with strong prompt adherence and image quality.

X announcement (Taesung) ↗Decrypt coverage ↗

Announcement (X) ↗Hugging Face ↗

AI21 Labs Mar 6, 2025

New ModelsOpen weights

Jamba 1.6 Large & Mini

AI21 releases Jamba 1.6 Large and Jamba 1.6 Mini open-weights models

AI21 Labs released Jamba 1.6 in Large and Mini sizes, updating its hybrid SSM-Transformer (Mamba-based) model family with open weights on Hugging Face. The Jamba architecture targets long-context efficiency compared to pure transformer models.

February 2025

Inception Labs Feb 27, 2025

New Models

Mercury

Inception Labs debuts Mercury, a commercial diffusion LLM

Inception Labs announced Mercury, billed as the first commercial-scale diffusion large language model, generating text via diffusion rather than autoregressive decoding. The approach promises dramatically faster token throughput, demoed first with the Mercury Coder playground.

X ↗Try it ↗

#architecture #coding #infrastructure

Arc Institute & NVIDIA Feb 20, 2025

New ModelsOpen weights

Evo 2

Arc Institute and NVIDIA release Evo 2, a 40B state-of-the-art genomics model

Arc Institute and NVIDIA introduced Evo 2, a state-of-the-art genomics model with around 40 billion parameters trained on 9.3 trillion nucleotides. It uses the StripedHyena architecture to process genetic sequences up to 1 million nucleotides, enabling prediction of genetic mutation effects and even design of entire genomes. Fully open: two papers, weights, data, and training and inference codebases.

Announcement on X ↗

#research #open-source #architecture

January 2025

Google DeepMind Jan 23, 2025

New Models

Gemini 2.0 Flash Thinking 01-21

Google ships updated Gemini Flash Thinking with 1M context

Google released an updated Gemini Flash Thinking model (01-21) with a 1 million token context window, built-in code execution, and improved evals over the previous Thinking release. It pushes Google's reasoning-model line forward in the same week DeepSeek R1 landed.

1M Context window (tokens)

Noam Shazeer announcement on X ↗