DeepSeek

12 releases covered on ThursdAI · deepseek.com

April 2026

DeepSeek
New Models · Open weights

DeepSeek V4

DeepSeek V4: 1.6T MoE with CSA+HCA attention and 1M context

DeepSeek released the V4 paper and models (V4-Pro and V4-Flash on Hugging Face), a 1.6T-parameter MoE whose CSA+HCA attention fits 1M tokens of context in just 5.7GB of KV cache. It is possibly the first frontier model trained across multiple datacenters, and DeepSeek is offering API tokens at an 80% discount on top of pricing that was already much cheaper than competitors'.

1M context window · 5.7GB KV cache at 1M context
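
As a rough sanity check on the headline figure, the sketch below divides the quoted 5.7GB cache by 1M tokens to get an average per-token footprint. It assumes decimal gigabytes, exactly 1,000,000 tokens, and a single sequence; `kv_bytes_per_token` is a hypothetical helper for illustration, not part of any DeepSeek release.

```python
def kv_bytes_per_token(cache_gb: float, context_tokens: int) -> float:
    """Average KV-cache bytes per token, assuming decimal GB (1 GB = 10**9 bytes)."""
    return cache_gb * 10**9 / context_tokens


# Figures quoted above: 5.7GB of KV cache at 1M tokens of context.
per_token = kv_bytes_per_token(5.7, 1_000_000)
print(f"{per_token:.0f} bytes per token")
```

Under those assumptions the cache averages out to a few kilobytes per token, which is what makes a 1M-token window plausible on a single accelerator.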

December 2025

DeepSeek
New Models · Open weights

DeepSeek R1

DeepSeek R1: the open reasoning model that crashed NVIDIA's stock

DeepSeek's open-weights reasoning model dropped January 23rd and matched OpenAI's o1 at roughly 1/50th of the price, with an alleged training cost of just $5.5M. It sent NVIDIA stock down 17%, a $560B single-day loss and the largest single-day loss of market value for any company in history, and made Chinese AI a household topic. The crew named it the earthquake that shattered assumptions about who leads AI.

$560B NVIDIA stock loss · $5.5M DeepSeek R1 training cost

DeepSeek
New Models · Open weights

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus lands amid September's relentless pace

DeepSeek resurfaced in September with V3.1 Terminus, another strong open-weights release that arrived just as the crew was barely keeping up with the weekly firehose. Nisten noted that missing a single week in this period left you completely lost.

DeepSeek
New Models · Open weights

DeepSeek V3.2 / V3.2-Speciale

DeepSeek V3.2 and V3.2-Speciale post gold-medal reasoning under MIT license

DeepSeek released V3.2 and the reasoning-first V3.2-Speciale, a 685B-parameter MoE under the MIT license. Speciale posted gold-medal-level olympiad results and 96% on AIME (versus GPT-5 High at 94%), with V3.2 hitting 73.1% on SWE-Bench Verified. These results, paired with aggressive pricing of around 28 cents per 1M tokens on OpenRouter, push open models closer to top closed-model capability.

96% AIME · 73.1% SWE-Bench Verified · 685B Total parameters (MoE)
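
To put the quoted price in concrete terms, here is a minimal cost estimate. It assumes a single flat rate of $0.28 per 1M tokens, taken from the "around 28 cents" figure above; real provider pricing usually differs between input and output tokens, and `estimate_cost_usd` is a hypothetical helper.

```python
PRICE_PER_MILLION_USD = 0.28  # assumed flat rate, from the "around 28 cents" figure


def estimate_cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Estimated spend in USD for a given token count at a flat per-million rate."""
    return tokens / 1_000_000 * price_per_million


# Example workload: 50M tokens processed at the assumed rate.
cost = estimate_cost_usd(50_000_000)
print(f"${cost:.2f}")
```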

November 2025

October 2025

DeepSeek
New Models · Open weights

DeepSeek-OCR

DeepSeek-OCR turns text into compressed vision tokens for massive contexts

DeepSeek open-sourced DeepSeek-OCR, a 3B model (~570M active parameters) that is less an OCR model and more a context-compression breakthrough: it renders text as images, compresses it up to 10x while retaining 97% decoding accuracy (and still about 60% at 20x), and reads it back with a tiny vision decoder. The approach suggests text tokenization is far from optimal and points to vastly cheaper long-context processing. alphaXiv reportedly OCR'd all of arXiv for $1,000 versus $7,500 with MistralOCR, and a single H100 can process up to 200K pages.

97% decoding accuracy at 10x compression · ~570M active parameters (3B total) · 200K pages scannable on a single H100
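
The compression claim can be made concrete with a little arithmetic. The sketch below uses only the figures reported above (97% accuracy at 10x, about 60% at 20x, $1,000 versus $7,500) and assumes the ratio applies uniformly to a document; `vision_tokens` is a hypothetical helper, not an API from the DeepSeek-OCR release.

```python
import math

# Compression ratio -> decoding accuracy, as reported in the summary above.
REPORTED_ACCURACY = {10: 0.97, 20: 0.60}


def vision_tokens(text_tokens: int, ratio: int) -> int:
    """Vision tokens needed if text is compressed uniformly at the given ratio."""
    return math.ceil(text_tokens / ratio)


# A hypothetical 100K-token document at each reported ratio.
for ratio, accuracy in REPORTED_ACCURACY.items():
    budget = vision_tokens(100_000, ratio)
    print(f"{ratio}x: {budget} vision tokens at ~{accuracy:.0%} decoding accuracy")

# Reported arXiv OCR cost: $7,500 with MistralOCR versus $1,000 with DeepSeek-OCR.
cost_ratio = 7500 / 1000
print(f"{cost_ratio}x cheaper")
```

The trade-off is visible in the two rows: halving the token budget again from 10x to 20x costs a large share of decoding accuracy, so the 10x setting is the one the long-context argument rests on.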

September 2025

DeepSeek
New Models · Open weights

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus refines agents and bilingual output

DeepSeek released V3.1 Terminus, an update to V3.1 with cleaner bilingual output, stronger agentic tool use, and cheaper long-context handling. The open weights are available on Hugging Face, continuing DeepSeek's cadence of iterative open releases.

May 2025

DeepSeek
New Models · Open weights

DeepSeek-R1-0528

DeepSeek drops R1-0528, an updated open reasoning model with big gains

DeepSeek released R1-0528 out of nowhere, an update to their open-weights reasoning model with serious performance jumps: AIME 91, LiveCodeBench 73, and SWE-bench Verified 57.6. They also shipped an 8B distilled version based on Qwen3 that can run on a laptop. The update keeps R1 among the best open-weight models available.

91 AIME score, beating previous R1 by a mile · 8B Distilled Qwen3-based version runnable on a laptop

March 2025

DeepSeek
New Models · Open weights

DeepSeek-V3-0324

DeepSeek silently drops V3-0324, 685B params under MIT license

DeepSeek silently updated their V3 base model with DeepSeek-V3-0324, a 685B-parameter MoE released on Hugging Face under the MIT license. This is not R1 (their reasoning model) but the powerful base model that R1 was built on, and supposedly the base for a future R2.

685B parameters

February 2025

DeepSeek
Dev Tools · Open weights

Open Source Week infra releases

DeepSeek open-sources its infra stack during Open Source Week

DeepSeek ran its Open Source Week, releasing a series of production infrastructure repos (including FlashMLA, DeepEP, and DeepGEMM) that power its training and inference stack. The drops gave the open-source community a rare look at the low-level kernels and communication libraries behind DeepSeek's efficient frontier models.

January 2025

DeepSeek
New Models · Open weights

DeepSeek R1

DeepSeek R1: MIT-licensed open source reasoning model rivals o1

DeepSeek released R1, a state-of-the-art open source reasoning model under a permissive MIT license. It matches or beats OpenAI's o1 on key reasoning benchmarks while being fully open weights, and DeepSeek also shipped a family of distilled smaller models. The show called this the hottest week open source AI has ever had.