DeepSeek

12 releases covered on ThursdAI · deepseek.com

April 2026

DeepSeek Apr 30, 2026

New Models · Open weights

DeepSeek V4

DeepSeek V4: 1.6T MoE with CSA+HCA attention and 1M context

DeepSeek released the V4 paper and models (V4-Pro and V4-Flash on Hugging Face). V4 is a 1.6T-parameter MoE featuring CSA+HCA attention, which fits a 1M-token context in just 5.7GB of KV cache. It is possibly the first frontier model trained across multiple datacenters, and DeepSeek is offering API tokens at an 80% discount on top of its already low pricing.

1M context window · 5.7GB KV cache at 1M context
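To put the 5.7GB figure in perspective, here is a minimal back-of-the-envelope sketch in Python. It assumes decimal gigabytes and exactly 1,000,000 tokens, and it compares against a hypothetical dense-attention baseline (60 layers, 8 KV heads of dimension 128, bf16) that is purely illustrative and does not describe any real DeepSeek model. It says nothing about how CSA+HCA actually achieves the reduction.

```python
# Back-of-the-envelope KV-cache arithmetic for the quoted V4 figure.
# Assumptions: "GB" = 10**9 bytes, "1M" = 1_000_000 tokens.
# The baseline config is HYPOTHETICAL and not any real DeepSeek model.

def per_token_bytes(total_bytes: float, tokens: int) -> float:
    """Average cache footprint of one token, given a total cache size."""
    return total_bytes / tokens

def dense_kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_elem: int) -> int:
    """Uncompressed attention: one K and one V vector per KV head,
    per layer, stored for every cached token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

CONTEXT = 1_000_000
v4_per_token = per_token_bytes(5.7e9, CONTEXT)

# Hypothetical dense baseline: 60 layers, 8 KV heads of dim 128, bf16 (2 bytes).
baseline_per_token = dense_kv_bytes_per_token(60, 8, 128, 2)

print(f"V4 (quoted): {v4_per_token:,.0f} B/token")
print(f"Baseline:    {baseline_per_token:,} B/token "
      f"-> {baseline_per_token * CONTEXT / 1e9:.1f} GB at 1M tokens")
print(f"Ratio:       {baseline_per_token / v4_per_token:.0f}x")
```

Under these assumptions V4 averages about 5.7KB per cached token, while the hypothetical baseline would need roughly 246GB at 1M tokens, about 43x more.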

DeepSeek announcement on X · arXiv paper · DeepSeek-V4-Pro on Hugging Face · DeepSeek-V4-Flash on Hugging Face

#open-source #architecture #training

December 2025

DeepSeek Dec 25, 2025

New Models · Open weights

DeepSeek R1

DeepSeek R1: the open reasoning model that crashed NVIDIA's stock

DeepSeek's open-weights reasoning model dropped on January 20, 2025, and matched OpenAI's o1 at roughly 50x lower pricing, with an alleged training cost of just $5.5M. It sent NVIDIA stock down 17%, a $560B single-day loss and the largest one-day loss of market value for any company in U.S. stock market history, and made Chinese AI a household topic. The crew called it the earthquake that shattered assumptions about who leads AI.

$560B NVIDIA stock loss · $5.5M DeepSeek R1 training cost

Jan 24 Episode · Jan 30 Episode

#open-source #reasoning

DeepSeek Dec 25, 2025

New Models · Open weights

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus lands amid September's relentless pace

DeepSeek resurfaced in September with V3.1 Terminus, another strong open-weights release that arrived while the crew was barely keeping up with the weekly firehose. Nisten noted that missing a single week during this period left you completely lost.

#open-source #reasoning

DeepSeek Dec 4, 2025

New Models · Open weights

DeepSeek V3.2 / V3.2-Speciale

DeepSeek V3.2 and V3.2-Speciale post gold-medal reasoning under MIT license

DeepSeek released V3.2 and the reasoning-first V3.2-Speciale, both 685B-parameter MoE models under the MIT license. Speciale posted gold-medal-level olympiad results and 96% on AIME (versus 94% for GPT-5 High), while V3.2 hit 73.1% on SWE-Bench Verified. With aggressive pricing of around 28 cents per 1M tokens on OpenRouter, the release brings open models closer to top closed-model capability at a fraction of the cost.

96% AIME · 73.1% SWE-Bench Verified · 685B total parameters (MoE)

DeepSeek V3.2 (Hugging Face) · DeepSeek V3.2-Speciale (Hugging Face) · DeepSeek V3.2 announcement · DeepSeek announcement on X

#open-source #reasoning #coding

November 2025

DeepSeek Nov 27, 2025

New Models · Open weights

DeepSeek Math V2

DeepSeek Math V2: 685B open-weights model with IMO gold-level math

DeepSeek quietly released DeepSeek Math V2 on Hugging Face during the week: a 685B-parameter model under the Apache-2.0 license that reaches IMO gold-level math reasoning. It is the first open-weights model to reach this level in math.

685B Parameters

DeepSeek Math V2 on Hugging Face

#open-source #reasoning

October 2025

DeepSeek Oct 23, 2025

New Models · Open weights

DeepSeek-OCR

DeepSeek-OCR turns text into compressed vision tokens for massive contexts

DeepSeek open-sourced DeepSeek-OCR, a 3B model (~570M active parameters) that is less an OCR model than a context-compression breakthrough. It renders text as images, encodes them into vision tokens at compression ratios of up to 10x while retaining 97% decoding accuracy (dropping to about 60% at 20x), and reads the text back with a small MoE decoder. The approach suggests that text tokenization is far from optimal and points toward much cheaper long-context processing. alphaXiv reportedly OCR'd all of arXiv for $1,000 versus $7,500 with MistralOCR, and a single H100 can process up to 200K pages per day.

97% decoding accuracy at 10x compression · ~570M active parameters (3B total) · 200K pages per day on a single H100
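The compression ratios above can be read as the number of text tokens carried per vision token. The minimal Python sketch below assumes that definition (the helper name and the 100K-token document are hypothetical) and simply turns the quoted accuracy figures and the reported alphaXiv costs into concrete numbers.

```python
import math

# Optical context compression as a ratio of text tokens to vision tokens.
# Assumption: "Nx compression" means N text tokens per vision token.
# Accuracy figures are the ones quoted above; the document size is hypothetical.

QUOTED_ACCURACY = {10: 0.97, 20: 0.60}  # compression ratio -> decoding accuracy

def vision_tokens_needed(text_tokens: int, ratio: float) -> int:
    """Vision tokens required to carry `text_tokens` at a given compression ratio."""
    return math.ceil(text_tokens / ratio)

doc_tokens = 100_000  # hypothetical long document
for ratio, acc in QUOTED_ACCURACY.items():
    print(f"{ratio}x: {vision_tokens_needed(doc_tokens, ratio):,} vision tokens, "
          f"~{acc:.0%} decoding accuracy")

# Reported alphaXiv cost to OCR all of arXiv (USD).
deepseek_cost, mistral_cost = 1_000, 7_500
print(f"MistralOCR cost / DeepSeek-OCR cost: {mistral_cost / deepseek_cost:.1f}x")
```

The trade-off is visible directly: doubling the ratio from 10x to 20x halves the vision-token budget, but the quoted decoding accuracy falls from 97% to about 60%.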

X · HF · Paper

#vision #open-source #search

September 2025

DeepSeek Sep 25, 2025

New Models · Open weights

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus refines agents and bilingual output

DeepSeek released V3.1 Terminus, an update to V3.1 with cleaner bilingual output, stronger agentic tool use, and cheaper long-context handling. The open weights are available on Hugging Face, continuing DeepSeek's cadence of iterative open releases.

X · HF

#open-source #agents #reasoning

May 2025

DeepSeek May 29, 2025

New Models · Open weights

DeepSeek-R1-0528

DeepSeek drops R1-0528, an updated open reasoning model with big gains

DeepSeek released R1-0528 out of nowhere: an update to its open-weights reasoning model with serious performance jumps (AIME 91, LiveCodeBench 73, SWE-bench Verified 57.6) that keeps R1 among the best open-weights models available. DeepSeek also shipped an 8B distilled version based on Qwen3 that can run on a laptop.

91 AIME score, beating the previous R1 by a mile · 8B distilled Qwen3-based version runnable on a laptop

Try It

#open-source #reasoning

March 2025

DeepSeek Mar 27, 2025

New Models · Open weights

DeepSeek-V3-0324

DeepSeek silently drops V3-0324, 685B params under MIT license

DeepSeek silently updated its V3 model with DeepSeek-V3-0324, a 685B-parameter MoE released on Hugging Face under the MIT license. This is not R1 (the reasoning model) but an update to V3, the model family R1 was built on, and supposedly the foundation for a future R2.

685B parameters

X announcement · Hugging Face

#open-source #frontier-models

February 2025

DeepSeek Feb 27, 2025

Dev Tools · Open weights

Open Source Week infra releases

DeepSeek open-sources its infra stack during Open Source Week

DeepSeek ran its Open Source Week, releasing a series of production infrastructure repos (including FlashMLA, DeepEP, and DeepGEMM) that power its training and inference stack. The drops gave the open-source community a rare look at the low-level kernels and communication libraries behind DeepSeek's efficient frontier models.

X account

#open-source #infrastructure

January 2025

DeepSeek Jan 30, 2025

New Models · Open weights

Janus Pro

DeepSeek Janus Pro: open multimodal models in 1.5B and 7B

Amid the R1 frenzy, DeepSeek also released Janus Pro, unified multimodal models at 1.5B and 7B parameters that handle both image understanding and image generation. The open release added to a week in which DeepSeek dominated AI news headlines.

1.5B / 7B Model sizes

GitHub · Try it (HF Space)

#open-source #image-gen #multimodal

DeepSeek Jan 23, 2025

New Models · Open weights

DeepSeek R1

DeepSeek R1: MIT-licensed open source reasoning model rivals o1

DeepSeek released R1, a state-of-the-art open source reasoning model under a permissive MIT license. It matches or beats OpenAI's o1 on key reasoning benchmarks while being fully open weights, and DeepSeek also shipped a family of distilled smaller models. The show called this the hottest week open source AI has ever had.

DeepSeek on Hugging Face · Combine DeepSeek R1 reasoning with GPT-3.5 Turbo (egghead) · Run DeepSeek with more thinking (Gist)

#open-source #reasoning
