Training & Post-Training

Fine-tuning, reinforcement learning, reward models, datasets, and training methods. 37 releases covered on the show.

July 2026

PyTorch Jul 8, 2026

Dev Tools · Open weights

PyTorch 2.13

PyTorch 2.13 lands FlexAttention on Apple Silicon and big memory wins

3,328 commits from 526 contributors: FlexAttention on Apple Silicon at roughly 12x over SDPA for sparse patterns, a deterministic CUDA backward path, nn.LinearCrossEntropyLoss with up to 4x peak-memory reduction, torchcomms for large-cluster training, and expanded ROCm/Arm/XPU support.

~12x FlexAttention on Apple Silicon vs SDPA · 3,328 Commits from 526 contributors

X announcement ↗

🎙️ Hear our coverage →

#open-source #training #infrastructure
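
On the PyTorch 2.13 item above: one common way a combined linear-plus-cross-entropy loss saves peak memory is by never materializing the full tokens-by-vocabulary logit matrix. The sketch below illustrates that general idea in plain Python; it is not PyTorch's implementation, and the function names (`full_loss`, `chunked_loss`) and the toy data are assumptions made for illustration only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over one row of logits.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def full_loss(hiddens, weight, targets):
    # Materializes every logit row up front: tokens x vocab floats live at once.
    logits = [[dot(h, row) for row in weight] for h in hiddens]
    losses = [logsumexp(l) - l[t] for l, t in zip(logits, targets)]
    return sum(losses) / len(losses)

def chunked_loss(hiddens, weight, targets, chunk_size):
    # Projects and reduces one chunk of tokens at a time, so at most
    # chunk_size x vocab logits are alive at any moment.
    total = 0.0
    for start in range(0, len(hiddens), chunk_size):
        chunk = hiddens[start:start + chunk_size]
        chunk_targets = targets[start:start + chunk_size]
        logits = [[dot(h, row) for row in weight] for h in chunk]
        total += sum(logsumexp(l) - l[t] for l, t in zip(logits, chunk_targets))
    return total / len(hiddens)

# Toy data (hypothetical): 4 tokens, hidden size 2, vocabulary of 3.
hiddens = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 0.7]]
weight = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
targets = [0, 1, 2, 1]

print(full_loss(hiddens, weight, targets))
print(chunked_loss(hiddens, weight, targets, chunk_size=2))
```

Both calls compute the same mean loss; only the number of logits held in memory at once differs. A real implementation also has to handle the backward pass, which this forward-only sketch leaves out.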

Liquid AI Jul 7, 2026

Papers & Research · Open weights

Antidoom

Liquid AI open-sources Antidoom, removing the reasoning doom-loop

An open method that suppresses the failure mode where reasoning models spiral into repetitive degenerate output: doom-loop rates dropped from 22.9% to 1% on Qwen3.5-4B and from 10.2% to 1.4% on an LFM2.5 checkpoint, with eval scores improving across the board.

22.9%→1% Doom-loop rate, Qwen3.5-4B

X announcement ↗

🎙️ Hear our coverage →

#reasoning #open-source #training

May 2026

Nous Research May 14, 2026

Papers & Research · Open weights

TST (Token Superposition Training)

Nous Research TST: 2-3x training speedup without architecture changes

Nous Research released Token Superposition Training (TST), a training technique that achieves 2-3x wall-clock speedup at matched FLOPs. It requires no architecture changes, making it a drop-in efficiency win for LLM training runs.

X announcement ↗

🎙️ Hear our coverage →

#research #training

April 2026

Baidu Apr 30, 2026

New Models

ERNIE 5.1 Preview

Baidu ERNIE 5.1 Preview hits #13 on Arena with 6% of the compute

Baidu's ERNIE 5.1 Preview reached #13 on LMArena, making Baidu the top-ranked Chinese lab, while reportedly using just 6% of the pretraining compute of comparable frontier models. The model is available at ernie.baidu.com.

ernie.baidu.com ↗ · ERNIE for Devs on X ↗ · Arena announcement ↗

🎙️ Hear our coverage →

#frontier-models #training #benchmarks

DeepSeek Apr 30, 2026

New Models · Open weights

DeepSeek V4

DeepSeek V4: 1.6T MoE with CSA+HCA attention and 1M context

DeepSeek released the V4 paper and models (V4-Pro and V4-Flash on Hugging Face), a 1.6T-parameter MoE featuring CSA+HCA attention that fits 1M tokens of context in just 5.7GB of KV cache. It is possibly the first frontier model trained across multiple datacenters, and DeepSeek is offering API tokens at an 80% discount on top of already low pricing.

1M context window · 5.7GB KV cache at 1M context

DeepSeek announcement on X ↗ · Arxiv paper ↗ · DeepSeek-V4-Pro on Hugging Face ↗ · DeepSeek-V4-Flash on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #architecture #training
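
To put the DeepSeek V4 KV-cache figure above in per-token terms, here is a quick back-of-the-envelope calculation. It assumes "1M" means 1,000,000 tokens and "5.7GB" means decimal gigabytes; neither convention is stated in the entry, so treat the result as approximate.

```python
# Figures quoted in the entry; the unit conventions are assumptions.
CONTEXT_TOKENS = 1_000_000   # assumption: "1M" = 10**6 tokens
KV_CACHE_BYTES = 5.7e9       # assumption: "5.7GB" = 5.7 * 10**9 bytes

# Average KV-cache footprint per token of context.
bytes_per_token = KV_CACHE_BYTES / CONTEXT_TOKENS
kilobytes_per_token = bytes_per_token / 1_000

print(f"{bytes_per_token:.0f} bytes per token")
print(f"{kilobytes_per_token:.1f} KB per token")
```

Under those assumptions the cache works out to roughly 5.7 KB per token of context.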

OpenAI Apr 30, 2026

Papers & Research

Where the Goblins Came From (blog post)

OpenAI publishes postmortem on GPT-5.5's 'goblin mode'

OpenAI published a research blog explaining GPT-5.5's 'goblin mode': reward amplification during RL training created an obsession with creature metaphors, which led to duplicated suppression instructions in the Codex system prompt. The leaked GPT-5.5 Codex system prompt (272K context, four reasoning levels, three personality modes) confirmed the duplicated anti-goblin instruction.

OpenAI blog: Where the goblins came from ↗

🎙️ Hear our coverage →

#safety #training

March 2026

MiniMax Mar 19, 2026

New Models

MiniMax M2.7

MiniMax M2.7: first self-evolving model hits 56% on SWE-Bench Pro

MiniMax dropped M2.7, billed as the first self-evolving model: it ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding, built by one engineer over four days with zero lines of human code. It scores 56.22% on SWE-Bench Pro, about one point behind Opus 4.6's 57.3%, and WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw agent tasks. The weights are not yet open, though rumors suggest a release is coming.

56% MiniMax M2.7 on SWE-Bench Pro

MiniMax announcement ↗ · MiniMax on X ↗ · TestingCatalog on X ↗ · MiniMax M2.7 announcement (X) ↗

🎙️ Hear our coverage (+1 follow-up) →

#coding #agents #reasoning

Unsloth AI Mar 19, 2026

Dev Tools · Open weights

Unsloth Studio

Unsloth Studio: web UI for local fine-tuning with 2x speed, 70% less VRAM

Unsloth launched Studio, an open-source web UI for local LLM training and inference claiming 2x speed and 70% less VRAM, supporting 500+ models across text, vision, audio, and embeddings. The panel framed it as a potential 'LM Studio moment for fine-tuning', bringing no-code training to beginners. It was confirmed working on Google Colab Pro, where models can train overnight for about $20/month.

Unsloth Studio docs ↗ · X announcement ↗ · GitHub ↗ · Daniel Han announcement (X) ↗

🎙️ Hear our coverage (+1 follow-up) →

#training #open-source #coding

Templar Mar 13, 2026

New Models · Open weights

Covenant-72B

Covenant-72B: an open 72B LLM from decentralized training

Covenant-72B is an open 72B-parameter LLM trained in a decentralized fashion and released on Hugging Face. It was highlighted in the open-source segment as an example of decentralized model training.

Covenant-72B on X ↗ · Covenant-72B on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #training

January 2026

Nous Research Jan 8, 2026

New Models · Open weights

NousCoder 14B

NousCoder 14B: 7% LiveCodeBench jump in 4 days of RL training

Nous Research released NousCoder 14B, an open-source competitive programming model that achieved a 7% jump in LiveCodeBench accuracy in just four days of RL training on 48 NVIDIA B200 GPUs. Training used 24,000 verifiable problems, and the release ships under an Apache 2.0 license with training code and a benchmark harness.

NousCoder 14B on X ↗ · NousCoder W&B Dashboard ↗ · NousCoder Atropos on GitHub ↗

🎙️ Hear our coverage →

#open-source #coding #training

December 2025

Nous Research Dec 4, 2025

New Models · Open weights

Hermes 4.3

Nous Research ships Hermes 4.3 36B with decentralized training

Nous Research released Hermes 4.3-36B, highlighted on the show for being trained with decentralized infrastructure and for state-of-the-art RefusalBench performance. The release continues the Hermes line of open, steerable instruction-tuned models.

Hermes 4.3-36B (Hugging Face) ↗ · Nous Research on X ↗

🎙️ Hear our coverage →

#open-source #training

November 2025

Weights & Biases Nov 27, 2025

Products & Apps

Serverless LoRA Inference

W&B launches Serverless LoRA Inference on CoreWeave

Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.

W&B Serverless LoRA Report ↗ · W&B LoRA Notebook ↗ · W&B Announcement on X ↗

🎙️ Hear our coverage →

#infrastructure #training #coding
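
For readers wondering why a LoRA adapter can be served on top of a shared base model, as in the W&B item above: the adapter is just a small low-rank update added to frozen base weights. The following is a minimal plain-Python sketch of the standard LoRA formula W' = W + (alpha / r) · B · A. The helper names and the toy matrices are hypothetical, and nothing here describes how W&B or CoreWeave actually serve adapters.

```python
def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_lora(base_weight, lora_a, lora_b, alpha):
    # lora_a has shape (rank x in_features); lora_b has shape (out_features x rank).
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as base_weight
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_weight, delta)]

def adapter_param_count(in_features, out_features, rank):
    # Parameters stored in the adapter, versus in_features * out_features for the full matrix.
    return rank * (in_features + out_features)

# Toy example (hypothetical values): 2x2 base weight, rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # 1 x 2
lora_b = [[0.5], [0.0]]      # 2 x 1
merged = apply_lora(base, lora_a, lora_b, alpha=2.0)
print(merged)
print(adapter_param_count(4096, 4096, rank=16))
```

Because the base weights never change, many adapters can share one loaded base model, and each adapter only needs its small A and B matrices uploaded.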

Inference.net Nov 13, 2025

Datasets · Open weights

Project AELLA (OSSAS)

Project AELLA publishes 100K LLM-generated research paper summaries

Project AELLA (also called OSSAS) released 100,000 LLM-generated structured summaries of scientific papers, published openly on Hugging Face. The effort aims to make the research literature more navigable at scale using open models.

Sam Hogan announcement on X ↗ · Inference.net on Hugging Face ↗

🎙️ Hear our coverage →

#training #research

Hugging Face Nov 6, 2025

Also Released · Open weights

Smol Training Playbook

Hugging Face publishes the Smol Training Playbook for LLM pretraining

Hugging Face published the Smol Training Playbook, a 200+ page end-to-end guide to reliably pretraining and operating LLMs. It distills the team's practical experience from the SmolLM line into an open resource for anyone training their own models.

X ↗ · Announcement ↗

🎙️ Hear our coverage →

#open-source #training

October 2025

Meta AI (PyTorch) Oct 23, 2025

Dev Tools · Open weights

TorchForge

TorchForge: PyTorch-native library for scalable RL post-training

Meta's PyTorch team, in collaboration with Weights & Biases/CoreWeave and Stanford, introduced TorchForge, a PyTorch-native library for scalable reinforcement-learning post-training and agent development. It is built for massive GPU runs (W&B/CoreWeave provided 520 H100s) and competes with Ray via tools like the Monarch scheduler.

520 H100s provided for development runs

🎙️ Hear our coverage →

#training #coding

Cognition Oct 16, 2025

New Models

SWE-grep

Cognition SWE-grep: RL-trained fast context retrieval for coding agents

Cognition released SWE-grep, an RL-trained multi-turn context retriever that finds relevant code for agentic coding tasks far faster than full agent loops. It powers fast context retrieval in Cognition's products, and a public playground lets developers try it on real repos.

Blog ↗ · X announcement ↗ · Playground ↗

🎙️ Hear our coverage →

#coding #agents #training

OpenPipe (Weights & Biases) Oct 16, 2025

New Models

OpenPipe Qwen3 14B Instruct

OpenPipe Qwen3 14B Instruct lands on W&B Inference

OpenPipe, now part of Weights & Biases / CoreWeave, released a Qwen3 14B instruct model available through W&B Inference. Co-founder Kyle Corbitt joined the show to talk RL, Serverless RL, and practical agent evaluation and deployment.

W&B Inference model page ↗

🎙️ Hear our coverage →

#training #infrastructure

September 2025

Weights & Biases Sep 18, 2025

Major Features & Updates

Weave in W&B Workspaces

W&B brings Weave traces into Models workspaces for RL runs

Weights & Biases shipped Weave inside W&B Models workspaces, so reinforcement learning runs can now be logged and inspected with Weave trace tooling alongside training metrics. The show frames it as giving RL training 'x-ray vision' into what the model is actually doing.

X ↗ · W&B Docs ↗

🎙️ Hear our coverage →

#infrastructure #coding #training

CoreWeave Sep 4, 2025

Acquisitions

OpenPipe Acquisition

CoreWeave acquires OpenPipe to expand its AI training stack

CoreWeave acquired OpenPipe, the fine-tuning and reinforcement-learning platform behind the ART trainer. Covered in the This Week's Buzz segment, the deal brings OpenPipe's model-customization tooling under the same roof as CoreWeave's GPU cloud and Weights & Biases.

🎙️ Hear our coverage →

#industry #training

July 2025

Agentica Jul 3, 2025

New Models · Open weights

DeepSWE-Preview

DeepSWE-Preview hits 59% SWE-Bench Verified with pure RL on Qwen3-32B

Agentica and collaborators (with guest Michael Luo of UC Berkeley) released DeepSWE-Preview, a fully open-sourced RL-trained coding agent built on Qwen3-32B that reached 59% on SWE-Bench Verified, a top open result in a benchmark dominated by closed systems. The team published training methodology and weights, emphasizing reproducible reward design and verification over sealed benchmark numbers.

59% SWE-Bench Verified

Training write-up (Notion) ↗ · Hugging Face model ↗

🎙️ Hear our coverage →

#open-source #coding #agents

May 2025

Haize Labs May 29, 2025

New Models · Open weights

j1-nano & j1-micro

Haize Labs releases j1-nano and j1-micro tiny reward models

Haize Labs shipped j1-nano (600M params) and j1-micro (1.7B params), tiny open reward models for judging LLM outputs. Despite its small size, j1-micro scores 80.7% on RewardBench, making capable reward modeling accessible on modest hardware.

Tweet ↗ · GitHub ↗ · HF j1-micro ↗ · HF j1-nano ↗

🎙️ Hear our coverage →

#open-source #training #benchmarks

UC Berkeley May 29, 2025

Papers & Research

Intuitor (Learning to Reason Without External Rewards)

Paper: models can learn to reason without external rewards

A mind-bending paper shows that reinforcement learning with internal or even random rewards can improve reasoning models. When finetuning Qwen2.5 3B, Intuitor matched or exceeded some results from GRPO (the external-reward framework DeepSeek popularized with R1), raising the question of how much of RL's gains come from the reward signal itself.

3B Qwen2.5 model size where Intuitor matched or exceeded GRPO results

X announcement ↗

🎙️ Hear our coverage →

#reasoning #training #research
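
For context on the Intuitor item above: GRPO scores each sampled completion relative to the other completions drawn for the same prompt, so the update only needs a per-completion scalar reward, wherever it comes from. The sketch below shows that group-relative normalization. It is a simplified illustration, not the paper's code: the function name, the use of population standard deviation, and the `eps` value are assumptions, and Intuitor's internal reward itself is not reproduced here.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    # Normalize rewards within one group of completions sampled for the same prompt.
    # Completions above the group mean get positive advantages, those below get negative.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]

# External, verifier-style rewards (hypothetical): 1.0 if the answer checked out, else 0.0.
external = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Any other scalar signal plugs into the same function, e.g. a model-internal score.
internal = group_relative_advantages([0.62, 0.35, 0.41, 0.80])

print(external)
print(internal)
```

Since the normalization is indifferent to the reward's origin, swapping an external verifier for an internal signal changes only the numbers fed in, which is what makes the paper's comparison possible.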

Nous Research May 15, 2025

Products & Apps · Open weights

Psyche

Nous Research launches Psyche, a decentralized cooperative-training network

Psyche is Nous Research's decentralized cooperative-training network that lets distributed participants jointly train large models over the internet. The launch includes open code on GitHub and a live dashboard tracking the first run, a 40B model called Consilience. COO Dillon Rolnick joined the show to explain the decentralized training push.

Website ↗ · GitHub ↗ · Announcement tweet ↗ · Consilience 40B dashboard ↗

🎙️ Hear our coverage →

#training #open-source #infrastructure

StepFun May 15, 2025

New Models · Open weights

Step1X-3D

StepFun's Step1X-3D: open two-stage framework for textured 3D assets

StepFun released Step1X-3D, an open two-stage framework for high-fidelity, controllable generation of textured 3D assets: it first synthesizes watertight geometry, then generates view-consistent textures. Trained on 2M curated meshes, the release also includes a curated dataset of 800K assets and a Hugging Face demo.

Hugging Face ↗ · Demo ↗ · Dataset ↗

🎙️ Hear our coverage →

#world-models #open-source #training

OpenPipe May 1, 2025

New Models · Open weights

ART·E

OpenPipe's ART·E: RL-trained open email agent that beats o3

OpenPipe released ART·E, an Apache 2.0 email research agent built on a 14B Qwen 2.5 backbone, trained on 500K Enron emails plus synthetic Q&A and refined with reinforcement learning. It tops o3 on accuracy (96% vs 90%) while running 5x faster (1.1s median) and 64x cheaper ($0.85 per 1,000 queries), using a simple three-tool loop.

Launch thread (X) ↗ · Blog post ↗ · GitHub: OpenPipe/ART ↗

🎙️ Hear our coverage →

#agents #training #open-source
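
The ART·E ratios above imply baseline figures for o3 that the entry does not state directly. The short calculation below derives them from the quoted numbers only; the results are implied estimates under the assumption that each ratio is measured against o3 on the same workload, not figures reported by OpenPipe.

```python
# Figures quoted in the entry.
ARTE_COST_PER_1K_QUERIES = 0.85   # USD per 1,000 queries
COST_RATIO = 64                   # "64x cheaper"
ARTE_MEDIAN_LATENCY_S = 1.1       # seconds
SPEED_RATIO = 5                   # "5x faster"
ARTE_ACCURACY = 0.96
O3_ACCURACY = 0.90

# Implied o3 baselines (assumption: ratios are relative to o3 on the same task).
implied_o3_cost_per_1k = ARTE_COST_PER_1K_QUERIES * COST_RATIO
implied_o3_latency_s = ARTE_MEDIAN_LATENCY_S * SPEED_RATIO

# Relative reduction in wrong answers: error rate falls from 10% to 4%.
error_reduction = 1 - (1 - ARTE_ACCURACY) / (1 - O3_ACCURACY)

print(f"implied o3 cost: ${implied_o3_cost_per_1k:.2f} per 1,000 queries")
print(f"implied o3 latency: {implied_o3_latency_s:.1f}s")
print(f"relative error reduction: {error_reduction:.0%}")
```

Read this way, the six-point accuracy gap corresponds to cutting wrong answers by more than half.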

UC Berkeley May 1, 2025

Datasets · Open weights

PromptEvals

PromptEvals: 12K+ real production assertion criteria for LLM evals

Shreya Shankar and collaborators released PromptEvals, the first large-scale corpus of production LLM guardrails: 2,087 developer prompts paired with 12,623 assertion criteria covering structure, style, grounding and hallucination checks, about 5x larger than prior sets. Fine-tuned open Mistral-7B and Llama-3-8B checkpoints generate assertions +21 F1 better than GPT-4o at a fraction of the latency. Accepted to NAACL 2025.

NAACL paper (ArXiv) ↗ · Dataset (Hugging Face) ↗ · Models (Hugging Face) ↗

🎙️ Hear our coverage →

#benchmarks #training #coding
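
To make "assertion criteria" in the PromptEvals item concrete, here is a sketch of what a structure check and a style check could look like when written as code. These two functions are hypothetical examples invented for illustration and are not taken from the dataset; the final lines simply divide the quoted counts to get the average number of criteria per prompt.

```python
import json

def assert_valid_json_with_keys(output, required_keys):
    # Structure check: the output must parse as a JSON object containing every required key.
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and all(key in parsed for key in required_keys)

def assert_max_words(output, limit):
    # Style check: the output must not exceed a word budget.
    return len(output.split()) <= limit

good = '{"summary": "Refund approved.", "ticket_id": 42}'
bad = "Sure! Here is the summary you asked for."

print(assert_valid_json_with_keys(good, ["summary", "ticket_id"]))
print(assert_valid_json_with_keys(bad, ["summary", "ticket_id"]))
print(assert_max_words(bad, limit=5))

# Counts quoted in the entry: 12,623 criteria across 2,087 prompts.
criteria_per_prompt = 12_623 / 2_087
print(round(criteria_per_prompt, 2))
```

The quoted counts work out to about six criteria per developer prompt.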

Xiaomi May 1, 2025

New Models · Open weights

MiMo-7B

Xiaomi enters open weights with MiMo-7B, MIT-licensed reasoning family

Xiaomi's first open-weights release is a 7B dense family (Base, SFT, RL, RL-Zero) trained from scratch on 25T tokens with a multi-token-prediction objective and rule-verifiable reinforcement learning. The RL variant matches OpenAI o1-mini on benchmark suites despite being far smaller, scoring 55.4% on AIME 2025 and 49.3% on LiveCodeBench v6, all under an MIT license with vLLM-ready weights.

Hugging Face model hub ↗

🎙️ Hear our coverage →

#open-source #reasoning #training

April 2025

Prime Intellect Apr 17, 2025

New Models · Open weights

INTELLECT-2

Prime Intellect launches INTELLECT-2, a 32B globally-distributed RL run

Prime Intellect released INTELLECT-2, a 32B reasoning model trained with globally decentralized reinforcement learning, a follow-up to the INTELLECT-1 decentralized pretraining run covered on the show in December. The release includes open weights on Hugging Face, a tech report, and the PRIME-RL training code.

Blog ↗ · X ↗ · Blog ↗ · Tech report ↗

🎙️ Hear our coverage (+1 follow-up) →

#open-source #training #reasoning

NVIDIA Apr 10, 2025

New Models · Open weights

Llama-3.1-Nemotron-Ultra-253B

NVIDIA ships Nemotron Ultra, a 253B pruned and distilled Llama 3.1-405B

NVIDIA released Nemotron Ultra, a pruned and distilled finetune of Llama 3.1-405B at roughly half the parameters (253B). Its benchmarks even included Llama 4 comparisons, showing the older finetuned Llama beating the new models on AIME, GPQA and more. It supports 128K context and fits on a single 8xH100 node for inference.

253B Parameters (pruned from Llama 3.1-405B) · 128K Context window

Hugging Face: Llama-3_1-Nemotron-Ultra-253B-v1 ↗ · Announcement on X ↗

🎙️ Hear our coverage →

#open-source #training #reasoning
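
A rough check on the "fits on a single 8xH100 node" claim for Nemotron Ultra above. The sketch counts weight memory only and assumes 80 GB per H100 and decimal gigabytes; the precisions shown are illustrative, since the entry does not say which one the claim refers to, and KV cache and activations need additional room.

```python
PARAMS = 253e9            # parameters, from the entry
GPUS_PER_NODE = 8
GB_PER_GPU = 80           # assumption: 80 GB H100 variant

def weight_memory_gb(params, bytes_per_param):
    # Memory for the weights alone, in decimal GB.
    return params * bytes_per_param / 1e9

node_memory_gb = GPUS_PER_NODE * GB_PER_GPU

for label, bytes_per_param in [("BF16", 2), ("FP8", 1)]:
    needed = weight_memory_gb(PARAMS, bytes_per_param)
    headroom = node_memory_gb - needed
    print(f"{label}: {needed:.0f} GB of weights, {headroom:.0f} GB left on the node")
```

For comparison, the same arithmetic for the original 405B model at two bytes per parameter gives 810 GB, which exceeds the 640 GB assumed for the node.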

Stanford / NVIDIA / UCSD / UC Berkeley Apr 10, 2025

Papers & Research · Open weights

One-Minute Video Generation with Test-Time Training

Test-Time Training paper one-shots minute-long videos with consistent characters

Researchers published 'One-Minute Video Generation with Test-Time Training', adding TTT layers to a pre-trained transformer to one-shot generate minute-long videos with remarkable character and scene consistency. The Tom & Jerry style demos showed the most impressive long-form AI video consistency to date.

1 min Single-shot generated video length

Project blog ↗ · Paper ↗

🎙️ Hear our coverage →

#video-gen #research #training

Together AI & Agentica (UC Berkeley) Apr 10, 2025

New Models · Open weights

DeepCoder-14B-Preview

DeepCoder-14B: open RL-finetuned coder beats DeepSeek R1 and o3-mini on coding

Together AI and Agentica (UC Berkeley Sky Computing Lab) released DeepCoder-14B-Preview, a reasoning model finetuned with RL that beats DeepSeek R1 and even o3-mini on several coding benchmarks. The project aims to democratize RL: the team open-sourced the model, the training dataset, the Weights & Biases logs, and the eval logs. Guest Michael Luo from Agentica joined the show to discuss the release.

14B Model parameters

Together AI blog: DeepCoder ↗ · Announcement on X ↗ · Hugging Face: DeepCoder-14B-Preview ↗ · Hugging Face dataset: DeepCoder-Preview-Dataset ↗

🎙️ Hear our coverage →

#open-source #coding #reasoning

March 2025

ByteDance Mar 20, 2025

Papers & Research · Open weights

DAPO

ByteDance releases DAPO, an RL method that beats GRPO

ByteDance published DAPO, a reinforcement learning method for LLM post-training presented as an improvement over GRPO. The paper ships with an open GitHub implementation, making the technique reproducible for the open-source RL community.

X thread ↗ · GitHub ↗ · Paper ↗

🎙️ Hear our coverage →

#training #reasoning #research

NVIDIA Mar 20, 2025

New Models · Open weights

Llama-Nemotron (Super 49B, Nano 8B)

NVIDIA drops Llama-Nemotron reasoning models plus training dataset

NVIDIA released the Llama-Nemotron family, including Super 49B and Nano 8B reasoning models, announced around GTC. Alongside the open weights, NVIDIA published the Llama-Nemotron post-training dataset, giving the community both the models and the data recipe behind them.

Announcement ↗ · X ↗ · Llama-Nemotron Hugging Face Collection ↗ · Dataset ↗

🎙️ Hear our coverage →

#open-source #reasoning #training

February 2025

Hugging Face Feb 20, 2025

Also Released · Open weights

Ultra Scale Playbook

Hugging Face publishes the Ultra Scale Playbook for training on GPU clusters

Hugging Face released the Ultra Scale Playbook, a guide to building and scaling AI models on large GPU clusters. The team ran 4,000 scaling experiments on up to 512 GPUs to distill practical guidance for labs training big models.

Hugging Face ↗

🎙️ Hear our coverage →

#training #infrastructure #open-source

January 2025

Allen Institute for AI (Ai2) Jan 30, 2025

New Models · Open weights

Tulu 3 405B

Allen Institute releases Tulu 3 405B open post-trained model

The Allen Institute for AI scaled its fully open Tulu 3 post-training recipe to a 405B-parameter model based on Llama 3.1 405B. It demonstrates that Ai2's open RLVR post-training pipeline works at frontier scale, with weights and recipe released openly.

405B Parameters

Blog ↗ · Hugging Face collection ↗

🎙️ Hear our coverage →

#open-source #training
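
On the RLVR (reinforcement learning with verifiable rewards) pipeline mentioned in the Tulu 3 item above: the core idea is that the reward comes from a programmatic check rather than a learned reward model. The sketch below shows the simplest form of such a check. The `Answer:` marker and the function name are hypothetical conventions chosen for this example, not Ai2's actual format or code.

```python
def verifiable_reward(completion, expected_answer, marker="Answer:"):
    # Returns 1.0 only if the text after the last marker exactly matches the
    # reference answer; anything else, including a missing marker, scores 0.0.
    if marker not in completion:
        return 0.0
    predicted = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if predicted == expected_answer.strip() else 0.0

print(verifiable_reward("2 + 2 is four. Answer: 4", "4"))
print(verifiable_reward("I think it is five. Answer: 5", "4"))
print(verifiable_reward("The result is 4.", "4"))
```

Real pipelines typically use more forgiving matching (numeric equivalence, unit tests for code), but the reward stays a deterministic function of the output.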

Open Thoughts Jan 30, 2025

Datasets · Open weights

OpenThoughts-114k

Open Thoughts releases OpenThoughts-114k reasoning dataset

The Open Thoughts project released OpenThoughts-114k, an open reasoning dataset with 114k examples, to fuel open replication of reasoning models like DeepSeek R1. It gives the open-source community high-quality chain-of-thought training data for distilling and fine-tuning reasoning LLMs.

X announcement ↗ · Hugging Face ↗

🎙️ Hear our coverage →

#open-source #reasoning #training

UC Berkeley Jan 30, 2025

Papers & Research · Open weights

TinyZero & RAGEN

Berkeley TinyZero and RAGEN replicate DeepSeek R1-Zero

Berkeley researchers released TinyZero and RAGEN, open replications of DeepSeek's R1-Zero reinforcement-learning recipe on small models. The projects showed that R1-style emergent reasoning behavior can be reproduced cheaply, with training runs logged publicly on Weights & Biases.

GitHub ↗ · W&B logs ↗

🎙️ Hear our coverage →

#reasoning #training #open-source