New Models · Open weights
j1-nano & j1-micro
Haize Labs releases j1-nano and j1-micro, tiny reward models
Haize Labs shipped j1-nano (600M params) and j1-micro (1.7B params), tiny open reward models for judging LLM outputs. Despite its small size, j1-micro scores 80.7% on RewardBench, putting capable reward modeling within reach of modest hardware.
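For readers new to reward-model benchmarks: a score like the one above is, roughly, a pairwise preference accuracy, meaning how often the judge rates the preferred response above the rejected one. The sketch below illustrates that idea only. The `toy_score` judge and the two example pairs are hypothetical stand-ins, and the benchmark's actual categories and weighting are not reproduced.

```python
from typing import Callable, Sequence, Tuple

# (prompt, chosen response, rejected response)
PreferencePair = Tuple[str, str, str]


def pairwise_accuracy(
    pairs: Sequence[PreferencePair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the judge scores `chosen` above `rejected`."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    wins = sum(score(p, chosen) > score(p, rejected) for p, chosen, rejected in pairs)
    return wins / len(pairs)


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical judge that simply prefers longer answers.
    # A real setup would call a reward model here instead.
    return float(len(response))


pairs = [
    ("2+2?", "The answer is 4.", "5"),
    ("Capital of France?", "Paris", "It is probably Lyon."),
]
accuracy = pairwise_accuracy(pairs, toy_score)
print(f"pairwise accuracy: {accuracy:.2f}")
```

The length-based toy judge gets the first pair right and the second wrong, which is exactly the kind of shortcut a trained reward model is supposed to avoid.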
Papers & Research
Intuitor (Learning to Reason Without External Rewards)
Paper: models can learn to reason without external rewards
A mind-bending paper showing that reinforcement learning with internal or even random rewards can improve reasoning models. When finetuning Qwen2.5 3B, Intuitor matched or exceeded some results from GRPO (the external-reward framework DeepSeek popularized with R1), raising the question of how much of RL's gains come from the reward signal itself.
Key stat: 3B, the Qwen2.5 model size at which Intuitor matched or exceeded GRPO results
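To see why the source of the reward can be swapped at all, it helps to look at the group-relative step used in GRPO-style training: each sampled answer is scored, then its reward is normalized against the other answers to the same prompt. The sketch below shows only that normalization. The reward values are made up, the "internal" scores are a hypothetical confidence signal rather than the paper's exact formulation, and using the population standard deviation is an assumption.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption in this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, scored two different ways.
external_rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. an answer checker: right or wrong
internal_rewards = [0.9, 0.2, 0.4, 0.7]  # hypothetical model-confidence scores

adv_external = group_relative_advantages(external_rewards)
adv_internal = group_relative_advantages(internal_rewards)

print([round(a, 2) for a in adv_external])
print([round(a, 2) for a in adv_internal])
```

Nothing in the normalization depends on where the numbers came from: any scalar score per sample yields advantages, so an internal signal can be dropped in where an external checker used to be.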
Products & Apps · Open weights
Psyche
Nous Research launches Psyche, a decentralized cooperative-training network
Psyche lets distributed participants jointly train large models over the internet. The launch includes open code on GitHub and a live dashboard tracking the first run, a 40B model called Consilience. COO Dillon Rolnick joined the show to explain the decentralized training push.
New Models · Open weights
Step1X-3D
StepFun's Step1X-3D: open two-stage framework for textured 3D assets
StepFun released Step1X-3D, an open two-stage framework for high-fidelity, controllable generation of textured 3D assets: it first synthesizes watertight geometry, then generates view-consistent textures. The models were trained on 2M curated meshes, and the release also includes a curated dataset of 800K assets and a Hugging Face demo.
New Models · Open weights
ART·E
OpenPipe's ART·E: RL-trained open email agent that beats o3
OpenPipe released ART·E, an Apache 2.0 email research agent built on a 14B Qwen2.5 backbone, trained on 500K Enron emails plus synthetic Q&A and refined with reinforcement learning. Using a simple three-tool loop, it tops o3 on accuracy (96% vs 90%) while running 5x faster (1.1s median) and 64x cheaper ($0.85 per 1,000 queries).
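The ratios above imply some numbers for o3 that the blurb does not state directly. The back-of-envelope sketch below derives them. These are inferences from rounded multipliers, not reported measurements, so treat them as approximate.

```python
# Figures stated in the blurb above.
art_e_accuracy = 0.96
o3_accuracy = 0.90
art_e_median_latency_s = 1.1
art_e_cost_per_1k_usd = 0.85
speedup = 5      # "5x faster"
cost_ratio = 64  # "64x cheaper"

# Implied o3 figures (derived here, not reported in the source).
implied_o3_latency_s = art_e_median_latency_s * speedup
implied_o3_cost_per_1k_usd = art_e_cost_per_1k_usd * cost_ratio

# Relative reduction in wrong answers: error rate falls from 10% to 4%.
error_reduction = 1 - (1 - art_e_accuracy) / (1 - o3_accuracy)

print(f"implied o3 median latency: ~{implied_o3_latency_s:.1f}s")
print(f"implied o3 cost per 1,000 queries: ~${implied_o3_cost_per_1k_usd:.2f}")
print(f"relative error reduction: ~{error_reduction:.0%}")
```

Read this way, the six-point accuracy gap corresponds to roughly 60% fewer wrong answers, at an implied o3 cost of about $54 per 1,000 queries and a median latency of about 5.5 seconds.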
Datasets · Open weights
PromptEvals
PromptEvals: 12K+ real production assertion criteria for LLM evals
Shreya Shankar and collaborators released PromptEvals, the first large-scale corpus of production LLM guardrails: 2,087 developer prompts paired with 12,623 assertion criteria covering structure, style, grounding and hallucination checks, about 5x larger than prior sets. Fine-tuned open Mistral-7B and Llama-3-8B checkpoints outscore GPT-4o by 21 F1 at generating assertions, at a fraction of the latency. Accepted to NAACL 2025.
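For a sense of what an assertion criterion looks like once it is turned into a runnable check, here is a minimal sketch with one structure check and one style check. Both criteria, the key names, and the sample outputs are hypothetical illustrations and are not taken from the dataset.

```python
import json
from typing import Callable, Dict, Sequence


def is_json_with_keys(output: str, keys: Sequence[str] = ("summary", "sources")) -> bool:
    """Structure check: output parses as a JSON object with the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(k in data for k in keys)


def within_word_limit(output: str, limit: int = 50) -> bool:
    """Style check: output stays under a word budget."""
    return len(output.split()) <= limit


# Hypothetical criteria attached to one developer prompt.
ASSERTIONS: Dict[str, Callable[[str], bool]] = {
    "structure": is_json_with_keys,
    "style": within_word_limit,
}


def run_assertions(output: str) -> Dict[str, bool]:
    """Run every check and report pass/fail per criterion."""
    return {name: check(output) for name, check in ASSERTIONS.items()}


good = '{"summary": "Revenue rose 4%.", "sources": ["q3_report.pdf"]}'
bad = "Revenue rose, I think."

results_good = run_assertions(good)
results_bad = run_assertions(bad)
print(results_good)
print(results_bad)
```

Grounding and hallucination criteria usually need a model or a retrieval step to evaluate, so they are left out of this sketch.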
New Models · Open weights
MiMo-7B
Xiaomi enters open weights with MiMo-7B, MIT-licensed reasoning family
Xiaomi's first open-weights release is a 7B dense family (Base, SFT, RL, RL-Zero) trained from scratch on 25T tokens with a multi-token-prediction objective and rule-verifiable reinforcement learning. The RL variant matches OpenAI o1-mini on math and code benchmarks despite being far smaller, scoring 55.4% on AIME 2025 and 49.3% on LiveCodeBench v6. The whole family ships under an MIT license with vLLM-ready weights.