UC Berkeley

3 releases covered on ThursdAI · berkeley.edu

May 2025

Papers & Research

Intuitor (Learning to Reason Without External Rewards)

Paper: models can learn to reason without external rewards

A mind-bending paper showing that reinforcement learning with internal, or even random, rewards can improve reasoning models. When fine-tuning Qwen2.5 3B, Intuitor matched or exceeded some GRPO results (GRPO being the external-reward framework DeepSeek popularized with R1), which raises the question of how much of RL's gains come from the reward signal itself.

3B: the Qwen2.5 model size at which Intuitor matched or exceeded some GRPO results
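To make "internal reward" concrete, here is a minimal sketch of how a confidence score could stand in for an external reward in a GRPO-style update. The blurb above does not spell out the internal signal, so the choice here is an assumption: the reward is the average KL divergence from a uniform distribution to the model's next-token distribution (often called self-certainty). The function names and the toy distributions are hypothetical, and this is not the paper's code.

```python
import math
from statistics import mean, pstdev

def self_certainty(token_dists):
    """Assumed internal reward: mean KL(U || p) over next-token distributions.

    Each entry of token_dists is one probability distribution over the vocabulary.
    The score is 0 for uniform predictions and grows as predictions sharpen.
    """
    total = 0.0
    for p in token_dists:
        v = len(p)
        total += sum((1.0 / v) * math.log((1.0 / v) / max(pj, 1e-12)) for pj in p)
    return total / len(token_dists)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: center and scale rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy group: three sampled completions for one prompt, two tokens each, vocab of 4.
confident = [[0.97, 0.01, 0.01, 0.01]] * 2
hedging = [[0.70, 0.10, 0.10, 0.10]] * 2
uniform = [[0.25, 0.25, 0.25, 0.25]] * 2

rewards = [self_certainty(c) for c in (confident, hedging, uniform)]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

No answer key or verifier appears anywhere in the sketch: the only signal is how peaked the model's own predictions are, and the group normalization turns that into a relative preference among samples.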
Datasets · Open weights

PromptEvals

PromptEvals: 12K+ real production assertion criteria for LLM evals

Shreya Shankar and collaborators released PromptEvals, the first large-scale corpus of production LLM guardrails: 2,087 developer prompts paired with 12,623 assertion criteria covering structure, style, grounding, and hallucination checks. That makes it about 5x larger than prior sets. Fine-tuned open Mistral-7B and Llama-3-8B checkpoints generate assertions that score +21 F1 over GPT-4o at a fraction of the latency. Accepted to NAACL 2025.
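For readers who have not written assertion criteria before, the sketch below shows roughly what one looks like in code, plus a simple way to score a generated set of criteria against a reference set. Everything in it is illustrative: the two checks, the `ASSERTIONS` registry, and the exact-match F1 are assumptions made for this example, and the dataset's real schema and the paper's scoring method may differ.

```python
import json

def is_valid_json(output: str) -> bool:
    """Structure check: the response must parse as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def within_word_limit(output: str, limit: int = 50) -> bool:
    """Style check: the response must not exceed `limit` whitespace-separated words."""
    return len(output.split()) <= limit

# Hypothetical registry mapping a criterion name to its check.
ASSERTIONS = {
    "valid_json": is_valid_json,
    "at_most_50_words": within_word_limit,
}

def run_assertions(output: str) -> dict:
    """Evaluate every registered criterion against one LLM response."""
    return {name: check(output) for name, check in ASSERTIONS.items()}

def f1(predicted, gold) -> float:
    """Exact-match F1 between predicted and reference criterion names (a simplification)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

print(run_assertions('{"answer": "yes"}'))
print(f1({"valid_json", "cites_source"}, {"valid_json", "at_most_50_words"}))
```

Exact matching on names is the crudest possible comparison; matching criteria by meaning rather than by string would need a similarity measure that this sketch leaves out.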

January 2025

Papers & Research · Open weights

TinyZero & RAGEN

Berkeley TinyZero and RAGEN replicate DeepSeek R1-Zero

Berkeley researchers released TinyZero and RAGEN, open replications of DeepSeek's R1-Zero reinforcement-learning recipe on small models. The projects showed that R1-style emergent reasoning behavior can be reproduced cheaply, with training runs logged publicly on Weights & Biases.