UC Berkeley

3 releases covered on ThursdAI · berkeley.edu ↗

May 2025

UC Berkeley · May 29, 2025

Papers & Research

Intuitor (Learning to Reason Without External Rewards)

Paper: models can learn to reason without external rewards

A mind-bending paper showing that reinforcement learning with internal or even random rewards can improve reasoning models. When fine-tuning Qwen2.5 3B, Intuitor matched or exceeded some results from GRPO (the external-reward framework DeepSeek popularized with R1), raising the question of how much of RL's gains come from the reward signal itself.

3B: Qwen2.5 model size at which Intuitor matched or exceeded GRPO results

X announcement ↗


#reasoning #training #research

UC Berkeley · May 1, 2025

Datasets · Open weights

PromptEvals

PromptEvals: 12K+ real production assertion criteria for LLM evals

Shreya Shankar and collaborators released PromptEvals, the first large-scale corpus of production LLM guardrails: 2,087 developer prompts paired with 12,623 assertion criteria covering structure, style, grounding and hallucination checks. The corpus is about 5x larger than prior sets. Fine-tuned open Mistral-7B and Llama-3-8B checkpoints outscore GPT-4o by about 21 F1 on assertion generation, at a fraction of the latency. Accepted to NAACL 2025.

NAACL paper (arXiv) ↗ · Dataset (Hugging Face) ↗ · Models (Hugging Face) ↗


#benchmarks #training #coding

January 2025

UC Berkeley · Jan 30, 2025

Papers & Research · Open weights

TinyZero & RAGEN

Berkeley's TinyZero and RAGEN replicate DeepSeek R1-Zero

Berkeley researchers released TinyZero and RAGEN, open replications of DeepSeek's R1-Zero reinforcement-learning recipe on small models. The projects showed that R1-style emergent reasoning behavior can be reproduced cheaply, with training runs logged publicly on Weights & Biases.

GitHub ↗ · W&B logs ↗


#reasoning #training #open-source
