Science & Research

Research papers, scientific discovery, biology, and health applications of AI. 50 releases covered on the show.

May 2026

OpenAI
Papers & Research

Erdős planar unit distance result

OpenAI model makes progress on 80-year-old Erdős planar unit distance problem

OpenAI announced that a general-purpose reasoning model made progress on the Erdős planar unit distance problem, challenging an 80-year-old mathematical belief. The panel called it the most important news of the week outside Google I/O, as a sign that frontier reasoning models are starting to contribute to genuinely open mathematics.

80-year Erdős math problem

Nous Research
Papers & Research
Open weights

TST (Token Superposition Training)

Nous Research TST: 2-3x training speedup without architecture changes

Nous Research released Token Superposition Training (TST), a training technique that achieves 2-3x wall-clock speedup at matched FLOPs. It requires no architecture changes, making it a drop-in efficiency win for LLM training runs.

April 2026

Mayo Clinic
New Models

REDMOD

Mayo Clinic's REDMOD detects pancreatic cancer 3 years early

Mayo Clinic published a landmark validation study of REDMOD, an AI model that detects pancreatic cancer on routine CT scans up to 3 years before clinical diagnosis. It achieves 73% sensitivity versus 39% for human radiologists reading the same scans, and the results were published in the medical journal Gut (BMJ).

3 years earlier detection before clinical diagnosis
73% REDMOD sensitivity
39% radiologist sensitivity on same scans

OpenAI
New Models

OpenAI clinician model + workspace agents

OpenAI releases clinician/medical model and workspace agents

Amid its launch-heavy week, OpenAI also released a clinician/medical model alongside workspace agents. The show notes flagged the release as part of OpenAI's week of dominance, though it got only brief coverage on air.

March 2026

Papers & Research
Open weights

Mamba-3

Mamba-3 lands with three SSM innovations for inference-first linear models

Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.
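For readers unfamiliar with the first of those terms, trapezoidal discretization can be illustrated on a scalar linear state-space model h'(t) = a*h(t) + b*x(t). The sketch below compares the textbook trapezoidal rule with forward Euler. It is an illustration under simplifying assumptions, not Mamba-3's actual parameterization, and every function name in it is hypothetical.

```python
import math

def euler_step(h, x_prev, a, b, dt):
    """Forward Euler: uses only the slope at the start of the interval."""
    return h + dt * (a * h + b * x_prev)

def trapezoidal_step(h, x_prev, x_next, a, b, dt):
    """Trapezoidal rule: averages the slope at both ends of the interval.

    Solving h_next = h + dt/2 * (a*h + b*x_prev + a*h_next + b*x_next)
    for h_next gives the closed form below.
    """
    denom = 1.0 - 0.5 * dt * a
    return ((1.0 + 0.5 * dt * a) * h + 0.5 * dt * b * (x_prev + x_next)) / denom

def simulate(step_kind, a=-1.0, b=1.0, dt=0.1, steps=10):
    """Integrate h' = a*h + b*x with constant input x = 1 from h(0) = 0."""
    h = 0.0
    for _ in range(steps):
        if step_kind == "euler":
            h = euler_step(h, 1.0, a, b, dt)
        else:
            h = trapezoidal_step(h, 1.0, 1.0, a, b, dt)
    return h

# Closed-form solution of h' = -h + 1 at t = 1, used as the reference.
exact = 1.0 - math.exp(-1.0)
euler_error = abs(simulate("euler") - exact)
trapezoid_error = abs(simulate("trapezoid") - exact)
print(f"Euler error:       {euler_error:.5f}")
print(f"Trapezoidal error: {trapezoid_error:.5f}")
```

On this toy problem the trapezoidal update tracks the exact solution more closely than Euler at the same step size, which is the general appeal of a second-order rule.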

Eon Systems
Also Released

Fruit Fly Brain Connectome Simulation

Eon Systems uploads full fruit fly brain connectome into simulation

Eon Systems uploaded the complete fruit fly brain connectome (140,000 neurons and 50M+ synapses) into a MuJoCo physics simulator, achieving 91% behavioral accuracy. Notably, no ML or LLMs were used: it is pure connectome simulation. The advisory board includes George Church, Stephen Wolfram, and Anders Sandberg. The result marks a milestone for whole-brain emulation.

140,000 Neurons in the uploaded fruit fly brain connectome
50M+ Synapses in the fruit fly brain connectome
91% Behavioral accuracy of the simulated fruit fly brain

StepFun
New Models
Open weights

Step 3.5 Flash Base

StepFun open-sources Step 3.5 Flash Base with its training stack

StepFun released Step 3.5 Flash Base and Midtrain checkpoints, an unusually open release that includes training artifacts and the SteptronOSS training stack alongside the weights. The panel praised the Apache-2 orientation and called the continuation-pretraining flexibility a major practical unlock for builders.

February 2026

Zyphra
New Models
Open weights

ZUNA

Zyphra opens ZUNA, a 380M-param EEG brain-computer interface model

Zyphra released ZUNA, a 380M-parameter open-source BCI foundation model that translates EEG brain signals into text, reconstructing clinical-grade brain signals from sparse, noisy data. Dubbed 'thought to text' by the community, it works with roughly $500 non-invasive EEG headsets, likely needs personalized training per user, and is small enough to run in real time on a consumer gaming GPU. It is Apache licensed.

New Models
Open weights

Intern-S1-Pro

Intern-S1-Pro: 1 trillion parameter open MoE for scientific reasoning

InternLM released Intern-S1-Pro, a 1 trillion parameter open-source MoE model targeting SOTA scientific reasoning across chemistry, biology, materials, and earth sciences. The panel noted that it beats frontier models on science benchmarks and represents a massive compute investment for an open release.

January 2026

Anthropic
Products & Apps

Claude for Healthcare

Anthropic launches Claude for Healthcare with HIPAA compliance

Anthropic launched Claude for Healthcare, a HIPAA-ready offering, as the major labs push into medical AI. The panel noted Claude Opus 4.5 scoring 92% on MedAgentBench as part of Anthropic's healthcare positioning.

Byte
New Models
Open weights

M3

M3: 235B open-source medical LLM claims to beat GPT-5.2 on HealthBench

Byte released M3, a 235B parameter medical LLM fine-tuned from Qwen3 and licensed Apache 2.0. With only 22B active parameters, it is runnable at usable speeds on an Apple M3 Ultra, and it claims to beat GPT-5.2 on HealthBench. Nisten suggested pairing it with smaller imaging models like MedGemma rather than treating them as substitutes.

235B M3 Medical LLM

Google DeepMind
New Models
Open weights

MedGemma 1.5

Google releases MedGemma 1.5 for offline medical imaging

Google released MedGemma 1.5, a small (4B-class) open model for medical use cases, compact enough to run offline for medical imaging. The panel stressed it is a different model class from Byte's giant M3 medical LLM and that the two pair well together rather than replacing each other.

OpenAI
Acquisitions

Torch Health

OpenAI acquires Torch Health to power GPT Health

OpenAI acquired Torch Health as part of its push into healthcare with GPT Health. The move came the same week Anthropic launched Claude for Healthcare, with both labs racing toward HIPAA-ready medical AI products.

OpenAI
Products & Apps

ChatGPT Health

OpenAI launches ChatGPT Health waitlist with health record sync

OpenAI launched a waitlist for ChatGPT Health, a privacy-first vertical for health conversations with connected health records and fitness apps including Apple Health, Function Health, MyFitnessPal, and Peloton. The panel noted LLMs are well-suited to medicine since there are only ~2,000 diseases and ~2,000 prescription drugs to master.

December 2025

OpenAI
Products & Apps

Deep Research

OpenAI Deep Research scores 26.6% on Humanity's Last Exam

OpenAI's Deep Research launched in February 2025 as an agentic research tool that scored 26.6% on Humanity's Last Exam, versus roughly 10% for o1 and R1. The crew called it a jaw-dropping leap in AI research capability and one of February's defining releases.

26.6% HLE (Humanity's Last Exam)

November 2025

October 2025

Google DeepMind
New Models
Open weights

C2S-Scale 27B

Google's C2S-Scale 27B validates a cancer hypothesis in living cells

Google released C2S-Scale 27B, a Gemma-based single-cell biology model that generated a novel cancer therapy hypothesis later validated in living cells. The show called this a bombshell example of AI contributing to real scientific discovery rather than just benchmarks.

September 2025

OpenAI & NBER
Papers & Research

How People Use ChatGPT

NBER & OpenAI publish 'How People Use ChatGPT' usage study

OpenAI and NBER published a working paper analyzing ChatGPT usage growth, demographics, and scale. The study gives the first rigorous public look at how the consumer ChatGPT user base actually behaves, feeding the episode's closing discussion of usage stats and momentum.

July 2025

Chai Discovery
New Models

Chai-2

Chai Discovery's Chai-2 enables zero-shot antibody design

Chai Discovery introduced Chai-2, a model for zero-shot antibody design that generates candidate antibodies without iterative lab screening. It was mentioned in the tools section of the show notes as one of the week's notable science releases.

Microsoft
Papers & Research

MAI-DxO

Microsoft's MAI-DxO hits 85.5% on NEJM diagnostic cases vs 20% for doctors

Microsoft AI published MAI-DxO, a medical diagnostic orchestration system that reached 85.5% accuracy on challenging NEJM-style cases compared to roughly 20% for practicing physicians. The result is framed as a systems win rather than a single-model win, suggesting orchestration may outperform individual models in high-stakes expert workflows.

85.5% MAI-DxO accuracy

May 2025

UC Berkeley
Papers & Research

Intuitor (Learning to Reason Without External Rewards)

Paper: models can learn to reason without external rewards

This mind-bending paper shows that reinforcement learning with internal or even random rewards can improve reasoning models. Intuitor matched or exceeded some GRPO results (GRPO being the external-reward framework DeepSeek popularized with R1) when fine-tuning Qwen2.5 3B, which raises the question of how much of RL's gains come from the reward signal itself.
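To make the role of the reward signal concrete, here is a minimal sketch of the group-relative advantage step commonly described for GRPO: several completions are sampled for one prompt, each receives a scalar score, and the scores are normalized within the group. This is not Intuitor's code. The function name and all score values are hypothetical, and the choice of population standard deviation is an assumption, since implementations differ.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps), so only
    the relative ranking and spread inside the group matter.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical external rewards: 1.0 if a verifier accepted the answer.
external = [1.0, 0.0, 0.0, 1.0]
# Hypothetical internal scores, e.g. a confidence-style number per completion.
internal = [0.9, 0.2, 0.3, 0.8]

adv_external = group_relative_advantages(external)
adv_internal = group_relative_advantages(internal)
print(adv_external)
print(adv_internal)
```

The normalization step accepts any scalar score, so swapping an external verifier for an internal signal changes only where the numbers come from, not how the update is computed.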

3B Qwen2.5 model size where Intuitor matched or exceeded GRPO results

Google DeepMind
Products & Apps

AlphaEvolve

AlphaEvolve: Gemini-powered coding agent for discovering new algorithms

Google DeepMind announced AlphaEvolve, a Gemini-powered coding agent that designs and evolves advanced algorithms, credited on the show as one of the week's mind-bending algorithmic-discovery stories. DeepMind opened an interest form for early access rather than shipping it broadly.

OpenAI
Benchmarks & Evals
Open weights

HealthBench

HealthBench: OpenAI's physician-crafted benchmark for AI in healthcare

OpenAI released HealthBench, a benchmark for evaluating AI models on healthcare scenarios, built with input from physicians. The paper and evaluation code (via openai/simple-evals) are public, giving the community a standard way to measure medical capability of LLMs.

April 2025

Anthropic
Major Features & Updates

Claude Research

Claude gains Research mode and Google Workspace integration

Anthropic shipped a Research capability for Claude, letting it conduct multi-step research across the web, alongside a Google Workspace integration that connects Claude to email, calendar and docs context.

Google
New Models

DolphinGemma

DolphinGemma: Google's audio model for decoding dolphin communication

Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M parameter audio model based on the Gemma architecture using SoundStream audio tokenization. Trained on decades of recorded dolphin clicks, whistles and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.

Papers & Research
Open weights

One-Minute Video Generation with Test-Time Training

Test-Time Training paper one-shots minute-long videos with consistent characters

Researchers published 'One-Minute Video Generation with Test-Time Training', adding TTT layers to a pre-trained transformer to one-shot generate minute-long videos with remarkable character and scene consistency. The Tom & Jerry style demos showed the most impressive long-form AI video consistency to date.

1 min Single-shot generated video length

New Models

Dream 7B

Dream 7B: a diffusion language model challenger unveiled

Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.

Meta AI
Papers & Research

MoCha

Meta's MoCha generates movie-grade talking AI characters from speech and text

Meta GenAI researchers published MoCha, a model that generates stunningly realistic, movie-grade talking characters directly from speech plus text. Co-author Cong Wei joined the show to discuss the work, which points at AI actors entering Hollywood-quality territory.

OpenAI
Benchmarks & Evals
Open weights

PaperBench

OpenAI releases PaperBench eval and open-sources Nano-Eval framework

OpenAI published PaperBench, a tough new evaluation that tests whether AI agents can replicate cutting-edge AI research papers, with more than 8,300 graded tasks and meta-evaluation of the LLM judge. The best model managed only a 21.0% replication score versus 41.4% for human PhDs. The code and the Nano-Eval framework were open sourced on GitHub alongside the paper.

8,300+ graded tasks in the benchmark
21.0% best model replication score
41.4% human PhD baseline score

March 2025

New Models
Open weights

OLMo 2 32B

AllenAI ships OLMo 2 32B, a fully open GPT-4-class model

The Allen Institute for AI released OLMo 2 32B, its biggest fully open model yet, with weights, code, and dataset all published under Apache 2.0. Announced by Nathan Lambert as a last-second addition, it reportedly beats GPT-3.5 and GPT-4o mini as well as leading open-weight models like Qwen and Mistral at its size.

February 2025

OpenAI
New Models

GPT-4.5

OpenAI ships GPT-4.5, its largest model yet at roughly 10x scale

OpenAI released GPT-4.5 as breaking news during the show, its first .5-scale jump in two years and reportedly around 10x the scale of the previous model, with speculation of 10+ trillion parameters. Sam Altman said it 'won't crush on benchmarks' against reasoning models, but early vibes praised its creative writing, vision, and medical diagnosis abilities, and it is expected to fuel future o-series reasoners trained on top of it.

Arc Institute & NVIDIA
New Models
Open weights

Evo 2

Arc Institute and NVIDIA release Evo 2, a 40B state-of-the-art genomics model

Arc Institute and NVIDIA introduced Evo 2, a state-of-the-art genomics model with around 40 billion parameters trained on 9.3 trillion nucleotides. It uses the StripedHyena architecture to process genetic sequences up to 1 million nucleotides, enabling prediction of genetic mutation effects and even design of entire genomes. Fully open: two papers, weights, data, and training and inference codebases.

Microsoft
Products & Apps

Majorana 1

Microsoft unveils Majorana 1 quantum chip and a new state of matter

Microsoft announced the Majorana 1 quantum chip alongside a claimed new state of matter called topological superconductivity, carving a new path for quantum computing. Alex called the announcement 'absolutely mind blowing' as a potential big deal for the future of computing.