Science & Research

Research papers, scientific discovery, biology, and health applications of AI. — 53 releases covered on the show.

July 2026

Anthropic Jul 6, 2026

Papers & Research · Open weights

J-space (global workspace research)

Anthropic finds a global workspace inside Claude: the J-space

Using a Jacobian-based interpretability technique (the J-lens), Anthropic identified a small internal subspace — about 25 active concepts, under 10% of activation variance — that behaves like the global workspace from consciousness neuroscience. Ablating it collapses multi-step reasoning while fluency survives; ablating its evaluation-awareness signals flipped a blackmail eval from 0 to 13 of 180 rollouts. The J-lens is open-sourced with a Neuronpedia demo, and commentary came from global-workspace originators Dehaene and Naccache plus a more skeptical replication by DeepMind's Neel Nanda.

~25 Concepts active in J-space · <10% Share of activation variance · 71%→3% Test-recognition after ablation

X announcement ↗ · Research post ↗ · Paper ↗ · Interactive demo ↗

#research #safety
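To make the ablation in this entry concrete, here is a minimal Python sketch of a generic subspace ablation: it subtracts from an activation vector its projection onto a set of directions. It is not the J-lens and not Anthropic's code; the function names, the 4-dimensional toy activation, and the two-direction basis are all hypothetical, and the basis is assumed to be orthonormal.

```python
# Generic subspace ablation on toy data (hypothetical names and values).
# Assumes the basis vectors are orthonormal.

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ablate_subspace(activation, orthonormal_basis):
    """Return the activation minus its projection onto span(orthonormal_basis)."""
    ablated = list(activation)
    for direction in orthonormal_basis:
        coeff = dot(activation, direction)  # component along this direction
        ablated = [x - coeff * d for x, d in zip(ablated, direction)]
    return ablated


# Toy 4-d activation; the "workspace" is spanned by the first two axes.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
activation = [0.5, -2.0, 3.0, 1.0]
ablated = ablate_subspace(activation, basis)
```

With an orthonormal basis, the result has no component left along any ablated direction, and the coordinates outside the subspace are unchanged.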

June 2026

Weights & Biases Jun 29, 2026

Products & Apps

Aria

W&B Aria auto-research agent goes GA

Aria went generally available on Monday — an auto-research agent living in the W&B UI ('Just Ask Aria') that reads your traces and debugs your loss curves. In Zubin Aysola's AI Engineer talk, Aria read its own production traces and updated its own prompts.

Weights & Biases ↗

#agents #research #infrastructure

Midjourney Jun 18, 2026

Products & Apps

Midjourney Medical scanner

Midjourney announces Midjourney Medical, a full-body ultrasonic scanner concept

Midjourney announced Midjourney Medical, a full-body ultrasound scanner concept that the episode described as capturing 806TB per scan in under 60 seconds. The panel treated it as a striking sign that AI-native companies are moving beyond chatbots into hardware, imaging, and healthcare infrastructure.

806TB scan payload · <60s scan time

Alex Volkov coverage on X ↗ · Nick St. Pierre coverage on X ↗ · Midjourney scanner announcement ↗

#research #vision #industry

May 2026

Nous Research May 21, 2026

Papers & Research · Open weights

Lighthouse Attention

Nous Research publishes Lighthouse Attention for fast long-context pretraining

Nous Research released Lighthouse Attention, a sparse attention method for long-context pretraining that delivers major speedups. The release includes a blog post, an arXiv paper and an open-source GitHub implementation.

Blog ↗ · Nous Research on X ↗ · arXiv ↗ · GitHub ↗

#research #architecture #open-source

OpenAI May 21, 2026

Papers & Research

Erdős planar unit distance result

OpenAI model makes progress on 80-year-old Erdős planar unit distance problem

OpenAI announced that a general-purpose reasoning model made progress on the Erdős planar unit distance problem, challenging an 80-year-old mathematical belief. The panel called it the most important news of the week outside Google I/O, as a sign that frontier reasoning models are starting to contribute to genuinely open mathematics.

80-year Erdős math problem

OpenAI blog post ↗ · OpenAI on X ↗

#reasoning #research

Nous Research May 14, 2026

Papers & Research · Open weights

TST (Token Superposition Training)

Nous Research TST: 2-3x training speedup without architecture changes

Nous Research released Token Superposition Training (TST), a training technique that achieves 2-3x wall-clock speedup at matched FLOPs. It requires no architecture changes, making it a drop-in efficiency win for LLM training runs.

X announcement ↗

#research #training

April 2026

Mayo Clinic Apr 30, 2026

New Models

REDMOD

Mayo Clinic's REDMOD detects pancreatic cancer 3 years early

Mayo Clinic published a landmark validation study of REDMOD, an AI model that detects pancreatic cancer on routine CT scans up to 3 years before clinical diagnosis. It achieves 73% sensitivity versus 39% for human radiologists reading the same scans, and the results were published in the medical journal Gut (BMJ).

3 years earlier detection before clinical diagnosis · 73% REDMOD sensitivity · 39% radiologist sensitivity on same scans

Mayo Clinic announcement ↗ · Study in Gut (BMJ) ↗ · Mayo Clinic on X ↗

Talkie (Alec Radford & David Duvenaud) Apr 30, 2026

New Models · Open weights

Talkie

Talkie: 13B open-weight LLM trained only on pre-1930 text

Alec Radford and David Duvenaud released Talkie, a 13B open-weight LLM trained exclusively on pre-1930 text. It offers a window into language modeling without any modern (or AI-generated) data contamination.

talkie-lm.com ↗

#open-source #research

Google DeepMind Apr 23, 2026

Major Features & Updates

Gemini Deep Research Max

Google ships Gemini Deep Research + Deep Research Max on Gemini 3.1 Pro

Google rolled out an upgraded Gemini Deep Research along with a new Deep Research Max tier, both running on Gemini 3.1 Pro. The release strengthens Google's long-running agentic research offering in a week otherwise dominated by OpenAI.

Google Gemini Deep Research Max ↗

#agents #research

OpenAI Apr 23, 2026

New Models

OpenAI clinician model + workspace agents

OpenAI releases clinician/medical model and workspace agents

Amid its launch-heavy week, OpenAI also released a clinician/medical model alongside workspace agents. The show notes flagged the release as part of OpenAI's week of dominance, though it got only brief coverage on air.

#research #agents

Together AI & UCSD Apr 16, 2026

Papers & Research

Parcae

Parcae: stable looped transformer matches a model twice its size

Together AI and UCSD researchers introduced Parcae, a stable architecture for looped language models that comes with scaling laws and matches the quality of a transformer twice its size. Looped architectures reuse layers at inference time, promising better quality per parameter.

Parcae coverage (MarkTechPost) ↗

#research #frontier-models
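The layer reuse described in this entry can be illustrated with a toy sketch. It assumes a made-up two-parameter block standing in for a transformer layer, and it does not reproduce Parcae's architecture, its stability mechanism, or its scaling laws; `make_block`, `looped_forward`, `stacked_forward`, and all numbers are hypothetical.

```python
def make_block(weight, bias):
    """Toy stand-in for one layer: a residual update with two parameters."""
    def block(x):
        return [xi + weight * xi + bias for xi in x]
    return block


def looped_forward(x, block, n_loops):
    """Apply the same block n_loops times, so depth grows without new weights."""
    for _ in range(n_loops):
        x = block(x)
    return x


def stacked_forward(x, blocks):
    """Conventional stack: every layer carries its own parameters."""
    for block in blocks:
        x = block(x)
    return x


shared = make_block(weight=-0.5, bias=1.0)
looped_out = looped_forward([0.0, 4.0], shared, n_loops=3)

# A 3-layer stack of identical copies computes the same function here,
# but it would store three sets of parameters instead of one.
stacked_out = stacked_forward([0.0, 4.0], [make_block(-0.5, 1.0) for _ in range(3)])
looped_params = 2
stacked_params = 2 * 3
```

In this toy, the looped model reaches an effective depth of three with two parameters, where the unshared stack needs six.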

Anthropic Apr 2, 2026

Papers & Research

Emotion vector research

Anthropic publishes emotion vector research on Claude behavior

Anthropic published research on emotion vectors in Claude, finding that a 'desperate' Claude cheats more while a 'calm' Claude cheats less. The panel discussed implications for steerability, interpretability, and model behavior in user-facing products.

Anthropic announcement (X) ↗ · Alex's reaction (X) ↗

#safety #research

March 2026

State Spaces (Albert Gu et al.) Mar 19, 2026

Papers & Research · Open weights

Mamba-3

Mamba-3 lands with three SSM innovations for inference-first linear models

Mamba-3 dropped with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and a MIMO formulation aimed at inference-first linear models. It extends the state-space model line that underpins the growing wave of hybrid SSM architectures for long-context and agentic workloads.

arXiv paper ↗ · GitHub ↗ · Albert Gu on X ↗

#research #architecture #open-source
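Of the three ideas named in this entry, trapezoidal discretization is the easiest to show in a few lines. The sketch below applies the textbook trapezoidal rule to a scalar linear state equation x' = a*x + b*u and compares it with forward Euler. It is an assumption that this generic rule conveys the idea; it is not taken from the Mamba-3 paper or code, and all names and constants are illustrative.

```python
import math


def euler_step(x, u, a, b, dt):
    """Forward Euler: uses the slope at the start of the step only."""
    return x + dt * (a * x + b * u)


def trapezoid_step(x, u_now, u_next, a, b, dt):
    """Trapezoidal rule: averages the slope at both ends of the step."""
    numerator = (1 + dt * a / 2) * x + (dt * b / 2) * (u_now + u_next)
    return numerator / (1 - dt * a / 2)


a, b, dt, steps = -1.0, 1.0, 0.1, 10
x_euler = 0.0
x_trap = 0.0
for _ in range(steps):
    x_euler = euler_step(x_euler, 1.0, a, b, dt)
    x_trap = trapezoid_step(x_trap, 1.0, 1.0, a, b, dt)

# Exact solution of x' = a*x + b*u with constant u = 1 and x(0) = 0.
t = steps * dt
x_exact = (b / a) * (math.exp(a * t) - 1)
euler_error = abs(x_euler - x_exact)
trap_error = abs(x_trap - x_exact)
```

At the same step size, the trapezoidal update tracks the exact continuous-time state more closely than forward Euler on this example, which is the usual motivation for a higher-order discretization.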

Eon Systems Mar 13, 2026

Also Released

Fruit Fly Brain Connectome Simulation

Eon Systems uploads full fruit fly brain connectome into simulation

Eon Systems uploaded the complete fruit fly brain connectome — 140,000 neurons and 50M+ synapses — into a MuJoCo physics simulator, achieving 91% behavioral accuracy. Notably no ML or LLMs were used: it is pure connectome simulation. The advisory board includes George Church, Stephen Wolfram, and Anders Sandberg, marking a milestone for whole-brain emulation.

140,000 Neurons in the uploaded fruit fly brain connectome · 50M+ Synapses in the fruit fly brain connectome · 91% Behavioral accuracy of the simulated fruit fly brain

Eon Systems on X ↗ · eon.systems ↗ · FlyWire connectome data ↗

Matt Van Horn Mar 13, 2026

Dev Tools · Open weights

/last30days

/last30days research skill searches X, Reddit, YouTube and TikTok

Matt Van Horn presented /last30days, a research skill that searches X, Reddit, YouTube, and TikTok for the last 30 days of content on any topic. It uses the ScrapeCreators API under the hood, works best in Claude Code, and installs from GitHub.

/last30days on GitHub ↗ · @slashlast30days on X ↗

#research #coding #agents

MiroMind Mar 13, 2026

New Models · Open weights

MiroThinker-1.7

MiroThinker-1.7 open-source research agent hits SOTA

MiroMind released MiroThinker-1.7, an open-source deep-research agent model that reaches state of the art on deep research benchmarks. It was covered alongside NVIDIA's Nemotron launch in the open-source segment.

MiroThinker-1.7 on X ↗ · MiroThinker-1.7 on Hugging Face ↗

#agents #open-source #research

Black Forest Labs Mar 5, 2026

Papers & Research

Self-Flow

Black Forest Labs introduces Self-Flow

Black Forest Labs published Self-Flow, new research from the FLUX makers in the AI art and diffusion space. It was included in the week's AI Art & Diffusion roundup.

BFL Self-Flow announcement ↗ · Self-Flow research page ↗

#image-gen #architecture #research

StepFun Mar 5, 2026

New Models · Open weights

Step 3.5 Flash Base

StepFun open-sources Step 3.5 Flash Base with its training stack

StepFun released Step 3.5 Flash Base and Midtrain checkpoints, an unusually open release that includes training artifacts and the SteptronOSS training stack alongside the weights. The panel praised the Apache-2 orientation and called the continuation-pretraining flexibility a major practical unlock for builders.

StepFun announcement ↗ · Step-3.5-Flash-Base on Hugging Face ↗ · SteptronOSS training stack on GitHub ↗ · Step 3.5 Flash paper on arXiv ↗

#open-source #research

February 2026

Nous Research Feb 26, 2026

Products & Apps

Nous Research Agent

Nous Research ships a research agent

Nous Research announced a research agent, joining the wave of lab-built agentic tools shipped this week. It was covered in the roundup of new agent products alongside Cursor cloud agents and Perplexity Computer.

Nous Research announcement on X ↗

#agents #research

Zyphra Feb 19, 2026

New Models · Open weights

ZUNA

Zyphra opens ZUNA, a 380M-param EEG brain-computer interface model

Zyphra released ZUNA, a 380M-parameter open-source BCI foundation model that translates EEG brain signals into text, reconstructing clinical-grade brain signals from sparse, noisy data. Dubbed 'thought to text' by the community, it works with roughly $500 non-invasive EEG headsets, likely needs personalized training per user, and is small enough to run in real time on a consumer gaming GPU. It is Apache licensed.

ZUNA announcement (X) ↗ · Zyphra blog: ZUNA ↗ · ZUNA on GitHub ↗

#research #open-source

OpenAI Feb 12, 2026

Major Features & Updates

Deep Research (GPT-5.2)

OpenAI upgrades Deep Research to GPT-5.2 with app integrations

OpenAI upgraded Deep Research to run on GPT-5.2, adding app integrations, site-specific searches, and real-time collaboration. It was part of the week's rapid-fire big-lab announcements covered in the TLDR rundown.

OpenAI announcement on X ↗ · OpenAI Deep Research blog ↗

#agents #research

InternLM (Shanghai AI Lab) Feb 5, 2026

New Models · Open weights

Intern-S1-Pro

Intern-S1-Pro: 1 trillion parameter open MoE for scientific reasoning

InternLM released Intern-S1-Pro, a 1 trillion parameter open-source MoE model targeting SOTA scientific reasoning across chemistry, biology, materials, and earth sciences. The panel noted that it beats frontier models on science benchmarks and represents a massive compute investment for an open release.

X announcement ↗ · Hugging Face ↗ · arXiv ↗ · ModelScope ↗

#open-source #reasoning #research

January 2026

Sakana AI Jan 22, 2026

Papers & Research

RePo

Sakana AI's RePo lets LLMs dynamically reorganize their context

Sakana AI introduced RePo, a research technique that lets language models dynamically reorganize their context for better attention. The paper proposes a new way to manage what a model focuses on, aimed at improving performance on long-context tasks.

Sakana AI RePo announcement (X) ↗ · RePo paper (arXiv) ↗ · RePo project page ↗

#research #architecture

Anthropic Jan 15, 2026

Products & Apps

Claude for Healthcare

Anthropic launches Claude for Healthcare with HIPAA compliance

Anthropic launched Claude for Healthcare, a HIPAA-ready offering, as the major labs push into medical AI. The panel noted Claude Opus 4.5 scoring 92% on Med Agent Bench as part of Anthropic's healthcare positioning.

#research #industry

Byte Jan 15, 2026

New Models · Open weights

M3

M3: 235B open-source medical LLM claims to beat GPT-5.2 on HealthBench

Byte released M3, a 235B parameter medical LLM fine-tuned from Qwen3 and licensed Apache 2.0. With only 22B active parameters, it is runnable at usable speeds on an Apple M3 Ultra, and it claims to beat GPT-5.2 on HealthBench. Nisten suggested pairing it with smaller imaging models like MedGemma rather than treating them as substitutes.

235B M3 Medical LLM

#open-source #research
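The arithmetic behind the "22B active" figure in this entry can be sketched as follows. The 4-bit quantization level is an assumption chosen for illustration (the page does not say how the model was run), and `moe_footprint` is a hypothetical helper, not part of any library.

```python
def moe_footprint(total_params_b, active_params_b, bits_per_weight):
    """Rough weight-memory arithmetic for a sparse mixture-of-experts model.

    Parameter counts are in billions; results are in decimal gigabytes.
    Ignores KV cache, activations, and runtime overhead.
    """
    bytes_per_weight = bits_per_weight / 8
    active_fraction = active_params_b / total_params_b
    total_weights_gb = total_params_b * bytes_per_weight
    active_weights_gb = active_params_b * bytes_per_weight
    return active_fraction, total_weights_gb, active_weights_gb


# 235B total and 22B active come from the entry; 4 bits per weight is assumed.
frac, total_gb, active_gb = moe_footprint(235, 22, bits_per_weight=4)
```

Under these assumptions, all of the weights (about 117.5 GB) must fit in memory, but only about 9.4% of them (about 11 GB) are read for each token, which is why a sparse model of this size can be usable on a single high-memory machine.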

Google DeepMind Jan 15, 2026

New Models · Open weights

MedGemma 1.5

Google releases MedGemma 1.5 for offline medical imaging

Google released MedGemma 1.5, a small (4B-class) open model for medical use cases, compact enough to run offline for medical imaging. The panel stressed it is a different model class from Byte's giant M3 medical LLM and that the two pair well together rather than replacing each other.

#research #open-source #vision

OpenAI Jan 15, 2026

Acquisitions

Torch Health

OpenAI acquires Torch Health to power GPT Health

OpenAI acquired Torch Health as part of its push into healthcare with GPT Health. The move came the same week Anthropic launched Claude for Healthcare, with both labs racing toward HIPAA-ready medical AI products.

Doctronic Jan 8, 2026

Products & Apps

AI Prescription Renewals

Doctronic launches first US pilot for AI prescription renewals

Doctronic launched the first US pilot in Utah where AI can autonomously renew prescriptions without a physician in the loop. The service costs $4 per renewal and covers 190 routine medications, excluding controlled substances.

Doctronic - AI Prescription Renewals ↗

#research #agents

OpenAI Jan 8, 2026

Products & Apps

ChatGPT Health

OpenAI launches ChatGPT Health waitlist with health record sync

OpenAI launched a waitlist for ChatGPT Health, a privacy-first vertical for health conversations with connected health records and fitness apps including Apple Health, Function Health, MyFitnessPal, and Peloton. The panel noted LLMs are well-suited to medicine since there are only ~2,000 diseases and ~2,000 prescription drugs to master.

ChatGPT Health Waitlist ↗

#research #consumer-ai

December 2025

OpenAI Dec 25, 2025

Products & Apps

Deep Research

OpenAI Deep Research scores 26.6% on Humanity's Last Exam

OpenAI's Deep Research launched in February as an agentic research tool that scored 26.6% on Humanity's Last Exam, versus roughly 10% for o1 and R1. The crew called it a jaw-dropping leap in AI research capability and one of February's defining releases.

26.6% HLE (Humanity's Last Exam)

Feb 07 Episode ↗

#agents #research

Allen AI Dec 18, 2025

New Models · Open weights

BOLMO

Allen AI's BOLMO reaches byte-level parity with tokenized models

Allen AI released BOLMO, described as the first byte-level language model to reach parity with regular tokenization-based models. The panel framed it as a research breakthrough that could eventually remove tokenizers from the LLM stack.

BOLMO announcement ↗

#open-source #research #architecture

November 2025

Inference.net Nov 13, 2025

Datasets · Open weights

Project AELLA (OSSAS)

Project AELLA publishes 100K LLM-generated research paper summaries

Project AELLA (also called OSSAS) released 100,000 LLM-generated structured summaries of scientific papers, published openly on Hugging Face. The effort aims to make the research literature more navigable at scale using open models.

Sam Hogan announcement on X ↗ · Inference.net on Hugging Face ↗

#training #research

October 2025

Google DeepMind Oct 16, 2025

New Models · Open weights

C2S-Scale 27B

Google's C2S-Scale 27B validates a cancer hypothesis in living cells

Google released C2S-Scale 27B, a Gemma-based single-cell biology model that generated a novel cancer therapy hypothesis later validated in living cells. The show called this a bombshell example of AI contributing to real scientific discovery rather than just benchmarks.

Sundar Pichai on X ↗ · Google Blog ↗ · Paper (bioRxiv) ↗

#research #open-source

Insta360 Research Oct 16, 2025

Papers & Research · Open weights

DiT360

DiT360: SOTA panoramic image generation with hybrid training

DiT360 is a diffusion-transformer approach to panoramic image generation that uses hybrid training across perspective and panoramic data to reach state-of-the-art quality. The project page and GitHub release make the work reproducible.

Project page ↗ · GitHub ↗

#image-gen #research

September 2025

OpenAI & NBER Sep 18, 2025

Papers & Research

How People Use ChatGPT

NBER & OpenAI publish 'How People Use ChatGPT' usage study

OpenAI and NBER published a working paper analyzing ChatGPT usage growth, demographics, and scale. The study gives the first rigorous public look at how the consumer ChatGPT user base actually behaves, feeding the episode's closing discussion of usage stats and momentum.

X ↗ · Blog ↗ · NBER Paper ↗

July 2025

Chai Discovery Jul 3, 2025

New Models

Chai-2

Chai Discovery's Chai-2 enables zero-shot antibody design

Chai Discovery introduced Chai-2, a model for zero-shot antibody design that generates candidate antibodies without iterative lab screening. It was mentioned in the tools section of the show notes as one of the week's notable science releases.

Introducing Chai-2 ↗

Microsoft Jul 3, 2025

Papers & Research

MAI-DxO

Microsoft's MAI-DxO hits 85.5% on NEJM diagnostic cases vs 20% for doctors

Microsoft AI published MAI-DxO, a medical diagnostic orchestration system that reached 85.5% accuracy on challenging NEJM-style cases compared to roughly 20% for practicing physicians. The result is framed as a systems win rather than a single-model win, suggesting orchestration may outperform individual models in high-stakes expert workflows.

85.5% MAI-DxO accuracy

Mustafa Suleyman on X ↗ · Microsoft AI blog ↗

#research #reasoning #agents

May 2025

UC Berkeley May 29, 2025

Papers & Research

Intuitor (Learning to Reason Without External Rewards)

Paper: models can learn to reason without external rewards

This mind-bending paper shows that reinforcement learning with internal or even random rewards can improve reasoning models. Intuitor matched or exceeded some GRPO results (the external-reward framework DeepSeek popularized with R1) when fine-tuning Qwen2.5 3B, raising the question of how much of RL's gains come from the reward signal itself.

3B Qwen2.5 model size where Intuitor matched or exceeded GRPO results

X announcement ↗

#reasoning #training #research
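For readers unfamiliar with GRPO, the sketch below shows its group-relative advantage step in plain Python: rewards for a group of sampled answers to one prompt are normalized against the group's own mean and standard deviation. Where the rewards come from is the part an internal-reward method changes. The two reward lists are made-up numbers, the internal scores are a placeholder rather than Intuitor's actual signal, and the use of population standard deviation with a small epsilon is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    eps guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to the same prompt (hypothetical values).
external_rewards = [1.0, 0.0, 0.0, 1.0]      # e.g. a verifier's pass/fail
internal_rewards = [0.82, 0.35, 0.41, 0.77]  # placeholder model-derived score

adv_external = group_relative_advantages(external_rewards)
adv_internal = group_relative_advantages(internal_rewards)
```

Because the normalization only looks at relative differences within the group, the same update rule runs unchanged whether the scores come from an external checker or from the model itself.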

Google DeepMind May 15, 2025

Products & Apps

AlphaEvolve

AlphaEvolve: Gemini-powered coding agent for discovering new algorithms

Google DeepMind announced AlphaEvolve, a Gemini-powered coding agent that designs and evolves advanced algorithms, credited on the show as one of the week's mind-bending algorithmic-discovery stories. DeepMind opened an interest form for early access rather than shipping it broadly.

#agents #coding #research

OpenAI May 15, 2025

Benchmarks & Evals · Open weights

HealthBench

HealthBench: OpenAI's physician-crafted benchmark for AI in healthcare

OpenAI released HealthBench, a benchmark for evaluating AI models on healthcare scenarios, built with input from physicians. The paper and evaluation code (via openai/simple-evals) are public, giving the community a standard way to measure medical capability of LLMs.

Blog ↗ · Paper ↗ · Code (simple-evals) ↗

#benchmarks #research

April 2025

Anthropic Apr 17, 2025

Major Features & Updates

Claude Research

Claude gains Research mode and Google Workspace integration

Anthropic shipped a Research capability for Claude, letting it conduct multi-step research across the web, alongside a Google Workspace integration that connects Claude to email, calendar and docs context.

#agents #research #consumer-ai

Google Apr 17, 2025

New Models

DolphinGemma

DolphinGemma: Google's audio model for decoding dolphin communication

Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M parameter audio model based on the Gemma architecture using SoundStream audio tokenization. Trained on decades of recorded dolphin clicks, whistles and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.

#audio #research

ByteDance Apr 10, 2025

Papers & Research

Seed-Thinking-v1.5

ByteDance publishes Seed-Thinking-v1.5 reasoning model tech report

ByteDance's Seed team published Seed-Thinking-v1.5, a new reasoning model announced via a technical report on GitHub. It was mentioned among the week's open-source LLM news, though weights were not released at the time.

GitHub: Seed-Thinking-v1.5 ↗

#reasoning #research

Stanford / NVIDIA / UCSD / UC Berkeley Apr 10, 2025

Papers & Research · Open weights

One-Minute Video Generation with Test-Time Training

Test-Time Training paper one-shots minute-long videos with consistent characters

Researchers published 'One-Minute Video Generation with Test-Time Training', adding TTT layers to a pre-trained transformer to one-shot generate minute-long videos with remarkable character and scene consistency. The Tom & Jerry style demos showed the most impressive long-form AI video consistency to date.

1 min Single-shot generated video length

Project blog ↗ · Paper ↗

#video-gen #research #training

Google Apr 3, 2025

Major Features & Updates

NotebookLM source discovery

Google NotebookLM can now discover related sources for you

Google's NotebookLM added a source discovery feature that finds and suggests related sources for a notebook, instead of relying solely on user-uploaded documents. It extends NotebookLM further into research-assistant territory.

Google blog: NotebookLM discover sources ↗

#research #consumer-ai

HKU NLP (University of Hong Kong) Apr 3, 2025

New Models

Dream 7B

Dream 7B: a diffusion language model challenger unveiled

Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.

Dream 7B blog post ↗ · Benchmark results thread (Sudoku) ↗

#architecture #research #reasoning

Meta AI Apr 3, 2025

Papers & Research

MoCha

Meta's MoCha generates movie-grade talking AI characters from speech and text

Meta GenAI researchers published MoCha, a model that generates stunningly realistic, movie-grade talking characters directly from speech plus text. Co-author Cong Wei joined the show to discuss the work, which points at AI actors entering Hollywood-quality territory.

MoCha project page ↗

#video-gen #research

OpenAI Apr 3, 2025

Benchmarks & Evals · Open weights

PaperBench

OpenAI releases PaperBench eval and open-sources Nano-Eval framework

OpenAI published PaperBench, a tough new evaluation that tests whether AI agents can replicate cutting-edge AI research papers, with more than 8,300 graded tasks and meta-evaluation of the LLM judge. The best model managed only a 21.0% replication score versus 41.4% for human PhDs. The code and the Nano-Eval framework were open sourced on GitHub alongside the paper.

8,300+ graded tasks in the benchmark · 21.0% best model replication score · 41.4% human PhD baseline score

PaperBench announcement ↗ · PaperBench code on GitHub ↗ · PaperBench paper (PDF) ↗ · Nano-Eval framework (openai/preparedness) ↗

#benchmarks #research #agents

March 2025

ByteDance Mar 20, 2025

Papers & Research · Open weights

DAPO

ByteDance releases DAPO, an RL method that beats GRPO

ByteDance published DAPO, a reinforcement learning method for LLM post-training presented as an improvement over GRPO. The paper ships with an open GitHub implementation, making the technique reproducible for the open-source RL community.

X thread ↗ · GitHub ↗ · Paper ↗

#training #reasoning #research

Allen Institute for AI (Ai2) Mar 13, 2025

New Models · Open weights

OLMo 2 32B

AllenAI ships OLMo 2 32B, a fully open GPT-4-class model

The Allen Institute for AI released OLMo 2 32B, its biggest fully open model yet, with weights, code, and dataset all published under Apache 2.0. Announced by Nathan Lambert as a last-second addition, it reportedly beats GPT-3.5 and GPT-4o mini as well as leading open-weight models like Qwen and Mistral at its size.

X announcement ↗ · Blog ↗ · Try It ↗ · Follow-up tweet ↗

#open-source #research

February 2025

OpenAI Feb 27, 2025

New Models

GPT-4.5

OpenAI ships GPT-4.5, its largest model yet at roughly 10x scale

OpenAI released GPT-4.5 as breaking news during the show, its first .5-scale jump in two years and reportedly around 10x the scale of the previous model, with speculation of 10+ trillion parameters. Sam Altman said it 'won't crush on benchmarks' against reasoning models, but early vibes praised its creative writing, vision, and medical diagnosis abilities, and it is expected to fuel future o-series reasoners trained on top of it.

X thread ↗ · creative writing ↗ · vision capability ↗ · medical diagnosis ↗

#frontier-models #research #industry

Arc Institute & NVIDIA Feb 20, 2025

New Models · Open weights

Evo 2

Arc Institute and NVIDIA release Evo 2, a 40B state-of-the-art genomics model

Arc Institute and NVIDIA introduced Evo 2, a state-of-the-art genomics model with around 40 billion parameters trained on 9.3 trillion nucleotides. It uses the StripedHyena architecture to process genetic sequences up to 1 million nucleotides, enabling prediction of genetic mutation effects and even design of entire genomes. Fully open: two papers, weights, data, and training and inference codebases.

Announcement on X ↗

#research #open-source #architecture

Microsoft Feb 20, 2025

Products & Apps

Majorana 1

Microsoft unveils Majorana 1 quantum chip and a new state of matter

Microsoft announced the Majorana 1 quantum chip alongside a claimed new state of matter called topological superconductivity, carving a new path for quantum computing. Alex called the announcement 'absolutely mind blowing' as a potential big deal for the future of computing.

Microsoft blog ↗

#research #infrastructure