Major Features & Updates
Claude Research
Claude gains Research mode and Google Workspace integration
Anthropic shipped a Research capability for Claude, letting it conduct multi-step research across the web, alongside a Google Workspace integration that connects Claude to email, calendar and docs context.
New Models
DolphinGemma
DolphinGemma: Google's audio model for decoding dolphin communication
Google, with Georgia Tech and the Wild Dolphin Project, announced DolphinGemma, a ~400M-parameter audio model that builds on the Gemma architecture and uses SoundStream for audio tokenization. Trained on decades of recorded dolphin clicks, whistles and pulses, it aims to decipher structure in dolphin communication and runs on a Pixel phone for field deployment.
Papers & Research
Seed-Thinking-v1.5
ByteDance publishes Seed-Thinking-v1.5 reasoning model tech report
ByteDance's Seed team published Seed-Thinking-v1.5, a new reasoning model announced via a technical report on GitHub. It was mentioned among the week's open-source LLM news, though weights were not released at the time.
Papers & Research · Open weights
One-Minute Video Generation with Test-Time Training
Test-Time Training paper one-shots minute-long videos with consistent characters
Researchers published 'One-Minute Video Generation with Test-Time Training', which adds TTT layers to a pre-trained transformer so it can generate minute-long videos in a single shot with remarkable character and scene consistency. The Tom & Jerry-style demos showed the most impressive long-form AI video consistency to date.
1 min Single-shot generated video length
Major Features & Updates
NotebookLM source discovery
Google NotebookLM can now discover related sources for you
Google's NotebookLM added a source discovery feature that finds and suggests related sources for a notebook, instead of relying solely on user-uploaded documents. It extends NotebookLM further into research-assistant territory.
New Models
Dream 7B
Dream 7B: a diffusion language model challenger unveiled
Researchers unveiled Dream 7B, a diffusion-based language model that posts strong benchmark results, notably on planning-style tasks like Sudoku, possibly because parallel generation handles global constraints better than autoregression. It hints at viable alternative LLM architectures, but the weights were not yet released at show time, so results could not be independently verified.
Papers & Research
MoCha
Meta's MoCha generates movie-grade talking AI characters from speech and text
Meta GenAI researchers published MoCha, a model that generates stunningly realistic, movie-grade talking characters directly from speech plus text. Co-author Cong Wei joined the show to discuss the work, which points to AI actors entering Hollywood-quality territory.
Benchmarks & Evals · Open weights
PaperBench
OpenAI releases PaperBench eval and open-sources Nano-Eval framework
OpenAI published PaperBench, a tough new evaluation that tests whether AI agents can replicate cutting-edge AI research papers, with more than 8,300 graded tasks and meta-evaluation of the LLM judge. The best model managed only a 21.0% replication score versus 41.4% for human PhDs. The code and the Nano-Eval framework were open sourced on GitHub alongside the paper.
8,300+ graded tasks in the benchmark
21.0% best model replication score
41.4% human PhD baseline score