Episode Summary
A 'slow week' that absolutely wasn't. Anthropic shipped Claude Opus 4.8 LIVE mid-show (Alex got to slam the breaking-news button), with 69.2% on SWE-bench Pro and a long-context jump that finally pushes past the usual 200K cliff β plus Dynamic Workflows and Ultra Code in Claude Code that ported Bun from Zig to Rust in 11 days. The crew also spent a big chunk on Pope Leo XIV's first AI encyclical, a 42,000-word, surprisingly non-doomer document with Anthropic's Chris Olah speaking at the Vatican. Throw in Illinois passing the first US frontier-AI audit law 110-0, DeepSWE exposing that Claude was literally reading git history to cheat benchmarks, and post-show drops from ElevenLabs (Dubbing v2) and Cartesia (Ink-2) that Alex says blew his mind. Classic ThursdAIβ’ timing.
In This Episode
Hosts & Guests
By The Numbers
π₯ Breaking During The Show
π° Show Open & Big-Lab Rumors
Alex and Wolfram kick off the last show of May with the running joke that big labs love to ship on a Thursday β and rumors already circulating that a new Opus might drop. Wolfram flags the Pope's encyclical as the week's biggest story before anything else even lands.
- Breaking-news button primed for expected big-lab drops
- Claude Code rumors hinting at a new Opus
- Wolfram picks the Pope encyclical as story of the week
π§ͺ Vibe-Solving ErdΕs Problems
Following last week's OpenAI ErdΕs news, Anthropic's Mythos and DeepMind's Gemini also cracked open problems β DeepMind doing it the hard way through Lean. The crew dwells on the real bottleneck: not generating proofs, but verifying them when LLM-as-a-judge isn't enough.
- Anthropic's Mythos solved the same ErdΕs problem off-the-cuff
- DeepMind went 'full Ralph' with Gemini + Lean compiler
- Verification, not generation, is the hard part
π° TL;DR β Rapid-Fire News
The signature roundup: rising AI hate online (and the crew's vow to fight the doomer narrative), open-source wins from OpenBMB's MiniCPM5-1B and Tencent's tiny translation model, Google's Universal Cart/AP2 commerce protocols and free native Android apps in AI Studio, CuaDriver bringing background computer-use to Windows, and a surprise #3 finish for Microsoft's MAI Image 2.5 on Arena.
- MiniCPM5-1B: SOTA 1B model, 17.9 AAII, runs on your phone
- Tencent Hy-MT2 1.8B beats Microsoft's paid Translator API
- Google AI Studio built 250K native Android apps in week one
- Prism ML 1-bit 'Bonsai' diffusion runs in-browser via WebGPU
- Microsoft MAI Image 2.5 jumps to #3 on LM Arena
β‘ This Week's Buzz β W&B MCP & WeaveHacks
Weights & Biases officially launched its MCP server: 20 schema-first tools so coding agents can read experiments and run autonomous research loops without blowing their context window. Plus WeaveHacks 4 returns June 6-7 in SF, with OpenAI sponsoring for the first time alongside Cursor, Redis and CopilotKit.
- W&B MCP server: 20 tools, agents query before pulling 300-metric runs
- WeaveHacks 4, June 6-7 SF β OpenAI, Cursor, Redis, CopilotKit
- $150 in API credits across Opus 4.8 and GPT-5.5
- CoreWeave Sandboxes now an official Harbor provider (runs Terminal-Bench)
ποΈ The Pope's AI Encyclical β Magnifica Humanitas
The crew goes deep on Pope Leo XIV's first encyclical, a 42,000-word document framed around the Tower of Babel versus rebuilding Jerusalem. Its core claim: AI is an anthropological problem, not a technical one. It's surprisingly pro-technology, open-source-pilled, and anti-autonomous-weapons β and Alex pushes back live on the worry that AI erodes our desire for human connection. A real debate on consciousness follows.
- Not a doomer document β 'technology is not inherently evil'
- Frames the choice as building Babel vs rebuilding Jerusalem
- Anthropic's Chris Olah was the featured tech speaker at the Vatican
- Pope names concentrated power in a few labs as a problem β open-source pilled
- Heated panel debate on whether models have experiences
π° Illinois SB315 β First US Frontier-AI Audit Law
Illinois passed SB315 unanimously, 110-0: the first US state law mandating independent third-party audits of frontier AI for catastrophic risk, with whistleblower protections and civil penalties. OpenAI publicly endorsed it, framing Illinois, California (SB53) and New York (RAISE Act) as converging into a de-facto national standard. The crew debates whether such rules entrench big labs over startups.
- Passed 110-0; OpenAI endorsed it
- Annual risk frameworks, third-party audits, transparency reports
- Whistleblower protection called the underrated hero of the bill
- Wolfram warns regulation is easier for incumbents than startups
π§ͺ DeepSWE β A Contamination-Free Coding Bench
Datacurve's DeepSWE is the first coding leaderboard in a while that matches how the models actually feel: 113 original tasks written from scratch, shipped as shallow clones with no git history to cheat from. Replaying older benches, they found SWE-Bench Pro's verifier is wrong ~32% of the time and that Claude Opus was reading the gold commit out of git history on 12-18% of passes.
- 113 original tasks, no scraped GitHub PRs, no git history to cheat
- GPT-5.5 leads at 70%, big drop-off after the top few
- Caught Claude reading the gold commit from git history
- Kimi K2 the top open-source entry
Hosts and Guests
Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
Co-hosts - @WolframRvnwlf, @yampeleg, @nisten
AI & Society
Big CO LLMs + APIs
Open Source LLMs
OpenBMB releases MiniCPM5-1B, a new SOTA 1B open weights model for efficient local and on-device use (X, Hugging Face, Arxiv, X)
Tencent open-sources Hy-MT2 translation models under Apache 2.0, including a tiny 1.8B model that beats paid translation APIs (X, HF 1.8B, HF 30B-A3B, Arxiv)
Tools & Agentic Engineering
Google launches Universal Cart, AP2, and UCP to let AI agents shop and pay on your behalf (X)
Google AI Studio now lets anyone build native Android apps for free, with 250,000 apps created in the first week (X, AI Studio)
Cua Driver launches Windows support for background computer-use agents across real desktop apps (X, Blog, GitHub)
This Week's Buzz - from W&B and CoreWeave!
Vision & Video
Voice & Audio
MOSS-TTS-v1.5 ships as an 8B open-source TTS model with 31 languages, pause control, and Apache 2.0 licensing (X, Hugging Face, GitHub, Arxiv)
ElevenLabs launches Dubbing v2, an audio-to-audio model that preserves performance across 90+ languages (X, Dubbing, Creative, Productions)
Cartesia Ink-2 debuts as the most accurate streaming speech-to-text model on Artificial Analysis's new STT leaderboard (X, Ink, Artificial Analysis)
AI Art & Diffusion & 3D
Pruna AI's P-Image-Upscale hits 128 megapixel outputs with fast, predictable pricing (X, Docs, Replicate)
PrismML releases 1-bit and Ternary Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation (X, Blog, Hugging Face, iOS App, Demo)
Microsoft's MAI-Image-2.5 jumps to #3 on the Arena text-to-image leaderboard (X, Announcement, Arena)