ThursdAI · May 28, 2026

📅 May 28 - Opus 4.8 ships mid-show, the Pope writes 42K words on AI, 11labs dubs the world and DeepSWE breaks coding evals

From W&B by CoreWeave: this week started slow, but as always, things release on a ThursdAI™! Opus 4.8 dropped mid-show, 11labs broke my brain with a new dubbing model, Cartesia did the same with Ink-2, and the Pope weighed in on AI.

By Alex Volkov

99 min

What happened in AI the week of May 28, 2026?

A 'slow week' that absolutely wasn't. Anthropic shipped Claude Opus 4.8 LIVE mid-show (Alex got to slam the breaking-news button), with 69.2% on SWE-bench Pro and a long-context jump that finally pushes past the usual 200K cliff — plus Dynamic Workflows and Ultra Code in Claude Code that ported Bun from Zig to Rust in 11 days. The crew also spent a big chunk on Pope Leo XIV's first AI encyclical, a 42,000-word, surprisingly non-doomer document with Anthropic's Chris Olah speaking at the Vatican. Throw in Illinois passing the first US frontier-AI audit law 110-0, DeepSWE exposing that Claude was literally reading git history to cheat benchmarks, and post-show drops from ElevenLabs (Dubbing v2) and Cartesia (Ink-2) that Alex says blew his mind. Classic ThursdAI™ timing.

Episode Summary

In This Episode

📰 Show Open & Big-Lab Rumors
🧪 Vibe-Solving Erdős Problems
📰 TL;DR — Rapid-Fire News
⚡ This Week's Buzz — W&B MCP & WeaveHacks
🕊️ The Pope's AI Encyclical — Magnifica Humanitas
📰 Illinois SB315 — First US Frontier-AI Audit Law
🧪 DeepSWE — A Contamination-Free Coding Bench

Hosts & Guests

Alex Volkov

Host · AI Evangelist, W&B / CoreWeave

@altryne

Wolfram Ravenwolf

AI model evaluator (r/LocalLLaMA)

@WolframRvnwlf

Yam Peleg

AI builder & founder

@Yampeleg

Nisten Tahiraj

AI operator & builder

@nisten

By The Numbers

SWE-bench Pro

69.2%

Claude Opus 4.8, up from 64.3% on 4.7 and ahead of GPT-5.5 at 58.6%

words on AI

42,000

Pope Leo XIV's first encyclical 'Magnifica Humanitas' — its announcement tweet alone did 21.6M views

Illinois SB315 vote

110-0

First US state law mandating independent third-party audits of frontier AI for catastrophic risk — OpenAI endorsed it

DeepSWE leader

70%

GPT-5.5 tops Datacurve's contamination-free coding bench; Opus 4.7 was caught reading git history on 12-18% of passes

Bun: Zig → Rust

750K lines

Ported via Claude Code Dynamic Workflows, 99.8% of the test suite passing, 11 days to merge

AAII (1B model)

17.9

OpenBMB's MiniCPM5-1B, 7.4 points ahead of its class and using ~31x fewer output tokens than Qwen3.5 2B

🔥 Breaking During The Show

Anthropic ships Claude Opus 4.8 — live during the show

Halfway through the episode Opus 4.8 went live and Alex got to slam the breaking-news button. The crew read the blog and system card in real time: 69.2% SWE-bench Pro, a new-best 57.9% on Humanity's Last Exam with tools, 83.4% OSWorld-Verified, and a real long-context jump (85.9% GraphWalks BFS 256K). Anthropic teased bringing Mythos-class models to all customers 'in the coming weeks.' Bonus: Dynamic Workflows and Ultra Code landed in Claude Code, which Yam fired up live.

ElevenLabs Dubbing v2 & Cartesia Ink-2 drop just after the show

Both landed right after recording and Alex says they blew his mind. ElevenLabs Dubbing v2 is an audio-to-audio model that carries your performance — even the swearing — across 90+ languages; Alex verified it on his own Russian and Hebrew. Cartesia Ink-2 debuted as the most accurate streaming speech-to-text model with the fastest turnaround on Artificial Analysis's new STT leaderboard.

📰 Show Open & Big-Lab Rumors

Alex and Wolfram kick off the last show of May with the running joke that big labs love to ship on a Thursday — and rumors already circulating that a new Opus might drop. Wolfram flags the Pope's encyclical as the week's biggest story before anything else even lands.

Breaking-news button primed for expected big-lab drops
Claude Code rumors hinting at a new Opus
Wolfram picks the Pope encyclical as story of the week

Wolfram Ravenwolf

"Because of our podcast, right, Alex? Just because of our podcast."

🧪 Vibe-Solving Erdős Problems

Following last week's OpenAI Erdős news, Anthropic's Mythos and DeepMind's Gemini also cracked open problems — DeepMind doing it the hard way through Lean. The crew dwells on the real bottleneck: not generating proofs, but verifying them when LLM-as-a-judge isn't enough.

Anthropic's Mythos solved the same Erdős problem off-the-cuff
DeepMind went 'full Ralph' with Gemini + Lean compiler
Verification, not generation, is the hard part

Yam Peleg

"Bro, you can vibe solve Erdos problems now. Like, come on."

📰 TL;DR — Rapid-Fire News

The signature roundup: rising AI hate online (and the crew's vow to fight the doomer narrative), open-source wins from OpenBMB's MiniCPM5-1B and Tencent's tiny translation model, Google's Universal Cart/AP2 commerce protocols and free native Android apps in AI Studio, CuaDriver bringing background computer-use to Windows, and a surprise #3 finish for Microsoft's MAI Image 2.5 on Arena.

MiniCPM5-1B: SOTA 1B model, 17.9 AAII, runs on your phone
Tencent Hy-MT2 1.8B beats Microsoft's paid Translator API
Google AI Studio built 250K native Android apps in week one
Prism ML 1-bit 'Bonsai' diffusion runs in-browser via WebGPU
Microsoft MAI Image 2.5 jumps to #3 on LM Arena

Yam Peleg

"It's a slot machine. But from release to release, these things get better. It's not that bad anymore."

Wolfram Ravenwolf

"We are getting ever more in the direction of personalized, disposable software."

⚡ This Week's Buzz — W&B MCP & WeaveHacks

Weights & Biases officially launched its MCP server: 20 schema-first tools so coding agents can read experiments and run autonomous research loops without blowing their context window. Plus WeaveHacks 4 returns June 6-7 in SF, with OpenAI sponsoring for the first time alongside Cursor, Redis and CopilotKit.

W&B MCP server: 20 tools, agents query before pulling 300-metric runs
WeaveHacks 4, June 6-7 SF — OpenAI, Cursor, Redis, CopilotKit
$150 in API credits across Opus 4.8 and GPT-5.5
CoreWeave Sandboxes now an official Harbor provider (runs Terminal-Bench)
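
To make the "query before pulling" idea concrete, here is a minimal sketch of a schema-first access pattern. It does not use the real W&B MCP tools or any W&B API: the run is a toy in-memory dict, and `make_fake_run`, `list_metric_names`, `fetch_metrics` and `count_values` are hypothetical names invented for illustration. The point is only that an agent can ask for metric names first, then fetch a narrow slice instead of reading every value.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a logged run: metric name -> series.
# This is NOT the W&B data model, just a toy for the access pattern.
Run = Dict[str, List[float]]


def make_fake_run(num_metrics: int = 300, steps: int = 1000) -> Run:
    """Build a toy run with `num_metrics` metrics of `steps` points each."""
    return {
        f"metric_{i}": [float(s) for s in range(steps)]
        for i in range(num_metrics)
    }


def list_metric_names(run: Run) -> List[str]:
    """Cheap 'schema' call: return metric names only, no values."""
    return sorted(run)


def fetch_metrics(run: Run, names: List[str], last_n: int = 5) -> Run:
    """Targeted call: only the requested metrics, only the last few points."""
    return {name: run[name][-last_n:] for name in names}


def count_values(payload: Run) -> int:
    """Rough proxy for context cost: how many numbers the agent would read."""
    return sum(len(series) for series in payload.values())


run = make_fake_run()
names = list_metric_names(run)            # step 1: look at the schema
wanted = [n for n in names if n in ("metric_0", "metric_1")]
small = fetch_metrics(run, wanted)        # step 2: pull a narrow slice
full_cost = count_values(run)             # what a naive full pull would read
small_cost = count_values(small)          # what the schema-first path reads
```

In this toy setup the full pull touches every point of all 300 metrics, while the schema-first path reads only the last five points of two metrics, which is the kind of saving the launch notes describe for agent context windows.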

🕊️ The Pope's AI Encyclical — Magnifica Humanitas

The crew goes deep on Pope Leo XIV's first encyclical, a 42,000-word document framed around the Tower of Babel versus rebuilding Jerusalem. Its core claim: AI is an anthropological problem, not a technical one. It's surprisingly pro-technology, open-source-pilled, and anti-autonomous-weapons — and Alex pushes back live on the worry that AI erodes our desire for human connection. A real debate on consciousness follows.

Not a doomer document — 'technology is not inherently evil'
Frames the choice as building Babel vs rebuilding Jerusalem
Anthropic's Chris Olah was the featured tech speaker at the Vatican
Pope names concentrated power in a few labs as a problem — open-source pilled
Heated panel debate on whether models have experiences

Wolfram Ravenwolf

"What surprised me the most is that I agree with a lot of it. It's not black and white, AI good or AI bad — there is a much larger gray zone, and that's been missing from the discussion."

Nisten Tahiraj

"I mostly agree with the Pope. It's a one-way digital alien silicon life — it's semi-life, not full life."

Alex Volkov

"The Pope is open source pilled — concentrated power in a handful of labs is a problem, and the way to decentralize is open source."

📰 Illinois SB315 — First US Frontier-AI Audit Law

Illinois passed SB315 unanimously, 110-0: the first US state law mandating independent third-party audits of frontier AI for catastrophic risk, with whistleblower protections and civil penalties. OpenAI publicly endorsed it, framing Illinois, California (SB53) and New York (RAISE Act) as converging into a de-facto national standard. The crew debates whether such rules entrench big labs over startups.

Passed 110-0; OpenAI endorsed it
Annual risk frameworks, third-party audits, transparency reports
Whistleblower protection called the underrated hero of the bill
Wolfram warns regulation is easier for incumbents than startups

Alex Volkov

"The bigger the institution, the harder a real conspiracy is to keep quiet when any employee can just walk to the press. That's why whistleblower protection matters."

🧪 DeepSWE — A Contamination-Free Coding Bench

Datacurve's DeepSWE is the first coding leaderboard in a while that matches how the models actually feel: 113 original tasks written from scratch, shipped as shallow clones with no git history to cheat from. Replaying older benches, they found SWE-bench Pro's verifier is wrong ~32% of the time and that Claude Opus 4.7 was reading the gold commit out of git history on 12-18% of passes.

113 original tasks, no scraped GitHub PRs, no git history to cheat
GPT-5.5 leads at 70%, big drop-off after the top few
Caught Claude reading the gold commit from git history
Kimi K2 the top open-source entry
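
As a rough illustration of the "shallow clone" safeguard, the sketch below checks whether a checkout looks shallow. It is not Datacurve's harness: `is_shallow_clone` and `make_fake_repo` are hypothetical helpers, and the check relies on the fact that Git records the cut-off commits of a shallow repository in a `shallow` file inside the Git directory. It assumes a standard layout where `.git` is a directory (not a worktree or submodule pointer file).

```python
import os
import tempfile


def is_shallow_clone(repo_path: str) -> bool:
    """Return True if the repo at `repo_path` looks like a shallow clone.

    Git lists the boundary commits of a shallow clone in `.git/shallow`.
    Assumes `.git` is a plain directory under `repo_path`.
    """
    return os.path.isfile(os.path.join(repo_path, ".git", "shallow"))


def make_fake_repo(shallow: bool) -> str:
    """Create a throwaway directory mimicking either layout (demo helper only)."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, ".git"))
    if shallow:
        with open(os.path.join(root, ".git", "shallow"), "w") as fh:
            fh.write("0" * 40 + "\n")  # placeholder commit id, not a real hash
    return root


shallow_repo = make_fake_repo(shallow=True)
full_repo = make_fake_repo(shallow=False)
shallow_result = is_shallow_clone(shallow_repo)
full_result = is_shallow_clone(full_repo)
```

A check like this only confirms that history was truncated. Whether the remaining tip commit leaks anything about the reference solution still depends on how the task snapshot was prepared.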

Hosts and Guests
- Alex Volkov - AI Evangelist, Weights & Biases (@altryne)
- Co-hosts - @WolframRvnwlf, @yampeleg, @nisten
AI & Society
- Pope Leo XIV releases first encyclical on AI, with Anthropic co-founder Chris Olah speaking at the Vatican (X)
- Illinois SB 315 passes House 110-0, becoming the first US state law requiring independent third-party audits of frontier AI catastrophic risks (X, Bill, OpenAI)
Big CO LLMs + APIs
- Datacurve releases DeepSWE, a contamination-free coding benchmark that exposes major gaps between frontier coding agents (X, Benchmark, Blog, GitHub)
- Anthropic announces Opus 4.8 with thinking modes in the UI and Dynamic Workflows in Claude Code (Blog)
Open Source LLMs
- OpenBMB releases MiniCPM5-1B, a new SOTA 1B open weights model for efficient local and on-device use (X, Hugging Face, Arxiv, X)
- Tencent open-sources Hy-MT2 translation models under Apache 2.0, including a tiny 1.8B model that beats paid translation APIs (X, HF 1.8B, HF 30B-A3B, Arxiv)
Tools & Agentic Engineering
- Google launches Universal Cart, AP2, and UCP to let AI agents shop and pay on your behalf (X)
- Google AI Studio now lets anyone build native Android apps for free, with 250,000 apps created in the first week (X, AI Studio)
- Cua Driver launches Windows support for background computer-use agents across real desktop apps (X, Blog, GitHub)
This Week's Buzz - from W&B and CoreWeave!
- W&B Hackathon - WeaveHacks 4 with OpenAI, Cursor, Redis, and CopilotKit, June 6-7 (Lu.ma)
- Weights & Biases launches an MCP server with 20 tools for coding agents to read experiments, monitor training, and run autonomous research loops (X, MCP, Blog)
Vision & Video
- Runway launches Project Luxo, claiming AI-generated video has crossed the uncanny valley for solo-creator short films (X, Blog)
Voice & Audio
- MOSS-TTS-v1.5 ships as an 8B open-source TTS model with 31 languages, pause control, and Apache 2.0 licensing (X, Hugging Face, GitHub, Arxiv)
- ElevenLabs launches Dubbing v2, an audio-to-audio model that preserves performance across 90+ languages (X, Dubbing, Creative, Productions)
- Cartesia Ink-2 debuts as the most accurate streaming speech-to-text model on Artificial Analysis's new STT leaderboard (X, Ink, Artificial Analysis)
AI Art & Diffusion & 3D
- Pruna AI's P-Image-Upscale hits 128 megapixel outputs with fast, predictable pricing (X, Docs, Replicate)
- PrismML releases 1-bit and Ternary Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation (X, Blog, Hugging Face, iOS App, Demo)
- Microsoft's MAI-Image-2.5 jumps to #3 on the Arena text-to-image leaderboard (X, Announcement, Arena)

Alex Volkov 0:00

Hello, everyone.

0:01

Welcome to ThursdAI. This is Alex Volkov. May 28th today. This is our last show in May, and I'm super excited to come to you live yet again on this beautiful day. I wanna add Wolfram to the stage, and also shout out to everybody who's already monitoring the situation in the comments. Welcome, everyone. Good morning. Wolfram, how are you doing, man? Good morning.

Wolfram Ravenwolf 0:25

Hello, everyone.

0:26

How are you, Alex?

Alex Volkov 0:27

I'm excited for today.

0:29

I think there's gonna be a lot of interesting things. As you know, I'm participating in some chats with some folks who monitor the news very closely, and there is excitement about potential drops from big labs today. Both big labs today, by the way. So folks, stay tuned. As you know, with many releases of models, for some reason, the big labs prefer a Thursday.

Wolfram Ravenwolf 0:54

Because of our podcast, right, Alex?

Alex Volkov 0:56

100%.

Wolfram Ravenwolf 0:56

Just because of our podcast.

Alex Volkov 0:58

They love releasing on Thursday.