Episode Summary
ThursdAI goes LIVE from AI Engineer Europe in London with five back-to-back interviews: Swyx on harness engineering and why MCP is now 'less interesting because it's stable,' Peter Gostev on Anthropic's Claude Mythos ($25/$125 per M tokens, 77% on SWE-bench Pro, 'too dangerous to release'), VB from OpenAI on Codex hitting 3M weekly active users with plugins and Guardian Approvals, Vincent Koc on OpenClaw's 1.5M-line plugin rearchitecture and the /dreaming feature, and Omar Sanseviero on Gemma 4 crossing 10M downloads. Plus Meta's Muse Spark drops mid-show, Seedance 2.0 lands stateside, and Anthropic's ARR jumps from $19B to $30B in two months.
In This Episode
- Swyx - AI Engineer, Harness Engineering & Skills
- TL;DR - Weekly News Roundup
- Peter Gostev - Mythos, Arena Data & Compute Wars
- This Week's Buzz
- VB (OpenAI) - Codex, Plugins & Guardian Approvals
- Vincent Koc - OpenClaw, Dreaming & 1.5M Lines of Code
- Omar Sanseviero - Gemma 4, 10M Downloads & Google DeepMind
- Wrap-Up
Swyx - AI Engineer, Harness Engineering & Skills
Opening live from AI Engineer Europe, Swyx runs down the tracks he curated this year: coding agents, MCP ('settled and therefore less interesting'), generative media split from voice/vision, and harness engineering. He warns engineers to 'vendor everything' after the LiteLLM and Axios supply-chain compromises: pip fork instead of pip install.
- Harness engineering: big labs are investing more, not less; it's not going away
- Skills are absorbing harness work, with English as the ultimate programming interface
- Supply-chain security: vendor everything, 'pip fork' not 'pip install'
- MCP has beaten OpenAPI, tRPC, gRPC as the integration protocol
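One way to picture the 'vendor everything' advice: keep a copy of each dependency in your own repo and check it against hashes you pinned at review time. The sketch below is an illustration only, not something Swyx showed; the function names and the idea of a pinned-hash mapping are assumptions, and 'pip fork' is a figure of speech from the show, not a real pip command.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_vendored(root: Path, pinned: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match their pinned hash.

    `pinned` is a hypothetical mapping of relative path -> expected SHA-256,
    recorded when the vendored code was last reviewed.
    """
    mismatched = []
    for rel_path, expected in pinned.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatched.append(rel_path)
    return mismatched
```

Run in CI, an empty list means the vendored tree still matches what was reviewed; anything else is a file to look at before shipping.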
TL;DR - Weekly News Roundup
The usual fast-lap through the week: Mythos, Muse Spark, GLM-5.1, Gemma 4 at 10M, Seedance 2.0 hitting the US, HappyHorse mystery video model, the surreal Milla Jovovich 'MemPalace' saga, OpenClaw 2026.4.5 with /dreaming, Anthropic Managed Agents, and a W&B Automations launch. Peter Gostev rides along.
- Claude Mythos announced: too dangerous to release publicly
- Anthropic $30B ARR (up from $19B in Feb), secondary tender sale completed
- Meta Muse Spark: first model from Meta Superintelligence Labs
- GLM-5.1: #1 open-source on SWE-Bench Pro at 58.4%
- Seedance 2.0 live on Replicate, ~80 Elo points above the next video model on Arena
- OpenClaw 2026.4.5 ships /dreaming (REM/Light/Deep memory consolidation)
Peter Gostev - Mythos, Arena Data & Compute Wars
Peter Gostev from Arena joins to break down Claude Mythos. The numbers: 77% on SWE-bench Pro (up from 53%), $25/$125 per M tokens, released only to ~40 partner companies under Project Glasswing. His take: the 'too dangerous' framing is mostly cover, and Anthropic likely just doesn't have the compute to serve it. He walks through his updated Compute Wars chart showing Anthropic catching up via the new Google TPU deal, and announces Arena has released 3 years of historical leaderboard data and actual prompts as Hugging Face datasets.
- Mythos: 77% SWE-bench Pro, 64% HLE, +10pts on BrowseComp, plus a 10T parameter rumor
- $25 / $125 per M tokens, ~5x Opus 4.6
- Real reason it's unreleased: compute shortage, not safety
- Anthropic ARR: $19B (Feb) → $30B (Apr), secondary tender sale completed pre-IPO
- Google TPU deal helps Anthropic catch up; weird that Google is propping up a DeepMind competitor
- Arena released 3 years of historical leaderboard + prompt datasets on Hugging Face
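To make the $25 / $125 per M tokens figure concrete, here is a small back-of-the-envelope calculator. It assumes the first number is the input rate and the second the output rate (the usual convention, but not spelled out on the show), and the token counts in the example are made up.

```python
# Rates quoted on the show, assumed to be input / output USD per million tokens.
MYTHOS_INPUT_USD_PER_M = 25.0
MYTHOS_OUTPUT_USD_PER_M = 125.0


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate: float = MYTHOS_INPUT_USD_PER_M,
    output_rate: float = MYTHOS_OUTPUT_USD_PER_M,
) -> float:
    """Cost of one request given token counts and per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical agent turn: 200k tokens of context in, 20k tokens out.
print(request_cost_usd(200_000, 20_000))  # 7.5
```

At these rates a single long-context agent turn like this costs about $7.50, so a session with a hundred such turns would run into the hundreds of dollars.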
This Week's Buzz
W&B ships Automations: event-triggered actions from your training runs into notifications, GitHub Actions, and deployments, pairing nicely with the new iOS app. GLM-5.1 and Gemma 4 both go live on W&B Inference. Wolfram's in-depth blog post on why more reasoning isn't always better is up on wandb.com.
- W&B Automations launch: triggers on runs → Slack, GitHub Actions, deploys
- GLM-5.1 and Gemma 4 both live on W&B Inference
- Wolfram's reasoning-regression deep dive published on the W&B blog
VB (OpenAI) - Codex, Plugins & Guardian Approvals
Vaibhav (VB) Srivastav from OpenAI's Codex team walks through the 3M weekly-active-users milestone and what's behind it: plugins (Stripe, Supabase, shadcn), sub-agents, and new experimental features including Guardian Approvals, a sub-agent that classifies every tool call for risk and only escalates the dangerous ones. VB also shares his own workflow: every morning at 9 AM, a Codex automation reads his Slack mentions, cross-references Gmail and calendar, and drops pre-briefs into five-minute calendar events.
- Codex: 3M weekly active users, up from 2M last month
- Plugins are bundles of skills + MCP servers (iOS builds, web builds, Stripe, Supabase, shadcn)
- Sub-agents decompose tasks into independent parallel Codex agents (including 'Jason')
- Guardian Approvals: sub-agent risk-classifies every tool call; auto-approves low risk, escalates high risk
- Experimental hooks: run code at session start, after tool calls, at session end
- VB's daily automation: reads Slack mentions, cross-refs Gmail + Calendar, schedules 5-min pre-brief events
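The Guardian Approvals pattern described above (classify every tool call, auto-approve the low-risk ones, escalate the rest) can be sketched in a few lines. This is not Codex's implementation or API: every name here is hypothetical, and the keyword check is a stand-in for what would really be a model-backed classifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class ToolCall:
    tool: str
    arguments: dict


# Hypothetical rule set: argument text containing these markers counts as high risk.
DESTRUCTIVE_MARKERS = ("rm -rf", "drop table", "git push --force")


def classify(call: ToolCall) -> Risk:
    """Stand-in for the classifying sub-agent; a real one would call a model."""
    text = " ".join(str(value) for value in call.arguments.values()).lower()
    if any(marker in text for marker in DESTRUCTIVE_MARKERS):
        return Risk.HIGH
    return Risk.LOW


def gate(call: ToolCall, ask_human: Callable[[ToolCall], bool]) -> bool:
    """Return True if the call may run: low risk passes, high risk asks a human."""
    if classify(call) is Risk.LOW:
        return True
    return bool(ask_human(call))
```

The point of the design is that the human only sees the calls routed through `ask_human`, so approval prompts stay rare enough to be read rather than rubber-stamped.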
Vincent Koc - OpenClaw, Dreaming & 1.5M Lines of Code
OpenClaw's #2 maintainer Vincent Koc on refactoring 1.5M lines of code into a plugin architecture in nine days (at 2 AM, at NVIDIA, before Jensen's keynote), the new /dreaming feature (REM/core/deep-sleep memory consolidation with a human-readable Dream Log), and why GPT-5.4 feels 'soulful-less' compared to the Rocky-from-Project-Hail-Mary voice Alex cloned for his own OpenClaw. Plus the pricing truth: Anthropic didn't ban OpenClaw, they just made Max-tier Opus via OpenClaw significantly more expensive.
- OpenClaw codebase: ~1.5M lines incl. unreleased iOS + Android native apps
- GitHub PR/issue counter literally caps at '5K+'; they hit the ceiling
- Plugin architecture ('not Lego, Ikea') refactored in 9 days at NVIDIA pre-keynote
- /dreaming: REM / core / deep sleep phases that defrag agent memory into a human-readable Dream Log
- Anthropic didn't ban OpenClaw; it made Max-tier Opus usage via OpenClaw expensive, pushing users to GPT-5.4 via Codex
- GPT-5.4 'soulless' personality; Alex cloned Rocky's voice from Project Hail Mary into his OpenClaw
Omar Sanseviero - Gemma 4, 10M Downloads & Google DeepMind
Omar Sanseviero from Google DeepMind on the Gemma 4 launch crossing 10M downloads and 1,000+ fine-tunes, the license change, and the AI Edge Gallery that lets people run Gemma locally on Android/iOS. Gemma is now the foundation for the next generation of Gemini Nano shipping on Pixel and Samsung. Wolfram chimes in on his Google-first household: kids using AI Studio and Antigravity to build games, his 70-year-old mother unlocking her Pixel by voice.
- Gemma 4: 10M+ downloads, 1,000+ fine-tunes on Hugging Face
- Gemma family total: over 500M downloads across all variants
- Gemma is the foundation for next-gen Gemini Nano on Pixel / Samsung
- AI Edge Gallery: run Gemma locally on Android & iOS
- llama.cpp vision capability fixes shipping
- Ask-your-mom test: Wolfram's 70-year-old mom uses voice unlock on Pixel
Wrap-Up
Alex signs off from London with Wolfram and hands the after-show over to Yam, Nisten, and LDJ on a Twitter Space. Gratitude lap to Swyx and the AI Engineer crew: ThursdAI literally wouldn't exist without the AI Engineer conference. See you in San Francisco this summer.
- AI Engineer Europe sold out 4x faster than any other edition
- Post-show continues on Twitter Spaces with Yam / Nisten / LDJ
- Next AI Engineer is this summer in San Francisco
Hosts and Guests
Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
Co-Hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed
Guests: @swyx (AI Engineer / Latent Space), @petergostev (Arena, formerly LMArena), @reach_vb (OpenAI / Codex), @vincent_koc (OpenClaw #2 maintainer), @osanseviero (Google DeepMind / Gemma)
Big CO LLMs + APIs
Anthropic announces Project Glasswing and Claude Mythos Preview, a cyber-defense frontier model too dangerous to release publicly (X, Announcement)
Anthropic's Claude Mythos is so powerful they won't release it: found zero-days in every major OS and browser, escaped its sandbox, and scored 93.9% on SWE-bench (X, X, X, X)
Anthropic ARR jumps from $19B (February) to $30B in April; secondary tender sale completed, employees not selling ahead of IPO
Anthropic + Google TPU deal: Anthropic getting massive compute commitment from Google (who already owns ~10% of Anthropic), with Peter Gostev's Compute Wars chart showing the gap to OpenAI closing
Anthropic ships Managed Agents: fully hosted agent runtime + infrastructure. Selling outcomes, not tokens
Meta launches Muse Spark, the first model from Meta Superintelligence Labs, with natively multimodal reasoning, multi-agent Contemplating mode, and deep health/visual capabilities (X, Blog)
Simon Willison deep dives into Meta's Muse Spark model and uncovers 16 hidden tools including visual grounding and sub-agents in the meta.ai chat UI (X, Blog, Announcement)
Open Source LLMs
GLM-5.1 from Z.ai is #1 open-source on SWE-Bench Pro at 58.4%, runs autonomously for 8 hours with 1,700+ agent steps (X, HF, Arxiv)
Gemma 4 crosses 10M+ downloads, 1,000+ Gemma-4-based fine-tunes on HF. Did really well on Arena considering size; Peter Gostev confirmed it smashed many models on the Pareto curve
Nisten's pick: Hermes 27B, trained specifically to be paired with the Hermes harness, allegedly distilled from Opus API. Model + harness shipped together as a portable unit
Tools & Agentic Engineering
OpenClaw 2026.4.5 - biggest release since 4.0:
/dreaming goes GA (Light/Deep/REM memory consolidation with a Dream Diary in DREAMS.md), built-in video + music generation across 4 backends, GPT-5.4 as new default, prompt-cache reuse improvements, Control UI + docs in 12 new languages (Release, Vincent, Dreaming docs, FOD#147)
OpenClaw codebase now ~1.5M lines including unreleased iOS + Android native apps. GitHub literally caps at "5K+" PRs/issues; they hit the ceiling
Anthropic did NOT ban OpenClaw: they made Max-tier subscription usage of Opus via OpenClaw significantly more expensive, pushing many users to GPT-5.4 via Codex
Codex hits 3M weekly active users, up from 2M last month. VB walked through plugins (Stripe, Supabase, shadcn), sub-agents, Guardian Approvals (auto-classify tool-call risk), and experimental hooks
Cursor: remote agents + code review agent (78% issues caught pre-merge)
MemPalace: Milla Jovovich and Ben Sigman's open-source AI memory system goes viral with 26K GitHub stars in 2 days, claims top benchmark scores, then transparently walks back overstated claims (X, GitHub, X, X, GitHub)
This Week's Buzz (Weights & Biases)
W&B Automations are LIVE: event triggers from your runs into notifications, GitHub Actions, deployments. Pairs nicely with the new iOS app
GLM-5.1 and Gemma 4 both up on W&B Inference
Wolfram published an in-depth blog post on his finding that more reasoning is not always better (models can get dumber with more thinking time); full writeup on wandb.com
Vision & Video
Seedance 2.0 launches in the US on Replicate, with up to 9 reference images, 3 videos, and 3 audio files for cinematic AI video generation (X, Announcement). Peter Gostev confirmed it jumped ~80 Elo points above the next video model on Arena, a massive gap where most video models cluster within 10 points
HappyHorse-1.0, a mysterious 15B video model from Alibaba's Taotian Group, takes #1 on Artificial Analysis video arena beating Seedance 2.0, Kling 3.0, and Grok Video (X, X, X, X, Blog)
The Harry Potter "Drip Wizards" AI slop trend: Seedance-powered Hogwarts videos going hugely viral
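For a feel of what an ~80-point gap means versus the usual 10-point clusters, the classic Elo formula converts a rating gap into an expected head-to-head win rate. This assumes Arena's scores behave like standard Elo on a 400-point logistic scale, which is an approximation for illustration; Arena's actual rating method may differ.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Expected win probability for the higher-rated model under classic Elo."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


for gap in (10, 80):
    print(f"{gap:>3} points -> {expected_win_rate(gap):.1%}")
```

Under that assumption a 10-point lead is close to a coin flip (about 51%), while an 80-point lead means winning roughly 61% of pairwise votes.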
AI Art & Diffusion & 3D
Show notes & key moments
Swyx on harness engineering: gains are coming from the harness, not the weights. The big labs are investing more and more in harness; it's not going away. Skills (English-as-programming-language) are increasingly absorbing harness work
Swyx on AI Engineer tracks: MCP is "more settled and stable, therefore less interesting." Coding agents track is bigger this year (Cursor, Factory, super-long-running). Voice & Vision split from Generative Media; multimodality as a single track no longer makes sense
Swyx on supply chain attacks: the LiteLLM and Axios issues mean you should "vendor everything", i.e. pip fork instead of pip install. Tool requests becoming prompt requests
Peter Gostev on Mythos pricing: $25 / $125 per M tokens (~5x Opus 4.6). But the real reason it's not public isn't safety: Anthropic likely just doesn't have the compute to serve it
Peter Gostev on Compute Wars: OpenAI is way ahead of Anthropic on compute. The new Google TPU deal is Anthropic catching up, and it's weird that Google is propping up a competitor to DeepMind. (Same pattern as when Google's $2B Anthropic investment effectively propped up AWS vs Google Cloud)
Peter Gostev on Arena data: Arena released 3 years of historical leaderboard data + actual prompts as datasets on Hugging Face. Previously he was scraping it by hand into Google Sheets; now he has Databricks access
VB on Codex workflows: every morning at 9 AM, Codex automation reads his Slack mentions, cross-references Gmail and Calendar, and creates a 5-minute pre-brief calendar event for upcoming meetings. None of it is "coding"; it's all plugins + connectors
Vincent Koc on the GPT-5.4 personality problem: model is incredible at coding but "soulless." Wolfram noticed it back in December and cancelled his subscription. Alex cloned the Rocky voice from Project Hail Mary and put it in his OpenClaw: "amazing"
Vincent Koc on Dreaming: three phases (REM, core, deep sleep) that defrag agent memory. The dream log is for the human in the loop; it makes memory inspectable in a way a non-technical person (a mom) can understand
Vincent Koc on architecture: the open-source flood forced OpenClaw into a plugin architecture. "Not Lego, Ikea." Refactored ~1M lines in 9 days at 2 AM at NVIDIA before Jensen's keynote
Omar Sanseviero on Gemma 4: 500M+ total Gemma downloads across all variants. Gemma is the foundation for the next generation of Gemini Nano on Pixel/Samsung. llama.cpp vision capability fixes shipping
Wolfram's Pixel/Google household: kids using AI Studio + Antigravity to build games, his 70-year-old mother using voice unlock on her Pixel