Fourteen lanyards, one table.
Broadcasting from the middle of the floor means guests get grabbed hallway-track style: some scheduled, one crashing the set on purpose.
Eleven segments, zero dead air.
Fable 5 prepped this run sheet — and shuffled the guest order for no reason. Each segment links to the full chapter notes below.
-
SEG 01
🎪 LIVE from AI Engineer World's Fair
Broadcasting live from the floor so it feels like you're at the table — guests get grabbed hallway-track style.
-
SEG 02
🏢 Fable is back (and Sonnet 5 is… meh)
Restored globally on July 1, ending the June 12 pause, after export controls were lifted; it ships with new cybersecurity classifiers and prepped the show's entire run sheet.
-
SEG 03
🔓 Open source: LongCat-2.0 unmasked (Meituan's Owl Alpha)
LongCat-2.0: 1.6T MoE trained entirely on Chinese ASICs (no NVIDIA), 59.5 SWE-bench Pro, $0.038/M tokens with free cache hits.
-
SEG 04
🔩 The Etched ASIC debate
Etched announced its LLM ASICs — model weights physically on the chip for major speed and power gains.
-
SEG 05
🧩 Exo Labs launches local.ai (+ a surprise NVIDIA crash)
local.ai tracks best-model-for-your-hardware, cloud trade-offs, and cost vs. API tokens — early access + signup codes now.
-
SEG 06
🌞 GPT-5.6 with Dominik Kundel (OpenAI)
GPT-5.6 = Sol (frontier), Terra (~5.5 intelligence at half cost), Luna (small & fast) + new Ultra 'Max' reasoning mode.
-
SEG 07
💛 This Week's Buzz: W&B Aria goes GA
Aria went GA on Monday — an auto-research agent inside the W&B UI ('Just Ask Aria').
-
SEG 08
🐡 Sakana Fugu with Stefania Druga
Fugu is recursive, not a dumb dispatcher — it rewrites prompts and verifies outputs before picking a model.
-
SEG 09
✨ Google DeepMind: OmniFlash + NanoBanana 2 Lite
OmniFlash: first of the any-to-any Omni family — conversational multi-turn video editing (editing Elo 1087, $0.10/second, up to 10s) via the Interactions API.
-
SEG 10
💙 Darya Volkov's token-billionaire debut
Runs eight agents (each with sub-agents) operating her marketing agency, Geeks360 — two more added live on air.
-
SEG 11
🫶 Swyx closes: what this whole thing is
First AI Engineer: 500 at Hotel Nikko. This one: 7,200, sold out, sub-5% talk acceptance.
Episode Summary
ThursdAI broadcast live for two and a half hours from the middle of the AI Engineer World's Fair expo floor in San Francisco — 7,200 engineers, every major lab a sponsor, aisles with actual street names. The headline: Fable 5 is back after the ban saga (and Sonnet 5 landed 'meh'). Then a ThursdAI-record nine guests, back to back: Exo Labs' Alex Cheema and Sero launching local.ai (with a surprise NVIDIA crash by Nader Khalil), OpenAI's Dominik Kundel on GPT-5.6 (Sol/Terra/Luna) and Codex, W&B's Zubin Aysola on the Aria auto-research agent (This Week's Buzz), Sakana's Stefania Druga on Fugu, Google DeepMind's Philipp Schmid on OmniFlash and NanoBanana 2 Lite, Darya Volkov's on-air debut running her agency on eight agents, and Swyx closing on what AI Engineer has become.
In This Episode
- 🎪 LIVE from AI Engineer World's Fair
- 🏢 Fable is back (and Sonnet 5 is… meh)
- 🔓 Open source: LongCat-2.0 unmasked (Meituan's Owl Alpha)
- 🔩 The Etched ASIC debate
- 🧩 Exo Labs launches local.ai (+ a surprise NVIDIA crash)
- 🌞 GPT-5.6 with Dominik Kundel (OpenAI)
- 💛 This Week's Buzz: W&B Aria goes GA
- 🐡 Sakana Fugu with Stefania Druga
- ✨ Google DeepMind: OmniFlash + NanoBanana 2 Lite
- 💙 Darya Volkov's token-billionaire debut
- 🫶 Swyx closes: what this whole thing is
🎪 LIVE from AI Engineer World's Fair
ThursdAI broadcast for 2.5 hours from the middle of the Moscone expo floor, right next to the OpenAI booth, with a six-person crew. 7,200 engineers, every major lab a sponsor, aisles with actual street names. The vibe versus London ~85 days earlier: all systems go. Agents, token factories, software factories, and everyone chasing RSI (recursive self-improvement).
- Broadcasting live from the floor so it feels like you're at the table — guests get grabbed hallway-track style.
- Contrast with AI Engineer London (~85 days prior): the American crowd feels the acceleration and is less conceptual.
- Alex: top-five day of all time — the show, his talk, Darya there, and Team USA beating Bosnia that night.
🏢 Fable is back (and Sonnet 5 is… meh)
The biggest story of the week: Fable 5 is back, roughly 82 days after Mythos was announced in London, and less restricted than feared. Fable prepped the entire run of show (and shuffled the guest order for no reason). Meanwhile Sonnet 5 dropped and underwhelmed: LDJ found it less token-efficient than Opus, Wolfram's early WolfBench read put it slightly under Opus 4.6 at higher cost, and Nisten thought it was fine for unimportant stuff.
- Restored globally on July 1, ending the June 12 pause, after export controls were lifted; it ships with new cybersecurity classifiers and prepped the show's entire run sheet.
- Peter burned ~100 Fable generations before anyone at Arena woke up.
- Sonnet 5: 'most agentic Sonnet yet' at intro $2/$10 pricing through Aug 31 — but the new tokenizer can burn up to 35% more tokens; WolfBench's one-run read is under Opus 4.6.
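For a rough sense of what that tokenizer overhead does to a bill, here is a back-of-the-envelope sketch. It assumes the $2/$10 intro price means dollars per million input/output tokens (the notes don't state the unit) and applies the worst-case 35% inflation to both sides. The workload numbers are made up for illustration.

```python
# Back-of-the-envelope: how a tokenizer that emits more tokens changes the bill.
# Assumptions (not from the show): prices are USD per million tokens, and the
# 35% worst case applies equally to input and output.

INPUT_PRICE_PER_M = 2.00     # intro input price, assumed USD per 1M tokens
OUTPUT_PRICE_PER_M = 10.00   # intro output price, assumed USD per 1M tokens
MAX_TOKEN_INFLATION = 0.35   # "up to 35% more tokens"


def workload_cost(input_tokens: int, output_tokens: int, inflation: float = 0.0) -> float:
    """Cost in USD, with both token counts scaled by (1 + inflation)."""
    scale = 1.0 + inflation
    input_cost = input_tokens * scale / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens * scale / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical workload: 10M input and 2M output tokens, as counted by the old tokenizer.
baseline = workload_cost(10_000_000, 2_000_000)
worst_case = workload_cost(10_000_000, 2_000_000, MAX_TOKEN_INFLATION)
print(f"baseline ${baseline:.2f}, worst case ${worst_case:.2f}")
```

On these made-up volumes the bill moves from about $40 to about $54, so the sticker price overstates the savings if the worst case holds.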
🔓 Open source: LongCat-2.0 unmasked (Meituan's Owl Alpha)
The open-source segment had one big reveal: Meituan disclosed LongCat-2.0, a 1.6-trillion-parameter MoE trained entirely on Chinese ASICs (no NVIDIA hardware), hitting 59.5 on SWE-bench Pro at $0.038 per million tokens with free cache hits. It turned out to be the model that had been running anonymously as 'Owl Alpha' (which Wolfram had already been enjoying), and it ranks among OpenRouter's top models by volume. The panel also touched on Z.ai's new ZCode, a GLM-5.2-based agentic coding environment. The bigger trend: Chinese open-weight models are now ~30% of global usage on major platforms, up from 1.2% eleven months ago.
- LongCat-2.0: 1.6T MoE trained entirely on Chinese ASICs (no NVIDIA), 59.5 SWE-bench Pro, $0.038/M tokens with free cache hits.
- It was 'Owl Alpha' all along — already a top OpenRouter model by volume before anyone knew whose it was.
- Chinese open-weight models are now ~30% of global usage, up from 1.2% eleven months ago; Z.ai's ZCode also dropped.
🔩 The Etched ASIC debate
Etched — the 'weights etched into the silicon' ASIC company — finally announced its LLM chips, and it was the talk of the expo floor. The panel was split between excitement about what weights-on-chip means for speed and power draw, and hard-earned skepticism: Nisten pointed out that Taalas has actually shipped a working product while Etched's famous demo ran on eight NVIDIAs, and until real silicon shows up he isn't buying it.
- Etched announced its LLM ASICs — model weights physically on the chip for major speed and power gains.
- Nisten's counterpoint: Taalas has shipped working silicon; Etched's earlier demo ran on eight NVIDIAs.
- Floor consensus: huge if true, but this crowd wants chips in hands before belief.
🧩 Exo Labs launches local.ai (+ a surprise NVIDIA crash)
Alex Cheema and Sero (0xSero) came on fresh off announcing local.ai, a site that tracks the local-AI frontier: the best model for your hardware, the performance trade-off vs. the cloud, and whether it's cheaper than API tokens. Early access is live with codes for everyone who signs up; the Exo CLI ('vLLM for consumer devices, configs figured out for you') follows in weeks. Sero walked through REAP pruning (a GLM-5.2 prune hitting 71% on Terminal Bench 2.1) and Nemotron-3 Ultra (550B) running on four Sparks. Then Nader Khalil from NVIDIA crashed the set to pull together an impromptu Local AI Summit.
- local.ai tracks best-model-for-your-hardware, cloud trade-offs, and cost vs. API tokens — early access + signup codes now.
- Exo CLI = 'vLLM for consumer devices, with the configs figured out for you' — shipping in a few weeks.
- Sero's REAP pruning: a GLM-5.2 prune hits 71% on Terminal Bench 2.1; Nemotron-3 Ultra (550B) runs on four Sparks.
- Nader Khalil (NVIDIA, ex-Brev.dev) crashed the set to organize a mid-conference Local AI Summit.
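The "is it cheaper than API tokens" question boils down to a break-even calculation. The sketch below is not how local.ai computes anything; it is a minimal illustration, and the function name, the hardware price, and the electricity cost are all hypothetical.

```python
from typing import Optional


def break_even_tokens(
    hardware_usd: float,
    api_usd_per_million: float,
    local_usd_per_million: float,
) -> Optional[float]:
    """Tokens you must generate locally before the hardware pays for itself.

    Returns None when the API is already as cheap as (or cheaper than) the
    local running cost, because then there is no break-even point.
    """
    saving_per_million = api_usd_per_million - local_usd_per_million
    if saving_per_million <= 0:
        return None
    return hardware_usd / saving_per_million * 1_000_000


# Hypothetical inputs: a $4,000 machine and $0.50 of electricity per 1M local tokens.
vs_pricey_api = break_even_tokens(4_000, 10.00, 0.50)  # against a $10/M API
vs_cheap_api = break_even_tokens(4_000, 0.038, 0.50)   # against a $0.038/M API
print(vs_pricey_api, vs_cheap_api)
```

On these made-up numbers the machine pays for itself after roughly 421 million tokens against a $10/M API, and never on cost alone against something priced like the $0.038/M figure quoted for LongCat-2.0 in these notes. Privacy, latency, and offline use are separate arguments.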
🌞 GPT-5.6 with Dominik Kundel (OpenAI)
Smoothest transition ever — local AI to OpenAI via the person behind GPT-OSS. Dominik broke down GPT-5.6 as three models: Sol (frontier), Terra (~5.5-level intelligence at half the cost), and Luna (small & fast), plus a new Ultra mode with a Max reasoning level and heavier sub-agent use. Headline: 5.6 Sol is coming to Cerebras at absurd speed — the same weights as the API model, not a distill. Also: the Codex app is five months old, 100% of OpenAI engineers use it, and a human still reviews every PR. The token-bank feature came from community feedback, and yes — there's a literal physical reset button behind the booth.
- GPT-5.6 = Sol (frontier), Terra (~5.5 intelligence at half cost), Luna (small & fast) + new Ultra 'Max' reasoning mode.
- 5.6 Sol coming to Cerebras at extreme speed — same weights as the API model, not a distill or 'Spark situation'.
- Codex app is 5 months old, 100% of OpenAI engineers use it — and a human still reviews every PR that lands.
💛 This Week's Buzz: W&B Aria goes GA
The sponsor corner — Weights & Biases from CoreWeave — and this week it was a real launch. Zubin Aysola came by with Aria, the auto-research agent that went GA on Monday. It lives in the W&B UI ('Just Ask Aria', top-right), reads your traces, and debugs your loss curves. In Zubin's talk, Aria read its own production traces and updated its own prompts — RSI shipping on shelves.
- Aria went GA on Monday — an auto-research agent inside the W&B UI ('Just Ask Aria').
- Reads your traces and debugs your loss curves in-product.
- On stage, Aria read its own production traces and rewrote its own prompts.
🐡 Sakana Fugu with Stefania Druga
We covered Fugu last week without realizing we had a friend inside the lab — so we fixed that. Stefania Druga went deep on the two ICLR papers behind it (Trinity + the conductor), why it's recursive rather than a dumb dispatcher — it rewrites prompts and verifies outputs before picking a model — and announced on the pod that Fugu now works in Codex and OpenCode. Plus routing between numerical models and fuzzy reasoning for typhoon prediction, a SHEEFs teaser, and a riff on Socratic AI for kids: answer machines make lazy kids; question machines make curious ones.
- Fugu is recursive, not a dumb dispatcher — it rewrites prompts and verifies outputs before picking a model.
- Announced on air: Fugu now works in Codex and OpenCode.
- Socratic AI for kids: answer machines make lazy kids; question machines make curious ones.
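As a purely illustrative toy (not Sakana's code, and not taken from the Trinity or conductor papers), the "rewrite, verify, then pick" idea can be sketched as a router that tailors the prompt for each candidate model, checks the answer, and recurses with a refined task when nothing passes. Every name here (`route`, `rewrite`, `verify`, the stub models) is hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Model = Callable[[str], str]


def route(
    prompt: str,
    models: Dict[str, Model],
    rewrite: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    max_depth: int = 2,
) -> Optional[Tuple[str, str]]:
    """Return (model_name, output) for the first verified answer, else None."""
    for name, model in models.items():
        candidate_prompt = rewrite(prompt, name)  # tailor the prompt per model
        output = model(candidate_prompt)
        if verify(prompt, output):                # check against the un-rewritten task
            return name, output
    if max_depth == 0:
        return None
    # Nothing verified: refine the task and try the whole pool again.
    refined = prompt + " Show your work step by step."
    return route(refined, models, rewrite, verify, max_depth - 1)


# Stub "models" standing in for real LLM calls.
def fast_model(p: str) -> str:
    return "I think it is 5"  # always wrong in this toy


def careful_model(p: str) -> str:
    return "4" if "step by step" in p else "not sure"


def tag_prompt(prompt: str, name: str) -> str:
    return f"[{name}] {prompt}"


def is_four(prompt: str, output: str) -> bool:
    return output.strip() == "4"


pool = {"fast": fast_model, "careful": careful_model}
result = route("What is 2 + 2?", pool, tag_prompt, is_four)
print(result)
```

The point of the toy is the contrast with a plain dispatcher: a dispatcher would pick a model once and return whatever came back, while this loop only returns an answer that passed the check and otherwise changes the prompt and tries again.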
✨ Google DeepMind: OmniFlash + NanoBanana 2 Lite
A first for the show — Alex took his first-ever mid-stream bio break and Wolfram ran the interview solo. Philipp Schmid covered OmniFlash (the first of the Omni any-to-any family: 10-second video generation with precise conversational editing — 'make it daytime' and it redoes light, sky, and shadows) and NanoBanana 2 Lite (under 4 seconds per generation, starting at three cents per 1,000 images, quality above the original NanoBanana). The Interactions API also hit GA. Google is shipping.
- OmniFlash: first of the any-to-any Omni family — conversational multi-turn video editing (editing Elo 1087, $0.10/second, up to 10s) via the Interactions API.
- NanoBanana 2 Lite: sub-4-second generations starting at ~3¢ per 1,000 images, above original NanoBanana quality.
- Interactions API hit GA; Wolfram ran the whole interview solo.
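To put the quoted prices in per-request terms, here is a small cost sketch. It makes no calls to the Interactions API; it only does the arithmetic, and it assumes OmniFlash is billed per second of output video and that the NanoBanana 2 Lite "starting at" figure is a floor. How multi-turn edits are billed is not stated in these notes.

```python
# Prices as quoted in the notes; the billing model around them is an assumption.
OMNIFLASH_USD_PER_SECOND = 0.10
OMNIFLASH_MAX_SECONDS = 10
NANOBANANA_LITE_USD_PER_1K_IMAGES = 0.03  # "starting at" price, so a lower bound


def omniflash_cost(seconds: float) -> float:
    """Cost in USD of one clip, assuming billing per second of output video."""
    if not 0 < seconds <= OMNIFLASH_MAX_SECONDS:
        raise ValueError(f"clip length must be in (0, {OMNIFLASH_MAX_SECONDS}] seconds")
    return seconds * OMNIFLASH_USD_PER_SECOND


def nanobanana_lite_cost(images: int) -> float:
    """Lower-bound cost in USD for a batch of images."""
    return images / 1000 * NANOBANANA_LITE_USD_PER_1K_IMAGES


full_clip = omniflash_cost(10)                 # longest clip allowed
million_images = nanobanana_lite_cost(1_000_000)
print(f"10s clip: ${full_clip:.2f}, 1M images: ${million_images:.2f}")
```

Under those assumptions a maximum-length clip costs about $1.00, and a million images start at about $30.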
💙 Darya Volkov's token-billionaire debut
After years of Alex mentioning her — girlfriend, then fiancée, then wife — listeners finally met Darya Volkov. She came to AI Engineer in her own right, walking the floor with the media crew, and earned her own token-billionaire badge: she runs eight agents (each with sub-agents; she installed two more that Alex found out about live on air) that operate her actual marketing agency, Geeks360 — client platforms, billing systems, built practically overnight. Her wishlist: agents that learn progressively so you can grow trust, and one unified brain instead of a new model to chase every week.
- Runs eight agents (each with sub-agents) operating her marketing agency, Geeks360 — two more added live on air.
- Earned her own token-billionaire badge on the floor.
- Wishlist: progressively-learning agents you can grow trust in, and one unified brain instead of chasing a new model weekly.
🫶 Swyx closes: what this whole thing is
We closed with the man who built the city. Some wild numbers: the first AI Engineer was 500 people at Hotel Nikko; this one was 7,200, sold out, sub-5% talk acceptance, a daily printed newspaper, a puppy corner, a flash mob, and a token-billionaire lounge. A month out, only 3,000 tickets had been sold. Swyx calls the conference 'the highest loop, the one that creates all the other loops,' and the expansion is real, with AIE Tokyo next. On the record: ThursdAI got its official start because Swyx was the first person to believe in Alex.
- First AI Engineer: 500 at Hotel Nikko. This one: 7,200, sold out, sub-5% talk acceptance.
- Daily printed newspaper, puppy corner, flash mob, token-billionaire lounge — and AIE Tokyo is next.
- Swyx was the first person to believe in Alex; he calls the conference 'the highest loop, the one that creates all the other loops.'
ThursdAI broadcast 2.5 hours live from the AI Engineer World's Fair expo floor — 7,200 engineers, every lab a sponsor, and a ThursdAI-record nine guests.
🏢 Big CO LLMs + APIs
- Fable 5 is back — Anthropic restored the model globally on July 1 after US export controls were lifted, with new cybersecurity classifiers as safeguards. The June 12 pause had affected both Fable 5 and Mythos 5; access resumed without ID verification, though new content filters may temporarily block some routine coding tasks.
- Claude Sonnet 5 — 'our most agentic Sonnet yet', near-Opus 4.8 performance at introductory $2/$10 pricing through August 31. Reception split: power users saw near-Opus costs for slightly inferior output at high effort, casual users liked the value. The new tokenizer may consume up to 35% more tokens.
- Steganographic fingerprinting disclosure — Anthropic acknowledged a March 2026 experiment embedding hidden signals in Claude Code's system prompt targeting Chinese proxy users. Dormant for custom-endpoint users only, no separate exfiltration channel — but the obfuscated approach raised trust questions about agent access.
🔓 Open Source LLMs
- LongCat-2.0 revealed — Meituan's 1.6T-parameter MoE, trained entirely on Chinese ASICs without NVIDIA hardware: 59.5 SWE-bench Pro at $0.038/M tokens with free cache hits. It had been running anonymously as 'Owl Alpha' and ranks among OpenRouter's top models by volume.
- Chinese open-weight models are now ~30% of global usage on major platforms, up from 1.2% eleven months ago.
- local.ai launched live on the show — Exo Labs' tracker for the local-AI frontier: best model for your hardware, cloud trade-offs, and cost vs. API tokens. Early access codes for signups; Exo CLI ships in weeks.
💛 This Week's Buzz
- W&B Aria went GA — the auto-research agent in the W&B UI that reads your traces and debugs your loss curves. In Zubin's AIE talk it read its own production traces and updated its own prompts.
- Alex's AIE talk: 'Should AI Engineers still read code in 2026?' — decomposition, verification, and engineering principles.
🤖 AI Coding & Agents
- ZCode — Z.ai's agentic coding environment on GLM-5.2: 1M-token context, a /goal verification protocol with independent success checkers, 173 tok/s output and 1.4s time-to-first-token.
- Base 1 — Base44 (Wix, $150M ARR) launched a proprietary LLM trained on tens of millions of real app-building interactions — the first vibe-coding platform with an internal model, already auto-routing tasks.
- GPT-5.6 — Sol (frontier), Terra (~5.5-level at half cost), Luna (small & fast), plus an Ultra mode with Max reasoning. 5.6 Sol is coming to Cerebras at extreme speed — same weights, not a distill.
🎵🎬 Voice & Vision
- NanoBanana 2 Lite — image generations in under 4 seconds starting at ~3¢ per 1,000 images, above original NanoBanana quality.
- Gemini OmniFlash — conversational multi-turn video editing via the Interactions API (now GA): editing Elo 1087, $0.10/second for videos up to 10 seconds.