ThursdAI · November 20, 2025

📆 ThursdAI - the week that changed the AI landscape forever - Gemini 3, GPT-5.1-Codex-Max, Grok 4.1 & Grok 4.1 Fast, SAM 3 and Nano Banana Pro

From Weights & Biases - this is by far the craziest AI week since we started the show, plus 3 interviews with guests from Cognition, OpenAI and DeepMind, and a live recording from the floor of AI Engineer!

By Alex Volkov

89 min


What happened in AI the week of November 20, 2025?

Recorded live from the AI Engineer Summit in New York, this might be the most packed ThursdAI episode ever. In a single week, Google dropped Gemini 3 Pro (45% on ARC-AGI-2 with Deep Think!), xAI shipped Grok 4.1 and then Grok 4.1 Fast with a full Agent Tools API, OpenAI answered with GPT-5.1-Codex-Max, capable of 24-hour+ coding runs, and Meta segmented the universe with SAM 3 and SAM 3D. Oh, and Google capped Thursday itself with Nano Banana Pro, generating flawless 4K infographics while Alex was still live on air. Three incredible guests joined: Swyx from Cognition/Latent Space, who organized the summit; Thor Schaeff from Google DeepMind (on day three of his new job!); and Dominik Kundel from OpenAI, breaking down Codex's native compaction magic. The future didn't just arrive; it showed up with luggage.


Episode Summary

In This Episode

🎙️ Live from AI Engineer: The Craziest Week in AI
🏛️ AI Engineer Summit: Coding Agents Take Center Stage
💎 Gemini 3 Pro: Google's AI Comeback is Complete
🚀 Antigravity: Google's Free Agentic IDE That Feels Like the Future
🤖 GPT-5.1-Codex-Max: 24-Hour Agent Runs and Native Compaction
⚡ Grok 4.1 Fast & Agent Tools API: xAI's Developer Moment
🍌 Nano Banana Pro: 4K Image Generation with Perfect Text
🔬 Meta SAM 3 & SAM 3D, OLMo 3, and Open Source News

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Swyx

Latent Space / Cognition — Founder & AI Engineer Conference

@swyx

Thor Schaeff

Google DeepMind — Developer Experience Engineer

@thorwebdev

Dominik Kundel

OpenAI — Developer Experience & SDKs

@dkundel

Ryan Carson

AI educator & founder

@ryancarson

Yam Peleg

AI builder & founder

@Yampeleg

Wolfram Ravenwolf

Weekly co-host, AI model evaluator

@WolframRvnwlf

LDJ

Nous Research

@ldjconfirmed

Nisten Tahiraj

AI operator & builder

@nisten

By The Numbers

ARC-AGI-2

45.14%

Gemini 3 Deep Think — enormous jump, roughly double previous SOTA on this reasoning benchmark

τ²-Bench Telecom

93–100%

Grok 4.1 Fast on agentic customer-service simulation at ~10× cheaper than competitors

TerminalBench 2

58%

GPT-5.1-Codex-Max — new SOTA on the benchmark just days after its launch

Agent run time

24h+

GPT-5.1-Codex-Max native compaction lets it work on a single task for a day — or more

Nano Banana Pro

4K

First image model to produce flawless 4K images with perfect text and SynthID watermarking

Grok 4.1 Fast context

2 million token context window with free API access for the first two weeks

🔥 Breaking During The Show

Gemini 3 Pro — 45.14% ARC-AGI-2 Deep Think

Thor Schaeff from Google DeepMind joined live to break down the launch. Biggest single jump on ARC-AGI-2 ever recorded.

Grok 4.1 Fast + Agent Tools API

xAI launched Grok 4.1 Fast with 2M context and a new Agent Tools API — free for 2 weeks — right as the show was recording.

🎙️ Live from AI Engineer: The Craziest Week in AI

Alex kicks off the show live from the AI Engineer Summit in New York, joined by co-host Ryan Carson and surprise guest Swyx. The panel does a lightning-round 'pick one release from the week' — Ryan goes Gemini 3, Swyx agrees it's underrated, and Alex cheats by picking Antigravity (which includes Gemini 3). The TLDR is staggering: every major AI lab shipped something massive in the same five-day window.

Recorded live at AI Engineer Summit in New York with a professional podcast studio on the expo floor
Ryan Carson (Amp): first time they've ever switched their default model — Gemini 3 Pro is now default at Amp
Swyx calls Gemini 3 'still underrated despite all the attention it's already got'

Ryan Carson

"Gemini three Pro is a beast of a model and I think more excitingly it's not from Anthropic. Like we need more diversity in this space."

Swyx

"I think it is underappreciated and I'm not just saying this 'cause they're the presenting sponsor. It is a really good model. Very underrated. I mean, still underrated despite all the attention it's already got."

🏛️ AI Engineer Summit: Coding Agents Take Center Stage

Swyx walks Alex and Ryan through the summit's theme — coding agents — and explains why every major lab converging on agentic workflows makes it the right bet for 2025. From Cursor to Jules to CodeRabbit to Anthropic and Google Labs, the agent lab ecosystem is maturing fast. This year's summit also targets enterprise for the first time, with Fortune 500 attendees from Capital One, Bloomberg, and Atlassian.

23 applicants for every speaker slot — Swyx curated an all-star lineup from every lab
First summit focused on enterprise digital transformation alongside the developer community
Swyx: 'If you take vertical AI seriously enough, you eventually end up building an agent lab'

Swyx

"If you're really honest with yourself, probably code is the most important and serves our audience the best. Coding agents are just agents. Agents need to be able to code even if they're not specifically for coding."

💎 Gemini 3 Pro: Google's AI Comeback is Complete

Thor Schaeff (Google DeepMind, day three on the job!) joins the panel to celebrate Gemini 3 Pro's launch. The numbers are genuinely wild: 45.14% on ARC-AGI-2 with Deep Think mode, 81% on MMMU-Pro, and major gains in coding. Ryan confirms Amp switched to it as their default model the day it launched, the first time they've ever switched defaults. The panel also explains Deep Think mode and covers Gemini landing across Gmail, Calendar, and AI Mode in Search.

ARC-AGI-2: 31.11% standard, 45.14% with Deep Think — biggest ever jump on this benchmark
Ryan Carson: Amp switched to Gemini 3 Pro as default on launch day — never done that before
AI Mode rolling out in Google Search powered by Gemini 3 Pro

Ryan Carson

"We switched to it as our main model. It's our default. I mean, and we'd never do that. Like, we've never switched to a model. Not even when GPT-5 came out."

Thor Schaeff

"I'm on day three, day three. So something going on. Yeah. Tuesday, Gemini three. Pretty incredible. The reception of the community has been amazing."

🚀 Antigravity: Google's Free Agentic IDE That Feels Like the Future

Alex's personal pick of the week, Antigravity is a free VS Code fork reimagined for agent-first coding. The killer feature: an Agent Manager that acts like an inbox for your coding agents — run multiple agents in parallel, each working on different parts of your codebase simultaneously. Browser integration lets agents take screenshots and videos of your running app, then debug and iterate. Gemini 3 Pro handles the heavy coding; Nano Banana handles images.

Agent Manager: inbox-style interface to coordinate multiple parallel coding agents
Browser integration: agents can control Chrome, take screenshots, and self-debug
Free tier powered by Gemini 3 Pro, with other options such as the open-weights GPT-OSS 120B also available

Thor Schaeff

"It's clearly the future. You're an engineering manager now."

Alex Volkov

"The reason I'm picking them is because I think they're showing a new paradigm of how I work with agents when I code and I'm almost not looking at the code. The browser integration there is kind of crazy."

🤖 GPT-5.1-Codex-Max: 24-Hour Agent Runs and Native Compaction

Dominik Kundel from OpenAI joins live to break down GPT-5.1-Codex-Max, the newest frontier coding model designed for long-horizon software tasks. The headline: native compaction training lets it run for 24+ hours on a single task (an internal run reportedly went a full week). Dominik explains how compaction differs from just starting a new thread, efficiency gains (30% fewer thinking tokens), Windows/PowerShell improvements, and the new extra-high reasoning level.

Native compaction: model trained to intelligently summarize prior context and run indefinitely
30% fewer thinking tokens than GPT-5.1-Codex at medium reasoning effort: faster and smarter
58% on TerminalBench 2 — new SOTA; also leads SWE-Bench and SWE-Lancer vs. predecessors
Windows PowerShell support significantly improved; experimental Windows sandbox launched
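To make the difference between compaction and "just starting a new thread" concrete, here is a minimal, hypothetical harness-level sketch in Python. It is not OpenAI's implementation: Codex-Max is reportedly trained to compact natively, while this sketch only shows the general pattern. When the transcript exceeds a token budget, older turns are folded into a single summary turn and the most recent turns are kept verbatim, whereas a fresh thread would discard everything. The `Turn` type, the word-count "tokenizer", `naive_summarize`, and the budget numbers are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str  # e.g. "user", "assistant", "tool", "system"
    text: str


def count_tokens(turns: List[Turn]) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return sum(len(t.text.split()) for t in turns)


def compact(
    history: List[Turn],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[Turn]], str],
) -> List[Turn]:
    """Hypothetical compaction step: once the history exceeds `budget` tokens,
    replace everything except the last `keep_recent` turns with one summary turn."""
    if count_tokens(history) <= budget or len(history) <= keep_recent:
        return history  # still fits, or nothing old enough to fold away
    cut = len(history) - keep_recent
    older, recent = history[:cut], history[cut:]
    summary = Turn("system", "Summary of earlier work: " + summarize(older))
    return [summary] + recent


def naive_summarize(turns: List[Turn]) -> str:
    # Placeholder: Codex-Max is reportedly trained to write this summary itself.
    return f"{len(turns)} earlier turns condensed"


history = [Turn("user", "fix the failing test " * 5)]
history += [Turn("assistant", f"step {i} done") for i in range(10)]
compacted = compact(history, budget=20, keep_recent=3, summarize=naive_summarize)
print(len(compacted), count_tokens(compacted))  # 4 17
```

Keeping the latest turns verbatim preserves the immediate working state, while the summary carries earlier decisions forward. A real harness would also re-check the budget after compacting, since a long summary could still overflow it.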

Dominik Kundel

"We wanted to make sure that the model is really good at dealing with that compaction and can work on these long running tasks. Our goal with Codex is that we want it to be a software engineer that works on your team that you can trust with hard tasks."

Yam Peleg

"From the first prompt, you feel a difference. It's better. It just understands. It's hard to explain what exactly is better, but you feel it immediately."

⚡ Grok 4.1 Fast & Agent Tools API: xAI's Developer Moment

xAI had a huge week: Grok 4.1 briefly topped LM Arena (1483 Elo), then Grok 4.1 Fast landed with a 2M token context, native X search, Reddit search, web browsing, and code execution. The Agent Tools API benchmarks are jaw-dropping: 93-100% on τ²-Bench and 72% on Berkeley Function Calling v4, all at $0.20/$0.50 per million tokens. Yam confirms the X and Reddit search is real and working. Alex shares his experience using both models in his n8n research agent.

Grok 4.1 topped LM Arena at 1483 Elo before Gemini 3 eclipsed it
Grok 4.1 Fast: $0.20 input / $0.50 output per million tokens, free for 2 weeks on the xAI API and OpenRouter
Agent Tools: native X + Reddit search, which many other models refuse to do
72% on Berkeley Function Calling v4 — top of the leaderboard, 10× cheaper than Gemini 3 Pro
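As a back-of-the-envelope check on the quoted Grok 4.1 Fast pricing ($0.20 per million input tokens, $0.50 per million output tokens), here is a small Python helper. The token counts in the example are a made-up workload, not a measurement, and the sketch deliberately ignores any cached-input discounts, tool-call fees, and the two-week free period.

```python
from decimal import Decimal

# Grok 4.1 Fast list prices as quoted in this episode, in USD per million tokens.
INPUT_USD_PER_M = Decimal("0.20")
OUTPUT_USD_PER_M = Decimal("0.50")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one request at the quoted per-million-token rates.
    Ignores caching discounts, tool-call fees, and the free launch window."""
    return (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    ) / MILLION


# Hypothetical workload: fill the full 2M-token context once and get 10K tokens back.
print(request_cost(2_000_000, 10_000))
```

`Decimal` avoids floating-point rounding noise in currency math. At these rates, even a request that fills the entire 2M-token window costs well under a dollar (about $0.405 here), which is what "aggressive pricing" means in practice.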

Yam Peleg

"The X search is great. It can also search Reddit, by the way, which many other models refuse to do. Grok just does it on its own."

LDJ

"τ²-Bench has an airline section and a retail section: a multi-hop reasoning agentic benchmark for things like booking airline tickets or customer service in a simulated environment."

🍌 Nano Banana Pro: 4K Image Generation with Perfect Text

Breaking news mid-show: Google releases Nano Banana Pro, upgraded with thinking traces, 4K resolution, and SynthID watermarking. Alex demos it live by generating an 8MB infographic about the week's AI news — the text is perfect across the entire image, logos are pixel-accurate, and the composition is impressive. Wolfram demos generative UIs in Gemini — Gemini building an interactive news dashboard with real-time market data on demand.

Breaking news during the live show — Alex demos it instantly with an AI news infographic
Perfect text rendering across 4K images — no garbled letters, accurate logos
Thinking traces visible before generation — Gemini 3 plans, Nano Banana executes
SynthID watermarking and C2PA metadata for provenance on every image
Generative UIs: Gemini builds interactive dashboards with real data on the fly

Alex Volkov

"This is a one-shot prompt infographic; I just took notes from everything that I had for the show this week. It's an eight-megabyte file that it generated, and the text is perfect across all of it. It does not look like AI."

Swyx

"At this scale, you expect some typos here and there. I don't see any."

🔬 Meta SAM 3 & SAM 3D, OLMo 3, and Open Source News

Meta joins the party with SAM 3 — open-vocabulary video segmentation with text and exemplar prompts — and SAM 3D for turning single photos into 3D objects and human body reconstructions. The panel demos it live on dog videos. LDJ and Nisten highlight OLMo 3 from Allen AI as a fully open 32B model (full dataset, training recipe, hyperparameters) — the contrast to open-weights-only releases from Qwen and DeepSeek is stark.

SAM 3: click or text-prompt to segment and track any object across video — live demo with golden retrievers
SAM 3D: single image to 3D object or full human body reconstruction
OLMo 3: Allen AI's fully open 32B dense model — dataset, recipe, and hyperparameters all public
marimo Python notebooks: new VS Code and Cursor extension with reactive notebooks and uv integration

LDJ

"OLMo is completely open. I don't think Qwen or DeepSeek, although they are doing great work, have ever actually put out a fully open recipe — dataset, full training recipe, hyperparameters, everything 100% open."

Ryan Carson

"Think about what humans are gonna do with this model. Like, there's so many cool things you can do."

If you only skim one section, make it this one:

Google

Gemini 3 Pro: 1M-token multimodal model, huge reasoning gains — new LLM king; ARC-AGI-2: 31.11% (Pro), 45.14% (Deep Think) — enormous jumps
Antigravity IDE: free, Gemini-powered VS Code fork with agents, plans, walkthroughs, and browser control
Nano Banana Pro: 4K image generation with perfect text + SynthID provenance; dynamic generative UIs in Gemini

xAI

Grok 4.1: big post-training upgrade — #1 on human-preference leaderboards, much better EQ & creative writing, fewer hallucinations
Grok 4.1 Fast + Agent Tools API: 2M context, SOTA tool-calling & agent benchmarks (Berkeley FC, τ²-Bench, research evals), aggressive pricing and tight X + web integration

OpenAI

GPT-5.1-Codex-Max: frontier agentic coding model built for 24h+ software tasks with native compaction for million-token sessions; big gains on SWE-Bench, SWE-Lancer, TerminalBench 2
GPT-5.1 Pro: new research-grade ChatGPT mode that will happily think for minutes on a single query

Robotics

Sunday Robotics — ACT-1 & Memo: home robot foundation model trained from a $200 skill glove instead of $20K teleop rigs; long-horizon household tasks with solid zero-shot generalization

Recorded live at the AI Engineer Summit in New York. Three incredible guests: Swyx (Cognition/Latent Space), Thor Schaeff (Google DeepMind, day 3!), and Dominik Kundel (OpenAI).

Alex Volkov 0:38

Hello.

0:39

Hello everyone. Welcome to ThursdAI for November 20. What a week. Welcome to ThursdAI, November 20th. My name is Alex Volkov. I'm an AI evangelist with Weights & Biases from CoreWeave. I'm the host of ThursdAI, and I'm here on location at AI Engineer in New York. And I'm not the only one. I have a friend here, Ryan Carson from Amp, and we're on location at AI Engineer in New York. This is the second one this year. Hopefully everything is set up and you guys can hear us well. We're gonna have a few guests joining us on this honestly insane week.

Ryan Carson 1:10

Crazy.

Alex Volkov 1:11

Just an absolutely, absolutely insane week.

1:13

We're also joined by some other folks here, so we're gonna have Wolf and we're gonna have LDJ. We're gonna add them to the stage in a bit. But just before, I would like to call out that this podcast studio is hosted by the great AI Engineer folks. So shout out to Swyx and Ben from AI Engineer, and we will tell you about the conference because it is live streamed. We're gonna make sure that all of this very busy week that we have prepared for you is covered fully, but we'll also tell you all about this conference, what's going on, and the incredible number of speakers here from essentially every agent lab and foundation lab in existence. Everybody's here. And you may experience some background noise. Maybe that's a good call out: because we're here on location, we don't control the crowd. They may come, there may be some noises, but we'll definitely be very happy to share with you everything that's going on, including the recent release of... yes, Nano Banana Pro. So, we're gonna chat about all that. So Ryan, I don't think the folks could hear you well, so let me put you on the spot and give us your thoughts on the one AI release from this week. Here.

Ryan Carson 2:15

So, absolutely has to be Gemini three.

2:17

I mean, Gemini three Pro is a beast of a model and I think, more excitingly, it's not from Anthropic. Yeah. You know, not from OpenAI. Like, we need more diversity in this space. So Gemini three Pro, really good at coding, it's our default model at Amp now. I'm very much loving it. We did hit rate limits, though, on launch day and even the day after, which is frustrating. They've been solved. Let's go.

Alex Volkov 2:40

Alright, and speaking of, let's go, let's

2:42

flag our friend over there. Swyx. Swyx, you wanna join? Alright folks, we're gonna bring on the founder of AI Engineer, the person who coined the term AI engineer. Come on in, Mr. Swyx. Welcome. What's up? We're gonna give you this so you can hear yourself as well. Okay. And then you need to speak into the microphone as much as you can. Okay. And say hi to folks. There you are. I think we can swing this. But also, Swyx, welcome. We're doing a round on what is the one release this week in the AI world. Oh. One release. You only get to pick one. Yeah.

Swyx 3:20

Has someone picked Gemini already?

3:22

Everybody picked. Yeah, everybody did.

Alex Volkov 3:24

It is Gemini.

Swyx 3:25

I think it is underappreciated and I'm not just saying this 'cause

3:27

they're the presenting sponsor this time. Yeah. Uh, it's a good model. It is a really good model. Very underrated. It's, I mean, still underrated despite all the attention it's already got. And I think we will still be exploring the implications of it for a long time. Yeah,

Alex Volkov 3:43

A hundred percent.

3:43

Alright, I will go, because I haven't gone yet, and I don't know if mine is Gemini. Come on, just pick Gemini three. It's fine. No, no, no, no. Just because you all picked Gemini three, I'm gonna go with... hmm. I'm gonna go with Antigravity. Okay. I'll tell you, hot take. Take a hot take. I've been using Antigravity, which is a new IDE from Google. We're gonna tell you about this in a bit. And they're speaking tomorrow here at AI Engineer. Okay, cool. And the reason I'm picking them is because I think they're showing a new paradigm of how I work with agents when I code, and I'm almost not looking at the code. The browser integration there is kind of crazy. And I think they're showing a few things that every other agent lab will follow. And because Gemini three is kind of the intelligence that powers it, mine includes Gemini three, so I'm cheating a little bit. But I think Antigravity was a very big release; the folks are saying, like, Antigravity is fire. Alright, so I think we did the round. We'll do a TLDR, and then after the TLDR I'm gonna ask you a little bit about AI Engineer. Yeah. And then we're gonna talk about the actual releases in depth. So, I think it's time for the TLDR, super quick. And then we're gonna have our friend Swyx here cover a little bit of the folks who come here and the type of stuff we're gonna cover. So much, so much.

Swyx 4:59

I saw the logo wall, with like all the companies represented.

5:02

I was like, wow, they, it's everybody. That's a lot of books and we're

Alex Volkov 5:05

Alright folks, it's time for a TLDR, which is a section we do to

5:08

try to keep you up to date, super quick, on everything that happened this week. I'm gonna run through a bunch of releases. After that we have Thor from Gemini, and then hopefully we'll have Dominik from OpenAI as well. So, stay tuned for that. Let's go to the TLDR, folks, and if there's breaking news, I don't know if I can contain it, but supposedly this,

Swyx 5:26

Nano Banana.

Alex Volkov 5:27

Yes, Nano Banana is already on there.

5:29

But it's breaking news for sure, and supposedly Anthropic something, but we'll see.