Episode Summary

Recorded live from the AI Engineer Summit in New York, this might be the most packed ThursdAI episode ever: in a single week, Google dropped Gemini 3 Pro (45% on ARC-AGI-2 with Deep Think!), xAI shipped Grok 4.1 then Grok 4.1 Fast with a full Agent Tools API, OpenAI answered with GPT-5.1-Codex-Max capable of 24-hour+ coding runs, and Meta segmented the universe with SAM 3 and SAM 3D. Oh, and Google capped Thursday itself with Nano Banana Pro generating flawless 4K infographics while Alex was still live on air. Three incredible guests joined: Swyx from Cognition/Latent Space who organized the summit, Thor Schaeff from Google DeepMind (on day three of his new job!), and Dominik Kundel from OpenAI breaking down Codex's native compaction magic. The future didn't just arrive; it showed up with luggage.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Swyx
Latent Space / Cognition · Founder & AI Engineer Conference
@swyx
Thor Schaeff
Google DeepMind · Developer Experience Engineer
@thorwebdev
Dominik Kundel
OpenAI · Developer Experience & SDKs
@dkundel
Ryan Carson
AI educator & founder
@ryancarson
Yam Peleg
AI builder & founder
@Yampeleg
Wolfram Ravenwolf
Weekly co-host, AI model evaluator
@WolframRvnwlf
LDJ
Nous Research
@ldjconfirmed
Nisten Tahiraj
AI operator & builder
@nisten

By The Numbers

ARC-AGI-2
45.14%
Gemini 3 Deep Think: enormous jump, roughly double previous SOTA on this reasoning benchmark
τ²-Bench Telecom
93–100%
Grok 4.1 Fast on agentic customer-service simulation at roughly 10× lower cost than competitors
TerminalBench 2
58%
GPT-5.1-Codex-Max: new SOTA on the benchmark just days after its launch
Agent run time
24h+
GPT-5.1-Codex-Max native compaction lets it work on a single task for a day or more
Nano Banana Pro
4K
First image model to produce flawless 4K images with perfect text and SynthID watermarking
Grok 4.1 Fast context
2M
2 million token context window with free API access for the first two weeks

🔥 Breaking During The Show

Gemini 3 Pro: 45.14% on ARC-AGI-2 with Deep Think
Dropped live during the show. Thor Schaeff from Google DeepMind broke the news. Biggest single jump on ARC-AGI-2 ever recorded.
Grok 4.1 Fast + Agent Tools API
xAI launched Grok 4.1 Fast with 2M context and a new Agent Tools API, free for 2 weeks, right as the show was recording.

๐ŸŽ™๏ธ Live from AI Engineer: The Craziest Week in AI

Alex kicks off the show live from the AI Engineer Summit in New York, joined by co-host Ryan Carson and surprise guest Swyx. The panel does a lightning-round 'pick one release from the week': Ryan goes Gemini 3, Swyx agrees it's underrated, and Alex cheats by picking Antigravity (which includes Gemini 3). The TLDR is staggering: every major AI lab shipped something massive in the same five-day window.

  • Recorded live at AI Engineer Summit in New York with a professional podcast studio on the expo floor
  • Ryan Carson (Amp): Gemini 3 Pro is now the default model at Amp, the first time they've ever switched their default
  • Swyx calls Gemini 3 'still underrated despite all the attention it's already got'
Ryan Carson
"Gemini 3 Pro is a beast of a model and I think more excitingly it's not from Anthropic. Like we need more diversity in this space."
Swyx
"I think it is underappreciated and I'm not just saying this 'cause they're presenting sponsor. It is really good model. Very underrated. I mean, still underrated despite all the attention it's already got."

๐Ÿ›๏ธ AI Engineer Summit: Coding Agents Take Center Stage

Swyx walks Alex and Ryan through the summit's theme, coding agents, and explains why every major lab converging on agentic workflows makes it the right bet for 2025. From Cursor to Jules to CodeRabbit to Anthropic and Google Labs, the agent lab ecosystem is maturing fast. This year's summit also targets enterprise for the first time, with Fortune 500 attendees from Capital One, Bloomberg, and Atlassian.

  • 23 applicants for every speaker slot; Swyx curated an all-star lineup from every lab
  • First summit focused on enterprise digital transformation alongside the developer community
  • Swyx: 'If you take vertical AI seriously enough, you eventually end up building an agent lab'
Swyx
"If you're really honest with yourself, probably code is the most important and serves our audience the best. Coding agents are just agents. Agents need to be able to code even if they're not specifically for coding."

💎 Gemini 3 Pro: Google's AI Comeback is Complete

Thor Schaeff (Google DeepMind, day three on the job!) joins the panel to celebrate Gemini 3 Pro's launch. The numbers are genuinely wild: 45.14% on ARC-AGI-2 with Deep Think mode, 81% on MMLU-Pro, and major gains in coding. Ryan confirms Amp switched to it as their default model the day it launched, the first time they've ever switched defaults. The segment also explains Deep Think mode and covers Gemini landing across Gmail, Calendar, and AI Mode in Search.

  • ARC-AGI-2: 31.11% standard, 45.14% with Deep Think, the biggest ever jump on this benchmark
  • Ryan Carson: Amp switched to Gemini 3 Pro as default on launch day, something they had never done before
  • AI Mode rolling out in Google Search powered by Gemini 3 Pro
Ryan Carson
"We switched to it as our main model. It's our default. I mean, and we'd never do that. Like, we've never switched to a model. Not even when GPT-5 came out."
Thor Schaeff
"I'm on day three, day three. So something going on. Yeah. Tuesday, Gemini 3. Pretty incredible. The reception of the community has been amazing."

🚀 Antigravity: Google's Free Agentic IDE That Feels Like the Future

Alex's personal pick of the week, Antigravity is a free VS Code fork reimagined for agent-first coding. The killer feature: an Agent Manager that acts like an inbox for your coding agents, letting you run multiple agents in parallel, each working on a different part of your codebase. Browser integration lets agents take screenshots and videos of your running app, then debug and iterate. Gemini 3 Pro handles the heavy coding; Nano Banana handles images.

  • Agent Manager: inbox-style interface to coordinate multiple parallel coding agents
  • Browser integration: agents can control Chrome, take screenshots, and self-debug
  • Free tier powered by Gemini 3 Pro, offered alongside the open-weights GPT-OSS 120B
Thor Schaeff
"It's clearly the future. You're an engineering manager now."
Alex Volkov
"The reason I'm picking them is because I think they're showing a new paradigm of how I work with agents when I code and I'm almost not looking at the code. The browser integration there is kind of crazy."

🤖 GPT-5.1-Codex-Max: 24-Hour Agent Runs and Native Compaction

Dominik Kundel from OpenAI joins live to break down GPT-5.1-Codex-Max, the newest frontier coding model designed for long-horizon software tasks. The headline: native compaction training lets it run for 24+ hours on a single task (an internal run reportedly went a full week). Dominik explains how compaction differs from just starting a new thread, efficiency gains (30% fewer thinking tokens), Windows/PowerShell improvements, and the new extra-high reasoning level.

  • Native compaction: model trained to intelligently summarize prior context and run indefinitely
  • 30% fewer thinking tokens at median compared to predecessors: faster and smarter
  • 58% on TerminalBench 2, a new SOTA; also leads SWE-Bench and SWE-Lancer vs. predecessors
  • Windows PowerShell support significantly improved; experimental Windows sandbox launched
Dominik Kundel
"We wanted to make sure that the model is really good at dealing with that compaction and can work on these long running tasks. Our goal with Codex is that we want it to be a software engineer that works on your team that you can trust with hard tasks."
Yam Peleg
"From the first prompt, you feel a difference. It's better. It just understands. It's hard to explain what exactly is better, but you feel it immediately."
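
To make the compaction idea from this segment concrete, here is a minimal client-side sketch. It is not how GPT-5.1-Codex-Max works internally (the episode describes compaction as something the model is trained to do); it only illustrates the general pattern of folding older turns into a summary once a token budget is exceeded. The `CompactingHistory` class, the word-count "tokenizer", the `naive_summarize` helper, and the budget value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def rough_token_count(text: str) -> int:
    # Crude stand-in for a tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def naive_summarize(messages: List[str], max_words: int = 12) -> str:
    # Hypothetical summarizer: keep the first 3 words of each message and cap
    # the result at max_words. A real system would ask a model to summarize.
    heads = [" ".join(m.split()[:3]) for m in messages]
    return " ".join(" / ".join(heads).split()[:max_words])


@dataclass
class CompactingHistory:
    budget: int                      # assumed max tokens allowed in context
    keep_recent: int = 2             # newest messages are never compacted
    summarize: Callable[[List[str]], str] = naive_summarize
    summary: str = ""
    messages: List[str] = field(default_factory=list)

    def context(self) -> List[str]:
        # What would be sent to the model: the summary (if any), then recent turns.
        head = ["[summary] " + self.summary] if self.summary else []
        return head + self.messages

    def total_tokens(self) -> int:
        return sum(rough_token_count(m) for m in self.context())

    def add(self, message: str) -> None:
        self.messages.append(message)
        split = len(self.messages) - self.keep_recent
        if self.total_tokens() > self.budget and split > 0:
            # Fold the old summary and the oldest turns into a new summary.
            old = self.messages[:split]
            prior = [self.summary] if self.summary else []
            self.summary = self.summarize(prior + old)
            self.messages = self.messages[split:]


history = CompactingHistory(budget=30)
for i in range(6):
    history.add(" ".join(f"m{i}w{j}" for j in range(10)))
print(history.context())
```

The design choice worth noticing is that the summary is stored separately from the live messages, so repeated compactions rewrite one bounded summary instead of stacking summaries of summaries.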

⚡ Grok 4.1 Fast & Agent Tools API: xAI's Developer Moment

xAI had a huge week: Grok 4.1 briefly topped LM Arena (1483 Elo), then Grok 4.1 Fast landed with a 2M token context, native X search, Reddit search, web browsing, and code execution. The Agent Tools API benchmarks are jaw-dropping: 93-100% on τ²-Bench and 72% on Berkeley Function Calling v4, at $0.20/$0.50 per million tokens. Yam confirms the X and Reddit search is real and working. Alex shares his experience using both models in his n8n research agent.

  • Grok 4.1 topped LM Arena at 1483 Elo before Gemini 3 eclipsed it
  • Grok 4.1 Fast: $0.20 input / $0.50 output per million tokens, free for 2 weeks on xAI API and OpenRouter
  • Agent Tools: native X + Reddit search that other models refuse to do
  • 72% on Berkeley Function Calling v4: top of the leaderboard, 10× cheaper than Gemini 3 Pro
Yam Peleg
"The X search is great. It can also search Reddit, by the way, which many other models refuse to do. Grok just does it on its own."
LDJ
"τ²-Bench has an airline section and a retail section. It's a multi-hop reasoning agentic benchmark for things like booking airline tickets or customer service in a simulated environment."
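
For a feel of what the quoted Grok 4.1 Fast pricing means in practice, the arithmetic can be sketched as below. The two prices come from the episode ($0.20 input / $0.50 output per million tokens); the function name and the example workload are illustrative assumptions, and real bills may include other charges such as tool calls.

```python
# Prices quoted in the episode for Grok 4.1 Fast, in USD per million tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical workload: a request that fills the full 2M-token context
# and produces a 10,000-token answer.
full_context_cost = estimate_cost(2_000_000, 10_000)
print(f"${full_context_cost:.3f}")
```

Under these assumptions, even a request that fills the entire 2M-token window costs roughly 40 cents of input plus about half a cent of output.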

๐ŸŒ Nano Banana Pro: 4K Image Generation with Perfect Text

Breaking news mid-show: Google releases Nano Banana Pro, upgraded with thinking traces, 4K resolution, and SynthID watermarking. Alex demos it live by generating an 8MB infographic about the week's AI news: the text is perfect across the entire image, logos are pixel-accurate, and the composition is impressive. Wolfram demos generative UIs in Gemini, with Gemini building an interactive news dashboard with real-time market data on demand.

  • Breaking news during the live show: Alex demos it instantly with an AI news infographic
  • Perfect text rendering across 4K images: no garbled letters, accurate logos
  • Thinking traces visible before generation: Gemini 3 plans, Nano Banana executes
  • SynthID watermarking and C2PA metadata for provenance on every image
  • Generative UIs: Gemini builds interactive dashboards with real data on the fly
Alex Volkov
"This is a one shot prompt infographic that I just took notes from everything that I had for the show this week. It's eight megabytes of a file that it generated, and the text is perfect across all of it. It does not look like AI."
Swyx
"At this scale, you expect some typos here and there. I don't see any."

🔬 Meta SAM 3 & SAM 3D, OLMo 3, and Open Source News

Meta joins the party with SAM 3, open-vocabulary video segmentation with text and exemplar prompts, and SAM 3D for turning single photos into 3D objects and human body reconstructions. The panel demos it live on dog videos. LDJ and Nisten highlight OLMo 3 from Allen AI as a fully open 32B model (full dataset, training recipe, hyperparameters); the contrast to open-weights-only releases from Qwen and DeepSeek is stark.

  • SAM 3: click or text-prompt to segment and track any object across video, with a live demo on golden retrievers
  • SAM 3D: single image to 3D object or full human body reconstruction
  • OLMo 3: Allen AI's fully open 32B dense model, with dataset, recipe, and hyperparameters all public
  • Marimo Python notebooks: new VS Code and Cursor extension with reactive notebooks and uv integration
LDJ
"OLMo is completely open. I don't think Qwen or DeepSeek, although they are doing great work, have ever actually put out a fully open recipe: dataset, full training recipe, hyperparameters, everything 100% open."
Ryan Carson
"Think about what humans are gonna do with this model. Like, there's so many cool things you can do."

If you only skim one section, make it this one:

Google

  • Gemini 3 Pro: 1M-token multimodal model, huge reasoning gains — new LLM king; ARC-AGI-2: 31.11% (Pro), 45.14% (Deep Think) — enormous jumps
  • Antigravity IDE: free, Gemini-powered VS Code fork with agents, plans, walkthroughs, and browser control
  • Nano Banana Pro: 4K image generation with perfect text + SynthID provenance; dynamic generative UIs in Gemini

xAI

  • Grok 4.1: big post-training upgrade — #1 on human-preference leaderboards, much better EQ & creative writing, fewer hallucinations
  • Grok 4.1 Fast + Agent Tools API: 2M context, SOTA tool-calling & agent benchmarks (Berkeley FC, τ²-Bench, research evals), aggressive pricing and tight X + web integration

OpenAI

  • GPT-5.1-Codex-Max: frontier agentic coding model built for 24h+ software tasks with native compaction for million-token sessions; big gains on SWE-Bench, SWE-Lancer, TerminalBench 2
  • GPT-5.1 Pro: new research-grade ChatGPT mode that will happily think for minutes on a single query

Meta

  • SAM 3: open-vocabulary segmentation + tracking across images and video (with text & exemplar prompts)
  • SAM 3D: single-image to 3D objects & human bodies; surprisingly high-quality 3D from one photo

Robotics

  • Sunday Robotics — ACT-1 & Memo: home robot foundation model trained from a $200 skill glove instead of $20K teleop rigs; long-horizon household tasks with solid zero-shot generalization

Recorded live at the AI Engineer Summit in New York. Three incredible guests: Swyx (Cognition/Latent Space), Thor Schaeff (Google DeepMind, day 3!), and Dominik Kundel (OpenAI).

Alex Volkov 0:38
Hello. Hello everyone. Welcome to ThursdAI for November 20. What a week. Welcome to ThursdAI, November 20th. My name is Alex Volkov. I'm an AI evangelist with Weights & Biases from CoreWeave. I'm the host of ThursdAI, and I'm here on location at AI Engineer in New York. And I'm not the only one. I have a friend here, Ryan Carson from Amp. And we're on location at AI Engineer in New York. This is the second one this year. Hopefully everything is set up and you guys can hear us well. We're gonna have a few guests joining us on this honestly insane week.
Ryan Carson 1:10
Crazy.
Alex Volkov 1:11
Just an absolutely, absolutely insane week. We're also joined by some other folks here, so we're gonna have Wolfram and we're gonna have LDJ. We're gonna add them to the stage in a bit. But just before, I would like to call out that this podcast studio is hosted by the great AI Engineer folks. So shout out to Swyx and Ben from AI Engineer, and we will tell you about the conference because it is live streamed. We're gonna make sure that all of this very busy week that we have prepared for you is gonna be covered fully, but also we'll tell you all about this conference and what's going on and the incredible amount of speakers here from essentially every agent lab and foundation lab in existence. Everybody's here. And we're gonna basically experience some background noise. Maybe that's a good call out. Because we're here on location, we don't control the crowd. They may come, there may be some noises, but we'll definitely be very happy to share with you everything that's going on, including the recent release of, yes, Nano Banana Pro. Yes. So, we're gonna chat about all that. So Ryan, I don't think the folks could hear you well, so let me put you on the spot. Give us your thoughts on the one AI release from this week.
Ryan Carson 2:15
So it absolutely has to be Gemini 3. I mean, Gemini 3 Pro is a beast of a model and I think more excitingly it's not from Anthropic. Yeah. You know, not from OpenAI. Like we need more diversity in this space. So Gemini 3 Pro, really good at coding, it's our default model at Amp now. I'm very much loving it. We did hit rate limits though on launch day and even the day after, which is frustrating. They've been solved. Let's go.
Alex Volkov 2:40
Alright, and speaking of "let's go", let's flag our friend over there. Swyx. Swyx. You wanna join? Alright folks, we're gonna bring on the founder of AI Engineer, the person who coined the term AI engineer. Come on in, Mr. Swyx. Welcome. What's up? We're gonna give you this so you can hear yourself as well. Okay. And then you need to speak into the microphone as much as you can. Okay. And say hi to folks. There you are. I think we can swing this. But also Swyx, welcome. We're doing a round of what is the one release this week in the AI world. Oh. One release. You only get to pick one. Folks only get to pick one. Yeah.
Swyx 3:20
Has someone picked Gemini already? Everybody picked. Yeah, everybody did.
Alex Volkov 3:24
It is Gemini.
Swyx 3:25
I think it is underappreciated and I'm not just saying this 'cause they're presenting sponsor this time. Yeah. Uh, it's a good model. It is really good model. Very underrated. I mean, still underrated despite all the attention it's already got. And I think we will still be exploring the implications of it for a long time. Yeah.
Alex Volkov 3:43
A hundred percent. Alright, I will go because I haven't gone yet, and I think that mine is, I don't know if it's Gemini. Come on. Just pick Gemini 3. It's fine. No, no, no, no. Just because you all picked Gemini 3, I'm gonna go with, hmm, I'm gonna go with Antigravity. Okay. Hot take. Take a hot take. I've been using Antigravity, which is a new IDE from Google. We're gonna tell you about this in a bit. And they're speaking tomorrow here at AI Engineer. Okay, cool. And the reason I'm picking them is because I think they're showing a new paradigm of how I work with agents when I code and I'm almost not looking at the code. The browser integration there is kind of crazy. And I think they're showing a few things that every other agent lab will follow. And because Gemini 3 is kind of the intelligence that powers it, mine includes Gemini 3, so I'm cheating a little bit. But I think Antigravity was a very big release; folks are saying Antigravity is fire. Alright, so I think we did the round. We'll do a TLDR, and then after the TLDR, Swyx, I'm gonna ask you a little bit about AI Engineer. Yeah. And then we're gonna talk about the actual releases in depth. So, I think it's time for the TLDR super quick. And then we're gonna have our friend Swyx here cover a little bit of the folks who come here and the type of stuff we're gonna cover. So much, so much.
Swyx 4:59
I saw the logo wall with all the companies represented. I was like, wow, it's everybody. That's a lot of folks.
Alex Volkov 5:05
Alright folks, it's time for a TLDR, which is a section we do to try to super quick keep you up to date on everything that happened this week. I'm gonna run through a bunch of releases. After that we have Thor from Gemini, and then hopefully we'll have Dominik from OpenAI as well. So, stay tuned for that. Let's go to the TLDR, folks, and if there's breaking news, I don't know if I can contain it, but supposedly this,
Swyx 5:26
Nano Banana.
Alex Volkov 5:27
Yes, Nano Banana is already on there, but it's breaking news for sure, and supposedly Anthropic something, but we'll see.
Alex Volkov 5:43
All righty folks, this is the TLDR. This is everything that we have to talk about on ThursdAI for November 20th, live from the AI Engineer conference in New York. This has been an insane, insane week. Lemme try to share my TLDR notes here with you because honestly there's a lot. So we're gonna run through them super quick. I'll just tell you that in one week Google dropped Gemini 3 Pro and an agentic IDE and generative UIs and agent mode. Before this, xAI shipped Grok 4.1 and then they shipped Grok 4.1 Fast with a supporting API. OpenAI decided to not leave this party unattended and shipped GPT-5.1-Codex-Max, which supposedly uses compacting to run for 24 hours based on their release. But also here at AI Engineer, I chatted with an OpenAI guy who told me it can run for almost a week, which is crazy. And then Meta decided, you know what, if everybody's joining this party, why don't we as well? And they released SAM, Segment Anything Model 3, which has support for 3D models as well. Wow. In robotics, there was another robot called Sunday that also launched and supposedly autonomously non-training and user data. And then, this week feels like the future is deciding to come and knock on our door. So we're gonna run. And this is why ThursdAI exists: we stay up to date so you don't have to. We literally work really, really hard to stay up to date, so you don't have to. Alex gets no sleep, basically. I barely slept this week as well. Yes, for sure. So, let's run through all of these releases. I think I covered all of them with the TLDR, but let me run through them super quick to see. And then, also we have folks, we have Wolfram and some other folks as well. No, this is not the good layout. We'll do this layout, super quick. All right. So, your host, Alex Volkov, AI evangelist with Weights & Biases. We have Ryan Carson from Amp. Hello, hello. Swyx here from AI Engineer, Swyx from Cognition, Swyx from everywhere. Swyx from Latent Space, Swyx from smol AI News, and also co-host of ThursdAI, not for the first time.
Swyx 7:41
Shared, but AI News actually has been merged into Latent Space. Oh, that's great. Yeah, the IP was transferred. I am working on the next version of AI News. But yeah, it's exciting and we need to simplify the introductions.
Alex Volkov 7:56
Yeah. Okay. So Swyx from Latent Space and Cognition, and co-host. We have Wolfram Ravenwolf, we have Yam Peleg, and we have Nisten and LDJ. We are running through the TLDR, so the open source news super quick. SenseNova sets a new bar for multimodal spatial intelligence with an 8 million sample taxonomy-driven dataset. It's really pretty cool, but honestly, open source is taking the backseat this week for sure. So we're gonna mention some of the stuff. We also have Sunday Robotics unveiling ACT-1. ACT-1 is a robot foundation model trained with zero direct robot data and a low-cost skill capture glove. The glove is super cool. It's really, really something. Let me pull up camera three. Yes. Then we're gonna go to big companies, LLMs and APIs. I already told you everything that was released in one breath, but I'm gonna go a little bit deeper. Google finally unveils Gemini 3 Pro. It's a 1 million token multimodal agentic model, the top intelligence that we have, with generative UIs that it can do in Google Search, in the Gemini app, and the Antigravity coding IDE, which is a free coding IDE. And just before, Elon decided to ship Grok 4.1, which is a significant update. And you can see here it has a 64% win rate over the previous Grok, and there's a bunch of evals as well that we can talk to you about. EQ-Bench, it's top of, almost top on Nick Bench. It was top on LM Arena for one week. And then Gemini released Gemini 3, and then they released Grok 4.1 Fast in the API, including access
Ryan Carson 9:24
to Twitter tools. So this is what I wanna ask. Do we think that this is the beginning of people actually using Grok for production in business?
Swyx 9:32
I've been using it for AI News for a while, because a lot of news happens on X.
Alex Volkov 9:36
Have you used the Grok API? We'll get there. So Grok 4.1 Fast and the Agent Tools API launched on Grok and they're really, really cool. And we already used them for preparing for the show. And then OpenAI decided, hey, this party needs to be more exciting. Too many people are choosing Gemini. They're not on the sponsor list, so they have to do something. So they launch, and as always, we know Sam Altman watches these releases. OpenAI launched two things. GPT-5.1-Codex-Max, with long-horizon agentic coding for a 1 million token context window. But also the context window may not be as relevant because it has native training for something called compacting, which we'll get into here, and 24-hour software tasks that can go almost for a week, which is crazy. And then also they rolled out GPT-5.1 Pro, which many folks say is now the top agent model, even though Gemini also released a Deep Think.
Ryan Carson 10:28
Yep.
Alex Volkov 10:29
GPT-5.1 Pro seems to be like the crazy one.
Ryan Carson 10:31
So many good models to choose from.
Alex Volkov 10:33
And then the other folks, Microsoft, Nvidia and Anthropic, who are not mentioned here before, all announced a major partnership to scale Claude on Azure, and then also investment from both these companies into Anthropic. $15 billion, $15 billion. So not bad. Literally every major foundational lab had something to say this week. They all clicked in. And this is why ThursdAI exists.
Swyx 10:56
Yes. But we actually have Meta codegen, from MSL, speaking tomorrow. Yeah. On Meta's Code World Model, the model that they just released a couple months ago. Cool. Oh wow.
Alex Volkov 11:05
That's so cool. Yeah. And so, speaking of Meta, Meta decided to also say, hey, while our LLM is baking behind the scenes somewhere, we definitely want to participate in this party. So Meta released SAM 3 and SAM 3D. SAM is Segment Anything Model. It's promptable segmentation, tracking, and single-image 3D reconstruction, and it's insane. I'm gonna show you demos. It's mind boggling. You upload the video, you click a thing, and in natural language you say, hey, I wanna segment out all the water bottles on this table, on the video. And it just works. And it's mind blowing. And then this morning, we're gonna hit the breaking news button because it's well deserved. This morning Google decided to release another thing to just say, hey, we could have released it on Tuesday, but we waited for everybody else to follow with their announcements, and now we're gonna announce another thing. So, breaking news from today. AI breaking news coming at you only on ThursdAI. The breaking news from today, folks, is that DeepMind decided to release... There's a robot coming in. There's a robot.
Swyx 12:14
We can't see it on camera.
Alex Volkov 12:15
Can
Swyx 12:15
we see the robot?
Alex Volkov 12:16
The breaking news from today is not this robot dog. Breaking news from Gemini, from DeepMind, today is Nano Banana Pro. Nano Banana Pro. Nano Banana is an imaging model, an image editing model from Google DeepMind.
Swyx 12:28
The poor robot is trying to get attention.
Alex Volkov 12:31
Alrighty, folks. If you join AI Engineer, you can see this poor robot walking around trying to get attention. Hi Sparky. Hi, Sparky. Giving away the dog. Yes, we are giving away this dog, apparently. So, stay tuned. Yes, stay tuned. Please don't break the venue, dog. Okay, so this is a little hectic. Let's try to get it together. Nano Banana Pro is... Okay. Okay. I'm gonna have to show this on stage. What's the fundamental difference between Pro and standard? I think 4K is the number one thing. It generates things in 4K. I asked it to generate an infographic. I don't know if it's gonna show well, because it's a 4K infographic, and it's taken a while for the software to upload. I don't even know if I can zoom. But it looks great. This is a one-shot prompt infographic that I just took notes from everything that I had for the show this week, and I tried to generate this. It's eight megabytes of a file that it generated, and the text is perfect across all of it. It does not look like AI. I asked it to generate this file. You can see it's eight megabytes. This one is a little better and you can see if I zoom in that the text is nearly perfect. Ryan, I wanna show you this as well. Oh wow. The text is just bonkers. Perfect. Ah, really good. Yeah. Usually you,
Swyx 13:50
At this scale, you expect some typos here and there.
Alex Volkov 13:52
Yeah.
Swyx 13:52
I don't see any.
Alex Volkov 13:54
This is, and look at the resolution. This is crazy. Okay. That is amazing. And this is me, by the way. Very, very handsome. See my face, of course. And it generated something that's not me. But it has grounding with Google. That is good. And so it went and it found all the logos as well in another one that I did. I'll show you the other one. The other one pulled in all the logos of all the major labs. So this one probably has 'em in distribution. Oh, did it search or it just has it? It has search. And I actually dunno if it brought it and used it as context. I think it memorized the OpenAI logo, 'cause xAI is a little wonky. So folks, this is Nano Banana Pro. Whoa, whoa. I think Thor's here, if you want to talk to him, maybe. Yeah. We're gonna get Thor up in five to seven minutes. We're gonna grab you. But before this, this is the TLDR. The last thing in the TLDR, folks, was that Google also launched the Antigravity IDE, which is an agentic Windsurf fork, a VS Code fork, an editor that has a new and interesting approach to it. It has an Agent Manager, which is an additional tool alongside the IDE itself, where you can talk to your agents. I want to call it an inbox for agents. And this is for folks who don't really care about code anymore, and many vibe coders. I wouldn't say that, but we'll get to it and we'll show this, and hopefully we'll chat with Thor about Antigravity as well. But before this, I want to acknowledge our friend Swyx here. So first of all, thank you for the podcast studio. This is awesome. Yes, thank you. It's beautiful.
Swyx 15:16
We have nice microphones. Usually it costs a bit of setup, but you know, we have to level up. Right. You know, it's nice. Last time you had to get a hotel room and I grabbed my backup microphones from Latent Space. Yes. Break one of them and put it on there.
Alex Volkov 15:30
So I wanna acknowledge this is the fourth AI Engineer that we cover on ThursdAI. All of them besides the French one, I think. Yeah. So the one in New York, the one in San Francisco, a couple others. And, as always, I wanna acknowledge you gave me the first opportunity as a podcaster, so I really appreciate this. We have leveled up, but also AI Engineer has grown as well. So I want to acknowledge that and also ask you: what is this one about? This is the AI Engineer Summit. It's a more selective event. You chose the people. What is this one about? Tell us.
Swyx
Swyx 15:57
so we are starting to theme the summits and you look back at
16:01
2025, what is like the most pm MFE important theme of the year? I had a few candidates. I seriously consider doing voice as a theme. I seriously consider doing RL as a theme because RL obviously is a, very hyped topic, but if I'm really honest with myself, probably code is the most important and serves our audience the best.
Ryan Carson
Ryan Carson 16:20
Yeah.
Swyx
Swyx 16:20
you work at a coding agents company?
16:21
I work at a coding agents company. You guys are very interested in coding agents? Hell yeah. and so I, I think. Probably that was like the right theme to choose for, for this year.
Ryan Carson
Ryan Carson 16:30
Yep.
Swyx
Swyx 16:30
And I think, there's a lot to recap for 2025.
16:32
There's a lot to preview for 2026. Every single model lab is putting out stuff. You know, Anthropic didn't release their thing yet, but expect them to. It's coming. And I think it is the preview of what's to come, because a lot of people are also using, for example, Claude Code to do non-coding tasks. And so I think really coding agents are just agents. Agents need to be able to code even if they're not specifically for coding. And I think that's something we also saw with MCP. Anthropic is discovering that, well, if you just throw everything in the JSON file, it's not gonna work. To really scale, you'll probably have to write code to operate MCP. Same finding as Cloudflare. And I think that probably makes sense as well. A couple other things that I'll mention for this summit: this is the first time we're focusing on enterprise, for leadership and digital transformation. We're hearing a lot of Fortune 500s prioritizing that as their top thing for 2026.
Ryan Carson
Ryan Carson 17:20
Yeah.
Swyx
Swyx 17:20
and they're pretty serious.
17:21
So like, McKinsey is here doing their first ever AIE, talking about what they're seeing with their new software division, which is entirely focused on that. And I have a whole bunch of competitors, including Dan Shipper from Every.
Ryan Carson
Ryan Carson 17:32
Yeah.
Swyx
Swyx 17:32
as well as NLW.
17:34
They all have AI consultancies that are making a crap ton of money. Our MC for today is Alex Lieberman, who used to run Morning Brew, which is one of my inspirations for running a newsletter and a creator business in general. Now he runs 10x, which is a consulting company that only pays engineers based on story points completed, whether you did it manually or used an agent.
Ryan Carson
Ryan Carson 17:51
That's so interesting.
Swyx
Swyx 17:51
if you're just really damn good at using an agent, right.
17:53
You get a lot of money. They're gonna pay multiple people million-dollar bonuses this year for shipping fast and being productive with agents. Right. Wow. And obviously everything has to be approved and accepted by clients, but there's no limit on your productivity if you're just really good at coding agents. And I see that disparity actually increasing rather than decreasing, because everybody has this discussion of, does AI raise the floor or raise the ceiling? Actually, I think it raises the ceiling much more than it raises the floor.
Ryan Carson
Ryan Carson 18:20
Yeah, absolutely.
Swyx
Swyx 18:21
I am no longer running AIE by myself with Ben, we
18:24
actually have a general manager. So when I joined Cognition, I actually stepped back a bit from AIE, and Leah's running things now. So I just get to hang out and vibe with podcasters. Little Leah
Alex Volkov
Alex Volkov 18:33
is a very cool person and I connected with her on the first AIE
18:37
and since then she's like, oh, maybe I'll do one of these or two of these.
Swyx
Swyx 18:39
to run, her own business, selling handbags and I'm like,
18:42
maybe AI is also on your radar.
Alex Volkov
Alex Volkov 18:45
Alright, Swyx.
18:46
So we have here on the screen the agent labs and the model labs.
Swyx
Swyx 18:50
length sorting?
Alex Volkov
Alex Volkov 18:51
so it's not alphabetical.
18:52
I saw one video here and I was like, yeah, I like the sort by length. It's sorted by length, not alphabetically. But you also do have a callout for some folks who are like 3x or 4x speakers. Yeah. And some folks who got the top score on something. Right, exactly. So I
Swyx
Swyx 19:05
wanna encourage my speakers to compete.
19:07
Right? Yeah. because I think people, some people only, they bring out their best when there's a competition, when they can win a, a prize. For World's Fair, we have best speakers. and then for online YouTube, we just have like top speakers. Nice. So anyone who got over 10 K views.
Alex Volkov
Alex Volkov 19:19
so I will just address the audience super quick.
19:21
If you want yours truly to be the top speaker on AIE, my judge talk, where I wear a fucking wig for you, is on the AIE YouTube. Go watch it because it's only like two and a half years. You're not allowed to do that. Stop.
Swyx
Swyx 19:33
I'm allowed to do everything.
19:34
I want to hear you. you can show whatever you like.
Alex Volkov
Alex Volkov 19:36
compete.
19:37
So Swyx, we have, I'll actually show this like this. We have. Agent labs and model labs. You wanna call out some of the folks, uh, for whose talk you are looking forward the most? Uh,
Swyx
Swyx 19:46
you asking me to pick my babies as every year?
19:50
No, no. Everyone is doing great, by the way. Ryan also has a booth talk that, woohoo, he's doing tomorrow. Yes. Nice. So I think that's really cool. Yeah. So basically the thesis of an agent lab is emerging, where we are slowly converging on, well, what does it mean to be a GPT wrapper? Do you have a role to play, or are you gonna be steamrolled by the model labs? Yes. I think this emergence of Amp and the emergence of Cognition and Cursor and all these companies, they're starting to show a path that is maybe sustainable. And obviously this is like five years behind the roadmap of model labs, right? But there's a world in which agent labs are at least as well compensated. They are currently as well compensated as model labs, which is great. It's one of the reasons I joined. But I think the interesting thing is, well, are they actually more sustainable, because they have higher margins? Yeah, I think so, because they can jump between the R&D costs of building your own models versus the open models that are coming from China. MiniMax is here as well, speaking. Yeah. And just do whatever you want in favor of the customer, because everyone has this thesis that you have to build vertical AI, domain-specific startups. Well, if you take that seriously enough, you're going to build an agent lab.
Ryan Carson
Ryan Carson 21:04
Yeah, I totally agree.
Swyx
Swyx 21:05
lawyers, for healthcare, for developers, these are all agent labs.
Alex Volkov
Alex Volkov 21:08
Yep.
21:09
Yep. I don't push back a ton, but I did notice, and I think there's a talk where you asked for a spicy take on this stuff, that shows that Gemini 3, on I think Terminal-Bench, with a very basic non-harness loop, beat most of the other harnesses. So that's very interesting. Nobody has caught up on Terminal-Bench yet.
Swyx
Swyx 21:34
It's just because, they haven't opened up to the other agents.
Ryan Carson
Ryan Carson 21:36
Yeah, we haven't tested terminal bench yet.
21:38
No one's tried. that's why. We all wanna battle 'em. Trust me. Okay.
Swyx
Swyx 21:41
Terminal bench.
21:42
They, they just did V two. Yes. they're a small team. They're, they're, they're opening up so it's not on there. Not because they, they weren't good, it's just they haven't tried it.
Alex Volkov
Alex Volkov 21:49
Swyx, I just wanna call out the insane amount of folks that
21:52
you brought on here from all the labs. I'm gonna run through them super quick just so folks understand how much work goes into it. I know you're saying you're taking a backseat. But the speakers are my job. There's a few things in the event to do. One of them is the venue and the podcast and running around and the sponsors, but there's also the curation of the topics. And this is you.
Swyx
Swyx 22:10
Yeah.
22:11
and so, we had Open invites, the name of invited speakers, open invites. We had 23 applicants to every one slot that we had open. So I'm just gonna read this out.
Alex Volkov
Alex Volkov 22:18
There is Google from Antigravity, Kevin,
22:21
previously on Windsurf. Google is here. Amp is here. We have Ryan and B, also from Amp. eMAR, the, our friend from is gonna show up as well. Robert from All Hands. Cursor is also here. Representative in LiveCodeBench, I think LiveCodeBench,
Swyx
Swyx 22:35
people don't understand how important LiveCodeBench is.
22:38
Terminal-Bench gets all the buzz right now. But actually LiveCodeBench is also a top-tier benchmark that people should be aware of. I invited him for LiveCodeBench and then I found out that he was joining Cursor. Oh no.
Alex Volkov
Alex Volkov 22:47
just a reminder for, for folks, AI engineer is
22:49
live streamed right now for, will continue to be live streamed. So all of these great talks you can capture on AI engineer and the live stream is right there,
Ryan Carson
Ryan Carson 22:57
is crazy value.
22:57
People should understand. to have that content and this quality for free on YouTube is insane.
Swyx
Swyx 23:02
I mean, there, there's just many people that will never be able
23:04
to serve that are not in the US But if you're in the us you should come.
Alex Volkov
Alex Volkov 23:07
Yeah.
23:08
Google Labs is another one. C is here with Jules. Yes. Which is another agent.
Swyx
Swyx 23:13
She's actually, she has more broader responsibility other than Jules.
23:15
I've known her since she was head of product at Vercel, and she actually runs product now at Google Labs.
Alex Volkov
Alex Volkov 23:20
about Gimlet Labs.
23:21
but Code Rabbit, I definitely know.
Swyx
Swyx 23:22
Natalie, does AI generated kernels for Apple Metal.
Alex Volkov
Alex Volkov 23:25
Whoa.
Swyx
Swyx 23:26
so if you want serious sort of ml AI engineering, it's her.
Alex Volkov
Alex Volkov 23:30
We have Vercel and Cursor and Factory, Cline and Manus, but
23:33
also like DeepMind. Anthropic is here. A bunch of DeepMind is here. Paige is here. Ammar. Shout out to Paige and Ammar, friends. A bunch of Anthropic folks as well. Ammar was the product and design lead for Gemini
Swyx
Swyx 23:43
three.
Alex Volkov
Alex Volkov 23:44
Yes.
23:44
So, he actually
Swyx
Swyx 23:45
was supposed to speak today.
23:46
They had to push it back 'cause they were wrapping up for
Alex Volkov
Alex Volkov 23:48
Thought he was like in the AI studio with Logan.
Swyx
Swyx 23:50
I think they shifted him.
23:51
So, hopefully we have, I'm not exactly sure, but I
Alex Volkov
Alex Volkov 23:53
know he was very, very deeply involved on the launch.
23:55
And you have representation from an open source lab, MiniMax, for the first time. Yeah.
Swyx
Swyx 23:59
Olive
Alex Volkov
Alex Volkov 24:00
Song is here
Swyx
Swyx 24:01
to talk tomorrow.
Alex Volkov
Alex Volkov 24:01
Justin
Swyx
Swyx 24:02
from Qwen
Alex Volkov
Alex Volkov 24:02
last
Swyx
Swyx 24:02
Yeah.
24:03
Online. And then ZI is also speaking online, but this is the first time they can make.
Alex Volkov
Alex Volkov 24:06
In person.
24:06
Same conference. Absolutely. Yeah, go ahead.
Swyx
Swyx 24:09
Yeah, like Capital One, Bloomberg.
Ryan Carson
Ryan Carson 24:11
attendees
Swyx
Swyx 24:11
Atlassian.
24:12
speakers. Yeah, speakers. Attendees. I mean, we try to not share too much because like lots of Fortune 500 in New York. Cool. The whole point of a IE in New York is we bring SF into New York.
Ryan Carson
Ryan Carson 24:21
Yeah.
Swyx
Swyx 24:22
And the startups wanna meet customers, and the model
24:25
labs wanna meet customers. But the New York AI scene also is underserved. People just don't have good enough AI events here. In SF, like every day there's some kind of weird... It's two a day. But here, we need to bring high quality events to New York. Amen.
Alex Volkov
Alex Volkov 24:36
And the whole point of ThursdAI is to bring you, well first
24:38
of all, San Francisco to the internet if you can join, but now also New York and, and, we're going global next year. London. Next year. London next year. April eight to 10. Yeah. Nice. We appreciate you joining. We appreciate everything you do, but also we know that you are a busy man and you have to run and we have a, I gotta get rid of every buddy as Yeah. Thanks. Thanks for coming. Thank you so much. Enjoy the rest of the conference folks. Tuning into AI engineer. I forgot to plug, the
Swyx
Swyx 24:59
authors of ICO as well.
25:00
They're speaking. And Steve Yegge is a personal hero of mine. Gene Kim, I always say, is like the evolved Pokémon version of me. Nice. So you should meet all of them and see their talks. Cool. Awesome. All right.
Alex Volkov
Alex Volkov 25:09
Good to see you, Swyx.
25:10
Thank you, Swyx. Take care. Alright folks, so we're moving on and we wanna welcome Thor to the stage. As you see, we have a running show here. Thor, welcome.
Thor Schaeff
Thor Schaeff 25:20
thanks
Alex Volkov
Alex Volkov 25:20
sir. I don't know if I can pronounce your last name.
Thor Schaeff
Thor Schaeff 25:23
Chef.
25:23
Like the chef in the kitchen, chef. Easy. I brought you a little gift. Oh, nice. Oh, it fits perfectly to your, no way.
Alex Volkov
Alex Volkov 25:30
Wait, wait.
25:30
I want zoom in on this photo then zoom in on this. I have a little gift for me.
Ryan Carson
Ryan Carson 25:34
My, myself and my kids love these.
Alex Volkov
Alex Volkov 25:36
I got a little present from Thor.
25:38
you might have to share me plugable banana. And I guess I wonder why. Ryan, let me zoom in on you. And you also got a present. My present's better. What is this? Hi-Chews. These are flavored. These are candy. Yeah. That is awesome. Thor, thank you. For folks who are not familiar with you, and let me make sure you speak right into that mic for us, you have just joined DeepMind.
Thor Schaeff
Thor Schaeff 26:01
Yeah.
26:01
I'm on day three. Day three. Oh, boy. Welcome. So it's been an interesting week. Yeah, really. So, something going on. So yeah, Tuesday, Gemini 3. Yes. Pretty incredible. The reception from the community has been amazing. Really cool to see the step up in the coding capabilities, obviously here at the AI Code Summit. AI Studio now has vibe coding capabilities, which are really amazing to see, especially looking at games, vibe-coded games. The design capabilities as well have drastically improved. So that was Tuesday, but today is Thursday. Thursday. So today, Nano Banana Pro. How's good. Which is, well, Gemini 3
Ryan Carson
Ryan Carson 26:42
Pro image.
26:42
It's amazing. So we've been raving about it before you hopped on.
Alex Volkov
Alex Volkov 26:46
Alright, so Thor, let's talk about some of the releases, and
26:48
there has been a bunch and I have like a whole page here of releases. Oh, amazing. Gemini 3 Pro was released. Yes. Antigravity was released. I wanna talk to you about generative UIs as well, which is crazy. Yeah. Agent mode, which I got access to, that can do stuff in my Gmail. Antigravity I also mentioned, and I think there's something else. What is, of all of these things, your favorite of the releases, if you can pick
Thor Schaeff
Thor Schaeff 27:11
your favorite kid.
27:12
I think what's probably the most exciting, also seeing that way at the Code Summit, it's just the coding capabilities, and the improvements from Gemini 2.5 to three in terms of the coding capabilities. So it's significant.
Ryan Carson
Ryan Carson 27:25
I don't, I dunno if you know this, but we switched
27:26
to it as our main model in Amp. Like, it's been our default since Tuesday. Yeah. Nice. I mean, we never do that. Like, we've never switched to a model. Like when GPT-5 came out, we didn't pick it. Yeah. But Gemini 3 was great. We did hit rate limits, which was a killer. And I think this has come back, so
Alex Volkov
Alex Volkov 27:42
Ryan was sitting here in the beginning.
27:44
He is like, and I told him Thor's gonna be here. So Thor, can you help Ryan with this, please?
Thor Schaeff
Thor Schaeff 27:48
Yes.
27:49
I hope so. I think there were, yeah, a couple of people running into that. It's been solved, from what I understand, but that's
Ryan Carson
Ryan Carson 27:55
I mean, everybody wanted to use it.
Thor Schaeff
Thor Schaeff 27:56
Yeah.
27:56
Everyone's like running their own benchmarks and evals and, and you know, testing it out, making sure it's like we're on fire probably. It really is.
Alex Volkov
Alex Volkov 28:03
I have a few benchmarks here to pull up.
28:05
Okay. Gemini 3 with Deep Think, which is an additional thing, and I would love to hear from you what Deep Think is, received 45% on ARC-AGI-2. It's unprecedented. It's unbelievable. I wanna go and try to actually find this.
Thor Schaeff
Thor Schaeff 28:18
Yeah,
Alex Volkov
Alex Volkov 28:18
I know that you won't be able to tell us how, but like how Thor tell us.
28:21
Come on. Well,
Thor Schaeff
Thor Schaeff 28:22
okay.
28:23
I'm afraid I won't be able to. Yeah, I'm still trying to not say anything stupid, so that I think is my main focus here. If you come to the booth, we actually have some of the researchers and some of the product managers that work on these features. So, you know, I'm just here cheering and being excited. I can't take any credit for anything. Just definitely hoping I'll ramp up quickly.
Ryan Carson
Ryan Carson 28:46
Well, lemme, lemme call out this moment in time because of that.
28:48
Yeah. Like, so Alex and everybody, remember when we were like, what happened to Google? Are they asleep at the switch? What is going on in Google? Yep. And now it feels like the opposite. And I'm not paid by Google. I have no financial interest, other than it's probably in my 401k. But you know, it's everywhere now, in a good way. Yeah. Like you said, it's in Gmail, it's in Google Calendar, it's in Chrome. You know, it's in Amp. Yeah. It's everywhere.
Thor Schaeff
Thor Schaeff 29:19
Yeah.
29:19
no, it seems, like a really big step up. Yeah, everyone's very excited, which is great. I think my team specifically, we're working across kind of the Gemini API and the AI studio.
Ryan Carson
Ryan Carson 29:30
Yeah.
Thor Schaeff
Thor Schaeff 29:30
so like we've been very excited to see what
29:33
people are building in AI Studio. If you go to ai.dev, I actually recently found out that is a domain that we have. Yeah. I love ai.dev. I use it a lot. It's my favorite, all the time. So, ai.dev, and you can look into the gallery, you can look at some of the apps that people have been building with AI Studio. And so we're really going all in on the vibe coding, basically empowering everyone to build their prototypes and their applications, internally at Google as well.
Ryan Carson
Ryan Carson 30:03
doesn't do any deployment.
Thor Schaeff
Thor Schaeff 30:04
correct.
30:04
Probably. Yes. So not, the full stack deployments, just Yes. Right. So that's something that we're working on to really give you the end to end. You build it, you deploy it. So watch
Ryan Carson
Ryan Carson 30:13
Watch out, Replit.
Thor Schaeff
Thor Schaeff 30:14
Well, I mean,
Ryan Carson
Ryan Carson 30:15
which is fine.
Thor Schaeff
Thor Schaeff 30:16
so I think it's definitely like we're trying to build kind of the
30:18
best tools for developers. You know, obviously if that fits into your workflow where you prototype and ship small apps to production. I think as well for internal tooling, I heard people are using it internally to build internal tooling applications and prototype things. So I think we're very excited about the model, but also just all the tooling for developers that is being built around it. And now, obviously, developers, or builders: there's a lot more developers now coming online. They haven't
Alex Volkov
Alex Volkov 30:51
I think this is the call out, like a bunch of agent stuff has happened.
30:54
I wanna call out, like we're showing on stage right now, Vending-Bench 2. Vending-Bench is from these awesome guys, and I love this benchmark. They put like a mini fridge in your foundational lab somewhere, and then they restock it, and the model is in charge of sending them an email and saying, hey, the employees want this or that. Gemini 3 on Vending-Bench went off the charts, man. It made a ton of money. It made $5,000. I think the second one on this was Sonnet 4.5 with a little bit less than three. So Vending-Bench, they consider this a long-horizon running task, because the model has to keep a lot of stuff in its memory. It has to pick up a lot of stuff. So it's really good at this. I think to me Gemini is also the leading multimodal model, and that has been improved as well. We called out some of the evals on the show before, but almost 90 on MMMU-Pro, and I need to find the exact metrics there. Those are absolutely crazy. And the thing that I love about this is because of the vibe coding paradigm, and we'll talk about Antigravity in a bit: Antigravity records itself and then Gemini 3 sees that recording and acts based on the video. It's the loop that
Ryan Carson
Ryan Carson 32:01
But I do, so my, my one beef and I, I, yes please, you probably
32:04
can't control this, but you have to have a Gmail address to use Antigravity. I can't use my Google Workspace address. And it's like, oh, interesting. Everybody in the world has a Google Workspace address. So I think you're cutting out everybody.
Thor Schaeff
Thor Schaeff 32:18
Yeah.
32:18
I mean, yeah, to be fair, it came out, on Tuesday, so I think we
Ryan Carson
Ryan Carson 32:22
were like, come on, work harder.
Thor Schaeff
Thor Schaeff 32:24
you don't have any Gmail address at all?
Ryan Carson
Ryan Carson 32:26
So this is one of these stupid things I do, but I couldn't log
32:29
in because it was already attached to a different phone number.
Thor Schaeff
Thor Schaeff 32:33
factor
Ryan Carson
Ryan Carson 32:33
authentication
Thor Schaeff
Thor Schaeff 32:34
crap.
Ryan Carson
Ryan Carson 32:34
okay.
Thor Schaeff
Thor Schaeff 32:35
good one.
32:36
You're like, you don't get right on Thatt, know what's wrong with you.
Alex Volkov
Alex Volkov 32:39
so folks can try out Gemini 3.
32:40
I think one more callout that we have to do on the show is that it landed everywhere across Google as well, including AI Mode. So shout out to Robby Stein, who's been on the podcast a couple times to talk about AI Mode, the agentic search, fan-out searching Google. Gemini 3 landed there for the first time as the model. It was quite incredible. Generative UIs. I still haven't seen those generative UIs. Have you seen those generative UIs yet, or no? Internally? The ones where Gemini answers not with text, but with widgets that you can control.
Thor Schaeff
Thor Schaeff 33:09
Yeah.
33:09
Oh yeah. Yes. I think, that's a good question. I mean,
Ryan Carson
Ryan Carson 33:11
so I'm using generative ai, components in my, but most of them are
33:15
like tool calls, and then you, populate a component with the results, which is pretty standard, but it'll be exciting to see truly generative components.
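The tool-call pattern Ryan describes here can be sketched roughly as follows. This is only an illustration under stated assumptions, not any product's actual API: the `get_weather` tool, the `WeatherCard` component name, and the `handle_tool_call` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Component:
    """A description of a UI component that a frontend could render."""
    kind: str
    props: dict[str, Any]


def get_weather(city: str) -> dict[str, Any]:
    # Hypothetical tool: a real one would call a weather service.
    return {"city": city, "temp_c": 21, "condition": "sunny"}


def weather_card(result: dict[str, Any]) -> Component:
    # Hand-written mapping from the tool result to a predefined component.
    return Component(
        kind="WeatherCard",
        props={
            "title": result["city"],
            "subtitle": f'{result["temp_c"]} C, {result["condition"]}',
        },
    )


# Registry of tools the model may call, and the fixed component for each.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {"get_weather": get_weather}
RENDERERS: dict[str, Callable[[dict[str, Any]], Component]] = {
    "get_weather": weather_card,
}


def handle_tool_call(name: str, arguments: dict[str, Any]) -> Component:
    """Run the tool the model asked for, then populate its component."""
    result = TOOLS[name](**arguments)
    return RENDERERS[name](result)
```

In this sketch the component is fixed ahead of time and only its props come from the tool call; a "truly generative" component would instead have the model produce the component description itself.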
Alex Volkov
Alex Volkov 33:23
and they're incorporated into Gemini and into AI mode.
33:26
I think I've seen them once. It's pretty cool when you ask it to, like, plan a mode and... Okay, we're getting to Deep Think, and Deep Think is dedicated reasoning. Have you played a lot with Deep Think at all? What do you guys have to say about Deep Think? I've not played a lot with it, honestly.
Thor Schaeff
Thor Schaeff 33:40
I'm trying to get through my onboarding task, but
33:42
Des just keeps launching things. I'm like, come on, stop doing that. Yeah.
Alex Volkov
Alex Volkov 33:47
And then, okay, let's talk about Antigravity.
33:49
Antigravity is an agentic IDE that was released. I played with it. I found it super, super cool. Yeah. I actually wanted to show some examples of this on stage here. Oh, cool. Because of the few things it has that are different from any other agentic IDE, with the agents' mailbox, basically, where you can talk to agents. And it's powered by Gemini 3, but also supports other models as well.
Thor Schaeff
Thor Schaeff 34:09
Claude's in there, I think.
Alex Volkov
Alex Volkov 34:11
Yeah.
34:11
I think we said GPT-OSS. You pointed directly at Dominik, who is over there from OpenAI, who we're gonna chat to next, and who was in charge of part of that. Yeah. Unfortunately we need to accommodate one more guest, so we'll give you like five more minutes. We're really good for five. Alright. Five more minutes. I'm sweating. And that's it. So Antigravity. I saw this take online: why does Google have so many tools? And I have my own take on this, but I would love to hear from you, what do you think? There's Jules, and Karich is talking here. There is Gemini CLI, which is probably more for like... and there was Project IDX. I don't know, I know what that is. Yeah. That's just become Firebase Studio. Firebase Studio. And now we have Antigravity, which is also a hard name to get used to.
Ryan Carson
Ryan Carson 34:53
feels like Jules and Antigravity, like direct competitors,
34:55
I mean they're for the same thing. Cloud
Alex Volkov
Alex Volkov 34:56
IDE.
Thor Schaeff
Thor Schaeff 34:57
Is local IDE.
Ryan Carson
Ryan Carson 34:57
Do you think it's gonna work?
34:58
I mean, local cloud, but it's an IDE.
Thor Schaeff
Thor Schaeff 35:01
Yeah, no, it's definitely interesting.
35:02
it's definitely also something I'm still wrapping my head around. I think the way I like to think about it is kind of like the VC ecosystem. VCs are investing in all these competitive things, hoping that one of them succeeds. And I think that is pretty much where Google is at. There's a lot of different efforts, and some of them might succeed, some won't. But I think what we're seeing now is that a good amount of them are pretty successful, and so we're just doubling down on that. And yeah, I think it's pretty exciting for developers. Pretty exciting times to see all these tools. I think what was really great with the Gemini 3 launch as well was seeing all the partner integrations and all these tools be there for launch day, with Gemini 3 ready and available for users. And yeah, it's been, I think, a pretty successful launch. So I can't take any credit. I'm still the new guy trying to set up my, you know, HR tasks. But yeah, it's been really exciting to see what the team is sprinting on. And I mean, I've seen what's coming down the pipeline. Give us a hint. Well, I mean, Christmas will be delightful. Oh, man. I was,
Alex Volkov
Alex Volkov 36:13
I was hoping to take a vacation on Christmas.
36:15
Folks, I wanna show you Antigravity super quick, because I think it's important enough to show the main differences. So Antigravity has a few features that I think are new and kind of novel. One of them is artifacts. With the artifacts feature, basically, when you ask it to plan something, the model knows to show you a to-do list. It checks off the boxes on the to-do list, and it does this not in a file. So it has like a task list. No, but why don't you wanna talk to it? No, I'm afraid it's gonna ruin the thing. Yeah. And it's already downloading another update; it just updated this morning. So the folks are really working on this. Yeah. And you can see a task list here. This is my leaderboard for Weights & Biases, for example, and I just asked it to do some stuff, so it planned for itself. It gave me a to-do list. The cool thing about this is you can review the plan and give it instructions right here. Yeah. Or you can comment on any specific to-do, or a bunch of to-dos, and give it a comment like here. And then with this comment, the to-do plan will change before it even starts generating. Oh, cool. I found that super cool. Wow. And this is an artifact. This is not a file. It's like a built-in, first-party primitive of the system. Cool. And the other one is walkthrough. Once the model starts doing something, it will generate this file, which is a walkthrough file, and it'll show you the implementation walkthrough of the changes it made.
Ryan Carson
Ryan Carson 37:27
Oh, so it's like a summary of what it's been doing?
37:29
Yes.
Alex Volkov
Alex Volkov 37:30
And because it has an integration with Chrome, and this
37:32
is one other thing I wanna call out. It has a deep integration with the browser. It's gonna open the browser right now with a Chrome extension. And then because it has a deep integration with Chrome via the Antigravity browser control, it can take screenshots and videos.
Ryan Carson
Ryan Carson 37:45
Yeah.
Alex Volkov
Alex Volkov 37:45
It takes screenshots and videos.
37:46
That's very good. And then it puts you in this workflow, it just puts you like, Hey, I did this thing, and it's like right here.
Ryan Carson
Ryan Carson 37:51
a feedback loop
Alex Volkov
Alex Volkov 37:52
For the agent.
37:53
Yeah. And then the last thing, and I think the most important thing, and this is the reason why I chose Antigravity as my number one update this week, folks, is because the Agent Manager is a completely new beast. Yeah. And we are getting a few folks leaving the stage, so it's gonna get a little bit noisy here. So let me get the microphones up a little bit. Check, check. But folks, this is the Antigravity Agent Manager. And basically what this means is that I no longer need to think in folder concepts, in file concepts. I have all my workspaces, which are folders, and in every one I have a conversation. So I have this, like, "add Gemini 3 Pro eval to my evals." And then the eval before this, I generated thenar chart, Nisten, if you remember, we did this together. Thenar chart, generated here. I also built this Nano Banana test where I had some access before, and I built this UI. And so when I have agents running on multiple things, I have this inbox and they basically tell me, hey, the next step for you to do is to open the walkthrough for Nano Banana or the mini chart. You can open this here. Or, I'm stuck in a CLI thing that I want you to accept. Basically all these agents can run independently of me and I'm managing them. Yeah. So basically this Agent Manager makes you a PM. It's your job. You're a PM now of agents, versus just like a code
Thor Schaeff
Thor Schaeff 39:08
editor that work alongside.
39:09
It's clearly the future. You're an engineering manager now. Yes.
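The inbox idea Alex walks through, where independently running agents post the next thing they need from you, could be modeled along these lines. This is a toy sketch and makes no claim about how Antigravity is actually built: `AgentInbox`, `InboxItem`, and the workspace names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InboxItem:
    """One request from an agent that needs the human's attention."""
    workspace: str
    conversation: str
    request: str
    resolved: bool = False


@dataclass
class AgentInbox:
    """Collects requests from agents that run independently of the user."""
    items: list[InboxItem] = field(default_factory=list)

    def post(self, workspace: str, conversation: str, request: str) -> InboxItem:
        # Called by an agent when it is blocked or has something to review.
        item = InboxItem(workspace, conversation, request)
        self.items.append(item)
        return item

    def pending(self) -> list[InboxItem]:
        # Everything the human still has to act on, across all workspaces.
        return [item for item in self.items if not item.resolved]

    def resolve(self, item: InboxItem) -> None:
        # The human reviewed or approved the request.
        item.resolved = True


# Two hypothetical agents in different workspaces ask for attention.
inbox = AgentInbox()
walkthrough = inbox.post("nano-banana-test", "build the UI", "open the walkthrough")
approval = inbox.post("evals", "add a new eval", "accept a CLI command")
inbox.resolve(approval)
```

The point of the sketch is the shift in unit of work: the human iterates over `pending()` requests rather than over files and folders.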
Alex Volkov
Alex Volkov 39:12
I'll call out one last thing about Antigravity, which is
39:14
really hard for me to remember off the top of my head why they called it this. There's a playground feature. and I think it's one of my favorite features here. Why? Because if I wanna just test something out, like, hey, generate a 3D scene of a piano with clickable keys that I can play with my keyboard. If I wanna just do that, I don't have to open the folder. I can just go into playground, hit this and it will just work. So it's not tied to a workspace. Yeah. Wow. the innovation behind this, it opens the folder in the dedicated space on my behalf Gallery doesn't matter, but like, the fact that it's here, that is cool. It's great.
Ryan Carson
Ryan Carson 39:50
That is, that's nice.
Alex Volkov
Alex Volkov 39:52
Alright, so this has been the update from, from Google and DeepMind.
39:55
Folks, we're gonna stop sharing here super quick. We're gonna show you the results of our Antigravity response. Thor, thank you so much for coming on the show. Thank you. We really appreciate it. We threw you in deep water on day three of your work. I could ask you some stuff about your first job. You worked at ElevenLabs for a while, and some other stuff as well. Yes. Incredible
Ryan Carson
Ryan Carson 40:11
company as well.
Alex Volkov
Alex Volkov 40:12
incredible company.
40:12
We love ElevenLabs. We just had, you connected me with, I forgot the name. Paul. Yes, Paul. Paul was on the show. He's amazing. He was great. We're gonna move on to OpenAI now. Thor, thank you so, so much for coming. Appreciate it. Give Thor a follow. And while we bring on Dominik, Wolfram, I would love to hear your thoughts about Gemini. We're gonna go off.
Wolfram Ravenwolf
Wolfram Ravenwolf 40:34
Yeah.
40:34
Yam can also add something about the Gemini CLI as the next step. So what I, yeah, this
Alex Volkov
Alex Volkov 40:40
Go ahead, Wolfram.
Wolfram Ravenwolf
Wolfram Ravenwolf 40:41
Okay.
40:41
Yeah, so I've been using Gemini since right when it came out, for creative writing tasks and comparing it, and it's my favorite model now. It has replaced Sonnet 4.5, which was, in German especially, the best model so far. And Gemini made even fewer mistakes in the German language, and in general the writing was top notch, right? Compared to, for instance, 4.1, it is so much better in the writing department. It's become my go-to model, my main model now. And interestingly, with the system prompts I've been using with all these models: it is a smarter model, but it appeared not to follow instructions as closely, because it seemed to also have some kind of intuition.
Ryan Carson
Ryan Carson 41:23
Sorry.
Wolfram Ravenwolf
Wolfram Ravenwolf 41:24
It seems to understand the intention of your prompt and
41:27
go by the intention instead of just sticking to the literal prompt, as it were. Which was also a very interesting experience. And so I think they have really delivered here. It's my favorite model now. And when I want to test a model, I always say evaluation, evaluation, evaluation. But for your personal use case, just use it for a while. Make it your main model for just a day. Use it and you'll see what happens. I will be using this one as my main model now.
Ryan Carson
Ryan Carson 41:54
Yeah.
Yam Peleg
Yam Peleg 41:54
So about Gemini, I maxed out my, rate limits twice,
41:58
yesterday and today on Gemini. Out of the box, the model is a beast. It's different. I'm using it on a Linux-based machine so it can control the system itself. I've seen it do things that other models didn't come up with. It could solve bugs that I didn't see any other model being able to solve. I gave stuff to Gemini and stuff to other models, to compare and see what's going on. I want to compare it to Codex Max, which is also extremely powerful. Codex has always been a truly powerful model. I just wanna say, Gemini on the CLI is a great, great, great model. It's really good also for brainstorming through the CLI. Yeah, you can just talk through the CLI, brainstorming what you want to develop, like with a human, right? Yeah, absolutely. Yeah, it's great, they absolutely delivered. Completely agree with Wolfram, they absolutely delivered. Rate limits are a bit of a pain, but understandable. Everyone saying they absolutely delivered means that everyone is using it, so I kind of get the rate limits. So yeah.
Alex Volkov
Alex Volkov 43:06
Alrighty.
43:06
Thank you guys for chiming in. And now we're gonna go and introduce Dominik. Dominik, correct?
Ryan Carson
Ryan Carson 43:11
Yeah.
Alex Volkov
Alex Volkov 43:11
Yes.
43:12
Dominik Kundel. Dom is from OpenAI. Could you introduce yourself, and get closer to the mic, because there's some background noise. Yeah, introduce yourself. Tell us who you are. This is your first time on the podcast, so welcome. We really appreciate you. Yeah, thanks for having me. I really appreciate you being spontaneous and jumping on. The way I convinced him though, I'll tell you, is I told him that Thor is gonna come and DeepMind's gonna be overrepresented on the podcast, and surely OpenAI should also be here. Gotta jump in. I thought it
Dominik Kundel
Dominik Kundel 43:33
was just that we're doing a German AI corner with Thor first and me.
Alex Volkov
Alex Volkov 43:37
Yes, we could do that too.
43:38
We probably could do this with some AI help. So please introduce yourself to the audience. This is your camera. Tell folks who you are, and then we're gonna talk about some of the stuff that OpenAI released this week.
Dominik Kundel
Dominik Kundel 43:47
Hi everyone.
43:47
My name is Dominik Kundel. I'm on the developer experience team at OpenAI, so I work both with the product teams and the developer community on how to get the most out of OpenAI's products.
Alex Volkov
Alex Volkov 43:57
All right.
43:57
So Dom, the first question I have for you, and I'm gonna throw you into deep water, is: could you guys not, like, chill this week? Like, what's up? Really, this is a difficult week for all of us who follow along the curve of AI developments. Why? No, I'm just kidding. Obviously I know why. But can you give us the high-level summary of the releases that OpenAI decided to cap the week with?
Dominik Kundel
Dominik Kundel 44:20
I mean, last week we had GPT-5.1 and GPT-5.1 Codex.
44:23
So this week we released GPT-5.1 Codex Max. Yeah. And then also, we tried to calm you all down and not make it a big deal, but we updated GPT-5 Pro to GPT-5.1 Pro.
Alex Volkov
Alex Volkov 44:35
Yes.
44:35
Wow. And I heard from a guest of ours here on the show, a doctor. He has access to 5.1 Pro and he absolutely loved this, like, research-grade intelligence. Honestly, it's been a crazy week. I haven't had time to play with Pro myself. I think I did one run and it gave me an insane answer. I don't even know if I have the capability to differentiate between 5.1, which is really good, and 5.1 Pro. That one is not on the API yet.
Dominik Kundel
Dominik Kundel 45:00
No, it's not.
45:00
Also, technically, 5.1 Codex Max is not in the API yet. It's only available if you sign into Codex with ChatGPT. Got it. Oh, okay. So it's in Codex, not available by API. Got it. It's gonna come to the API soon.
Ryan Carson
Ryan Carson 45:16
So does Codex choose when to use the MAX model versus
Dominik Kundel
Dominik Kundel 45:19
No, it's the default now.
45:20
It's the default. So if you have the latest version of Codex, it's gonna be the default. Yeah. It sounds
Ryan Carson
Ryan Carson 45:24
expensive.
45:25
So I thought, oh, this is most expensive. No,
Dominik Kundel
Dominik Kundel 45:26
with the model, we focused on giving you almost the best of both worlds.
45:31
The model is both more capable, but it's also faster and more cost efficient. Wow. How does that work? We made it the default.
Alex Volkov
Alex Volkov 45:37
Cool.
45:37
Love it. OpenAI is not great at naming, so you're really saying OpenAI GPT-5.1 Codex Max. Yes. Extra high. Think extra high. Extra high. Yes. So if there's an opportunity for me to give some feedback, the naming is not amazing. You don't want to double down on that. You probably will. But the model is insane from what I see, and specifically because of the compacting thing. Yeah. So I just wanna tell folks, when the model was released, the autonomy and the long-horizon task capabilities of this model are not just a straight-line continuation, like, okay, we've doubled the context window. I think the context window is still in the 1 million area. I forgot what it is for that model at least. But it's because of something that it does called compaction, which I'll put you on the spot to explain. Based on the actual release notes, it's running for, like, 24-hour-plus tasks, which, if you follow the METR long-horizon agent things, is crazy. It's an out-of-distribution continuation from the previous model. Yeah. 24 hours was not in the realm of things. And I chatted with an OpenAI person who told me that it could run for way, way longer than this, also because of this compacting thing. So just for folks who are listening who have no idea what I'm talking about, models have context windows, basically how much memory they have to be able to process things. Yep. And with context windows, the more they get filled, the less performant the model is, especially an agentic model that helps you code, because it kind of starts losing facts. It's short-term memory. Short-term memory. Exactly. But now there's this compaction thing. Can you tell us what that is? Yeah, how does it work?
Dominik Kundel
Dominik Kundel 47:10
Yeah, I mean we already had like compaction in Codex, that you
47:15
could either trigger manually, or it would trigger when it got to a certain threshold.
Ryan Carson
Ryan Carson 47:18
But is that more like just starting a new thread
47:20
for you, or are you compacting as you go? No, it would still
Dominik Kundel
Dominik Kundel 47:22
So for those old models, you can see it in the Codex code base.
47:25
Codex is open source, written in Rust. You can look under the hood if you want to check anything out. The big thing is that we wanted to make sure that the model is really good at dealing with that compaction and can work on these long-running tasks. Our goal with Codex is that we want it to be a software engineer that works on your team, that you can trust with hard tasks, and hard tasks can take a long time. Yeah. And so we really made sure that the model can deal with that compaction in the long run.
Alex Volkov
Alex Volkov 47:54
Compaction, as, as just the way to explain this, basically the model
47:59
takes whatever was previously in the conversation tree and removes anything that's irrelevant, like tool calls that are not relevant, maybe keeping just the results. Something like this, right?
Dominik Kundel
Dominik Kundel 48:07
Can't, I can't get into like the details of what that, can
Alex Volkov
Alex Volkov 48:10
you get into the details of, like, the open source Codex?
48:12
Yeah, how it did compaction with two one, so
Dominik Kundel
Dominik Kundel 48:14
that one for example, like it did a summary
48:16
basically on the past conversation.
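As a rough illustration of the summary-style compaction Dominik describes for the older setup (summarize the past conversation once a threshold is hit), here is a minimal Python sketch. It is not how Codex implements it (that lives in the open source Rust code base) and says nothing about the native compaction in the newer model. The names `Message`, `compact`, and `naive_summarize`, the four-characters-per-token estimate, and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Hypothetical, crude estimate: roughly 4 characters per token.
    return max(1, len(text) // 4)


def total_tokens(history: List[Message]) -> int:
    return sum(estimate_tokens(m.text) for m in history)


def compact(
    history: List[Message],
    summarize: Callable[[List[Message]], str],
    limit: int,
    keep_recent: int = 2,
) -> List[Message]:
    """Replace older messages with one summary once the history exceeds `limit` tokens."""
    if total_tokens(history) <= limit or len(history) <= keep_recent:
        return history  # Under the threshold: nothing to compact.
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    summary = Message("system", "Summary of earlier conversation: " + summarize(old))
    return [summary] + recent


def naive_summarize(messages: List[Message]) -> str:
    # Stand-in for a model call: keep the first 40 characters of each message.
    return " | ".join(m.text[:40] for m in messages)
```

In a real agent loop, `summarize` would be a call to a model rather than string truncation, and `compact` would run before each new request so the working history stays under the window.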
Ryan Carson
Ryan Carson 48:17
what you're saying is it happens at the model level.
48:19
yeah, so honestly,
Dominik Kundel
Dominik Kundel 48:20
I don't
Alex Volkov
Alex Volkov 48:20
know, but yeah, I can probably say that this is what I heard
48:23
as well, and some of this is in the results as well, that GPT-5.1 Codex Max was trained with native compaction understanding. Okay. What this reminded me of is that, back when MCP was just getting started, other models were not good at understanding MCP, doing MCP, and Claude was really good because they trained
Ryan Carson
Ryan Carson 48:40
Yeah.
Alex Volkov
Alex Volkov 48:40
Claude to be native in MCP, and so the reasoning
48:44
didn't take as much, right? So the reasoning process didn't take as much to get to the same level. Yeah, that makes sense. And so it feels like when you natively train the model with RL, which everybody uses for everything, and I know OpenAI folks sometimes don't like to talk about their RL, but RL is there, it just becomes better at this task. And so we see efficiency here, like 30% fewer thinking tokens compared to predecessors at the median.
Dominik Kundel
Dominik Kundel 49:05
Yeah.
49:06
That was the other part, right? Like we know that like the people always want faster and faster,
Ryan Carson
Ryan Carson 49:12
models.
Dominik Kundel
Dominik Kundel 49:13
And so like a big focus was that like the model is
49:16
actually more efficient in its... so like, if you're looking at the graphs that we published in the blog post, you can actually see that, even for the same levels of reasoning, the model got significantly more efficient, using fewer tokens to get better results. And so I think that's one of the biggest standouts and why we made it the default. So it's not Max in the sense that it's the most expensive. Right, right. But it's actually better, and it's not just better in the general sense. One of the big things that we've heard is that, on Windows, the model struggled with things like PowerShell, for example. Yep, yep. So we actually made the model better at knowing PowerShell and Windows shell commands. Nice. And we released an experimental sandbox on Windows now, so hopefully the experience is gonna be better.
Ryan Carson
Ryan Carson 50:02
When do you think you'll ship the model in the API?
Dominik Kundel
Dominik Kundel 50:04
I don't have an exact date, but we're trying to
50:06
get it out as soon as we can.
Alex Volkov
Alex Volkov 50:07
Thanks.
50:07
Ryan is asking because, I'm assuming, Amp would like to test it. We wanna try, come
Dominik Kundel
Dominik Kundel 50:11
on.
Alex Volkov
Alex Volkov 50:11
Where is this available right now?
50:12
It's available in Codex, and in VS Code through the Codex IDE extension as well, right?
Dominik Kundel
Dominik Kundel 50:17
In the IDE extension, the CLI, and the web
50:19
cloud tasks if you're using those.
Alex Volkov
Alex Volkov 50:21
Yeah, we were talking about this with Thor before, from Google.
50:24
We were talking about the different offerings: Gemini CLI as a CLI tool, and then Jules as the coding tool, and then now Antigravity as the IDE. I don't think they have an extension, but I'm not sure.
Ryan Carson
Ryan Carson 50:35
Yeah.
Alex Volkov
Alex Volkov 50:35
and we're like, why?
50:36
There's like so many. We didn't ask them. And Codex is all that. Also, those products are under one name. Yes. So there's a Codex Cloud thing where you can just go and hand off agent tasks to the cloud. Yeah. There's a Codex CLI, which is like just Codex. And then, you guys don't have an IDE yet, but there is an IDE extension, which is essentially a sidebar extension, like Cline or Roo, for VS Code, or whatever, for every VS Code fork out there. Yes. Yeah. Yeah. So Codex is like all that together, and this new model is available there. Correct. Tell me about high and extra high. What's up?
Dominik Kundel
Dominik Kundel 51:07
So we introduced extra high. Generally, our recommendation
51:12
is to use the medium level, for most of your tasks, it can deal with very complex tasks even on medium. Cool. we had a high reasoning level already, but we figured there were tasks where for your really hardest tasks Sometimes you really don't care about how long it takes. You just want to crunch it out and you want results.
Ryan Carson
Ryan Carson 51:32
Yeah.
Dominik Kundel
Dominik Kundel 51:33
And so that's what high, extra high is really for.
Alex Volkov
Alex Volkov 51:36
I'm gonna try to add the screen here to show the folks.
51:39
So we have incredible results on TerminalBench. TerminalBench 2 just launched, with 5.1 Codex Max. I don't know for sure if this is high or extra high. 58% on TerminalBench. TerminalBench launched, I think, last week? Yeah. This week was insane to me, so I don't know if it was last week or two weeks ago, but we talked about TerminalBench just launching, and the state of the art when they launched was 50%. Yeah. And I think this was Warp using GPT-5. Right. And then it jumps to 58. I think this is now state of the art on TerminalBench. I think so. And then, this is the extra high part here. So we're also showing, on SWE-bench Verified, a comparison to GPT-5.1 Codex. Codex Max is basically cheaper and faster, it looks like, and more accurate across all the points. So the low ones just run for fewer tokens with higher accuracy. The medium one is pretty much the same accuracy, maybe a little bit higher. Yeah. But significantly fewer tokens. So the medium was, what, 14,000 tokens, and the medium for Max is 9,000 tokens. And then the high is significantly higher in score while also using fewer tokens.
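To put the medium-setting token numbers quoted above in proportion, here is a small arithmetic sketch in Python. The two figures (roughly 14,000 and 9,000 tokens) are the ones read off the chart on air, not independently verified, and the helper name `token_reduction` is hypothetical.

```python
def token_reduction(before: int, after: int) -> float:
    """Fractional reduction in tokens going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Approximate figures as quoted on the show for the medium reasoning level.
medium_codex = 14_000      # GPT-5.1 Codex
medium_codex_max = 9_000   # GPT-5.1 Codex Max

saving = token_reduction(medium_codex, medium_codex_max)
print(f"{saving:.1%} fewer tokens at medium")
```

That works out to a bit over a third fewer tokens at medium, which is in the same neighborhood as the roughly 30% median reduction in thinking tokens mentioned earlier.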
Dominik Kundel
Dominik Kundel 52:43
And one of the things that we still have as part of this is still
52:46
the thing we introduced with the original GPT-5 Codex, which is adaptive reasoning. So even if you're setting it to medium, that doesn't necessarily mean it's, like, overthinking complex questions. It can actually still adjust and, for simple questions, respond quickly. And I've caught myself with this a couple of times, where I'm asking Codex to do something and I think it's gonna take a while, and I turn around and it's already done. So if you've used Codex before, you should really feel a significant
Ryan Carson
Ryan Carson 53:13
speed up.
53:13
So is adaptive reasoning like the same model or do you do a quick call out to like a really fast model to decide if it's a complex task?
Dominik Kundel
Dominik Kundel 53:20
Interesting.
53:20
Wow.
Alex Volkov
Alex Volkov 53:21
Cool.
53:22
So, anything else we didn't cover? I would love to talk to you about 5.1 Pro, and I have this question for you. Basically, how the hell am I supposed to tell if it's better than 5 Pro? I'm assuming there are evals in place, but who's the target audience for this, and how can they tell if it's actually better for them to wait these 5, 10, 17 minutes that Pro can go on, like, a long tangent?
Dominik Kundel
Dominik Kundel 53:45
I think it's a good question for like everyone sort of
53:47
has a different threshold, right? Like, I've loved using 5.1, pro for the tasks that are really getting me to like, I can't get the other models to like,
Ryan Carson
Ryan Carson 53:58
yeah, give
Dominik Kundel
Dominik Kundel 53:58
So like, I don't necessarily default to 5.1.
54:01
In ChatGPT you also have the option, when you trigger 5.1 Pro and it takes so long, to just skip it. But I try to not resort to it immediately. And I dunno if you've seen the Oracle tool that has been circling around Twitter. I think Peter Steinberger is the one who built it. He basically built a little CLI tool that he can give to Codex, for example, to pop open ChatGPT in the browser, send off the task to Pro, and then get the result back. And seemingly he uses that when Codex reaches its limits. So people have really been mind-blown by tasks that they didn't think a model could solve. I think Theo posted about it, where he gave 5.1 Pro a puzzle from DEF CON, and it solved that. So I think I have
Alex Volkov
Alex Volkov 54:51
So, to summarize, the releases GPT 5.1 Codex Max with high
54:55
and extra high levels of thinking. The API is not there yet, but that's coming. But right now folks can just use it in Codex and the IDE extensions as well. And Pro, 5.1 Pro, is available to Pro users in ChatGPT as well. And that's for, like, very long-horizon-running tasks. And I think LDJ is calling out that group chats are also rolling out. Oh yeah, we launched. First off, group chats. Like, what's going on? In what way will we use group chats?
Dominik Kundel
Dominik Kundel 55:24
I think like the
Alex Volkov
Alex Volkov 55:25
newish feature, right?
55:26
It's just rolling out,
Dominik Kundel
Dominik Kundel 55:27
it just came out earlier.
Ryan Carson
Ryan Carson 55:28
so like this is for a multi-person chat with one ai.
Dominik Kundel
Dominik Kundel 55:31
Correct.
55:31
So like you can bring your friends together and still use Chat GPT. That's smart. For example, planning a party You've had sort of the, I don't know, I've certainly had these situations where you're discussing with your team or with other friends, like a topic and one of you goes back and starts, says,
Ryan Carson
Ryan Carson 55:47
so do you tag it?
Dominik Kundel
Dominik Kundel 55:48
to, you can just ask like, Chat GPT, can you do this?
Alex Volkov
Alex Volkov 55:50
Is agent mode in there?
55:52
And by the way, this is breaking news folks. This is rolling out from what? I was
Dominik Kundel
Dominik Kundel 55:55
gonna say.
Alex Volkov
Alex Volkov 55:55
yeah, didn't even,
Dominik Kundel
Dominik Kundel 55:56
catch you by surprise.
55:57
it came like 40 minutes ago. So breaking news. I don't know if it is, if agent mode is in there.
Alex Volkov
Alex Volkov 56:02
okay, so we need to test out agent mode.
56:04
'Cause if you wanted to give it a task, it needs to go and do some tasks. Well, that'd be kind of cool. Yeah. Wow. Okay, so we should probably open a group chat, all of us, and try it out for ThursdAI as well. Cool. Dominik, thank you so much for coming up. Thanks for having me. I really appreciate your ability to just jump on. You're welcome. And be grilled here. And we should mention also that you are one of the folks who worked on the gpt-oss release. Yeah, I did. And we covered gpt-oss. We also support gpt-oss on W&B Inference. It's a great model, and here's another testament to how great this model is: in the Antigravity IDE that Google just released, this is the only open source model that they support. They don't support any other open model. gpt-oss 120B, in addition to, of course, I think, Sonnet 4.5 and their own model. So that's super, super cool. Congrats to you on that release. Thanks. We really, really appreciate it. Very happy to have you on the show, and feel free to come back when you guys do more releases. Yeah. Thanks for having me. Alright folks, this has been Dominik Kundel from OpenAI, and we are back, and we're gonna bring back Wolfram and LDJ and Yam, and see if you guys have comments on GPT-5.1 Codex Max. There's a lot to say. You can't use this in Amp yet. No. 'Cause they just said they don't have an API. But Wolfram, LDJ, Yam, did you guys have a chance to test it out?
Yam Peleg
Yam Peleg 57:21
Oh yeah, You definitely feel a difference?
57:23
Absolutely. From the first prompt, you feel a difference. It's better. It just understands. It's hard to explain what exactly is better, but you feel it immediately. Immediately. It just gets it. Wow. If you have a vibe, if you kind of understand how the agents work, you get a vibe that, okay, that's gonna be hard for the agent. Yeah. And all of a sudden it's just being solved, and much faster than you think. The previous Codex, the first one, in my opinion was incredible from the ground up for writing code, but the first one was not talking to you. You explained something and it just went on to do the thing. Sometimes you'd try to ask it what the best approach is, one, two, or three, and you'd expect an answer, and it would just ignore it and start doing it on its own. Usually it also gets the code down, but it's not exactly what you wanted. 5.1 solved this. You can definitely speak to the model and it responds back. It's much more humanlike. It's not human. It's humanlike. Yeah. It's not your girlfriend. Yeah. And the new one, I didn't get a lot of time to test it. Obviously it was released yesterday, but yeah, you definitely feel the difference. People talk about compacting and so on. The model is so efficient, I don't remember too many times I got to a point where I needed compacting, unlike other CLI tools, not mentioning names, that are grinding through the entire context in five minutes. So, yeah, definitely go check it out. And 5.1 Pro is a beast.
Alex Volkov
Alex Volkov 58:51
I think compacting, compacting comes into play on very
58:53
long-horizon tasks. And I chatted with an OpenAI person here during the AI Engineer talk, and he was basically saying, hey, I ran an earlier checkpoint, obviously an internal one, for a week, on and off, on my laptop, off of one prompt. And the exit criteria was, hey, you should get better at this task or that task. And I'm just mind-blown that we're at the place where they're announcing 24-hour runs, but we're getting a week's worth of an agent run from a model. This is mind-blowing.
Thor Schaeff
Thor Schaeff 59:23
to imagine
Ryan Carson
Ryan Carson 59:24
what kind of task is needed for a week.
59:26
it's a lot of compute.
Yam Peleg
Yam Peleg 59:27
I just wanna say, I can confirm that I saw a couple of hours myself.
59:31
I'm not affiliated with OpenAI. I saw a couple of hours myself on the previous version. It's definitely real, and yeah, it can power through pretty much anything you throw at it. Yeah.
Ryan Carson
Ryan Carson 59:41
What was the task that you gave it that was doing that?
Yam Peleg
Yam Peleg 59:45
actually an open research question.
59:47
Like, I'm doing experiments, machine learning and so on, and it's open research. You've got experiment results and you want to plan the next round of experiments, how you're going to understand the research question that you're into. And it's completely open. The result is not online. And it just powered through for like two, three hours. And I thought it was going to crash. I thought the terminal was frozen or something. I see that the ticking timer just goes up and up and up. And it came up with an answer, and I was like, wow. It went online, read papers, checked my code, checked the results. It actually did work. Wow. Like, heavy-lifting work for two, three hours. It was pretty incredible to see. Mm-hmm.
Alex Volkov
Alex Volkov 1:00:33
So, folks, I wanna summarize what we just covered on the show so far.
1:00:37
Obviously we had Dominik, who's at OpenAI and worked on the gpt-oss model, and he talked to us about Codex and Codex Max and the GPT-5 Pro stuff. Before this we had Thor from DeepMind, who just joined, but basically helped us cover the Gemini 3 release. And we also covered a little bit of Antigravity, their agent release. Google released a bunch of other stuff as well. We have to talk about Nano Banana Pro. Oh, also we had Swyx and we covered the AI Engineer summit. This is a big, big show. But the week began with the folks from Grok and, yeah, I can't switch cameras for some reason. Okay, that's fine. The week began with the folks from Grok just absolutely releasing the top intelligence for one day. It seems pretty good. I don't remember any other one-day top release from before. It didn't even come with an API, but Grok 4.1 is the new release from the folks at xAI and it's really, really, really good. And they actually followed up with another release just yesterday, with Grok 4.1 Fast. Yep. And the agent tool capabilities. Let's take a look at this, because I think it's really worth taking a look at how the release was posted. So gimme a second, folks, to pull this up here. Lemme see, xAI. I'll pull up the official thing. It's easier. Yeah. I will ask this before, folks: do you guys use Grok at all for anything? And if not, why not? I'll go first. I tried. Oh,
Ryan Carson
Ryan Carson 1:02:10
Go ahead.
Yam Peleg
Yam Peleg 1:02:11
Yeah, I tried it today and I must say that, look, the
1:02:15
search, the X search, is great. It can also search Reddit, by the way, which many other models refuse to do. But the results that I got, at least, were good, just not as mind-blowing as you'd think you would get when you see the benchmarks.
Alex Volkov
Alex Volkov 1:02:32
So again, two models, Grok 4.1, which is kind of the main one.
1:02:37
They did a thing that I didn't appreciate, and I think I called it out last time, because they just kind of updated it in place on the Grok thing. And then they just showed that people prefer it over the previous Grok. I like checkpoints. I like to be able to know which model I'm talking about. But it gets a 64% win rate against the previous one. And I think we kind of covered this already a little bit. And then, for one day, Grok 4.1, which is a 0.1 release, not a full 0.5 release, right, or like a 3 release from Gemini, topped LMArena at a 1400-plus Elo score. Wow. So 1483 for Grok 4.1 Thinking and 1465 for Grok 4.1, which is basically the top of LMArena, for a day. Because then Gemini came out with the first, I think, above-1500 score, and Gemini 3 just kind of dominates this. But I think that this is not the whole story of Grok. And I wanna pull up the actual tweet, because Grok 4.1 Fast is the one that most people can use in the API. And I don't know if you guys used it already or not. This was released 18 hours ago, folks, so we are breaking new ground, breaking news here: Grok 4.1 Fast and the Agent Tools API. I honestly haven't had a chance to play with this yet. It's basically their best tool-calling model, with a 2 million context window. 2 million.
Ryan Carson
Ryan Carson 1:03:53
Yes, I mean it, I'm so curious what the effective number
1:03:57
is though, because we all know with Sonnet 4.5, it's supposedly a million, but really it's 200 to 300K. Yep. So, I mean, what if it's a million that's effective on Grok 4.1?
Alex Volkov
Alex Volkov 1:04:08
I actually wanna see if they posted the notes about, the ruler
1:04:11
benchmark I think they use for this. And do we know the cost? Yes, we're gonna take a look at this. So, 2 million, and the Agent Tools API, which gives agents access to real-time X data, web search, remote code execution, and more. This is τ²-Bench. This is an independent agentic evaluation from Artificial Analysis. Grok 4.1 Fast has insane performance here, with a very cheap total cost of $5.55 and an overall accuracy of 93%, which beats Gemini 3 Pro that just released. And the price difference is 10x, $5 versus $57, and the accuracy is also higher. It beats GPT-5.1, beats Sonnet 4.5, and beats the previous Grok.
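For a sense of scale on the cost claim, here is a short Python sketch of the arithmetic, using only the figures as quoted on air ($5.55 and 93% for Grok 4.1 Fast, about $57 for Gemini 3 Pro on the same evaluation). The numbers are taken from the spoken transcript rather than checked against the published chart, and the helper name `price_ratio` is hypothetical.

```python
def price_ratio(cheaper_usd: float, pricier_usd: float) -> float:
    """How many times more expensive the pricier run is."""
    if cheaper_usd <= 0:
        raise ValueError("cheaper_usd must be positive")
    return pricier_usd / cheaper_usd


# Total evaluation cost as quoted on the show (assumed, not verified).
grok_fast_cost_usd = 5.55
gemini_pro_cost_usd = 57.0
grok_fast_accuracy_pct = 93

ratio = price_ratio(grok_fast_cost_usd, gemini_pro_cost_usd)
cost_per_point = grok_fast_cost_usd / grok_fast_accuracy_pct

print(f"Gemini 3 Pro run cost about {ratio:.1f}x the Grok 4.1 Fast run")
print(f"Grok 4.1 Fast: about ${cost_per_point:.3f} per accuracy point")
```

So the "10x" on air is a rounding of a ratio slightly above ten, assuming the two dollar figures were heard correctly.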
Ryan Carson
Ryan Carson 1:04:56
Can you summarize τ-bench?
1:04:58
Like, how does it work, basically? τ²-Bench.
Alex Volkov
Alex Volkov 1:05:00
Yeah, go ahead.
1:05:00
So there's,
LDJ
LDJ 1:05:01
yeah, so there's multiple sections.
1:05:02
There's an airline section and a retail section, and it's meant to be somewhat computer use, but kind of a multi-hop reasoning, agentic benchmark for doing things like acting as an employee that would help book airline tickets, or doing things that would be similar to customer service for a retail environment, and simulated things like that. Got it.
Alex Volkov
Alex Volkov 1:05:23
That's super cool.
1:05:23
Thank you, LDJ. And then the performance on this is absolutely crazy. State-of-the-art tool calling. So we talked about the Gorilla benchmark before, the Berkeley Function Calling benchmark. Gemini 3 Pro, which again just released, the total cost for this is like $234 with 63% accuracy. Grok 4.1 Fast reasoning is, what, like a tenth of the price. It's almost hard to believe. It's 10x cheaper, with 69% accuracy. So the accuracy, to be real, is really, really high. And I think, wow, I think xAI has really cooked with this one. And I think that all of the labs have cooked this week. It's an amazing week. Everyone cooked in a completely different thing. So, multi-turn long context is also high on Grok 4.1 Fast. But I think the highlight, and I wanted to talk about this before on the show, is the tool calling and the specific tools that xAI has access to. Because previously I said that for many labs, Grok is maybe not the best choice because of how it's uncensored, right? Blah, blah, blah. Yeah. But the fact that it has access to X means so much. Right? The agent goes and does real-time research on X, and X, with all the bad issues about X, is also probably the top place for real-time news, for understanding of the world. And just the amount of stuff that Grok can deliver based on this tool calling is absolutely incredible.
Ryan Carson
Ryan Carson 1:06:50
Yeah.
1:06:51
I mean, I think all of us that are building agents are gonna have to take it seriously.
Alex Volkov
Alex Volkov 1:06:55
Yeah.
1:06:56
And I think you guys just now have access to an API. I haven't done testing for it yet, but it's definitely something that I plan to do. Yeah. After the show I should give it a try. Anything else on the Grok release, folks, that you wanted to cover? They released an agent tool set, search tools, files. Okay. So let's maybe cover the tools that they have super quick.
Ryan Carson
Ryan Carson 1:07:16
I'm curious how good their web browsing tool is.
Alex Volkov
Alex Volkov 1:07:18
I think.
1:07:18
Yeah. You mentioned they also have Reddit search, which is I think an agreement that they have with Reddit that many other labs don't have. Right?
Yam Peleg
Yam Peleg 1:07:26
I don't know about the legal aspect of this, but I can tell
1:07:28
you, for sure they have Reddit search. The model goes on its own to search on Reddit. Other models outright refuse to do this. Even if you insist, it's very hard to get them to search Reddit. Grok just does it on its own, and it can also search X, and X is a valuable source of information.
Alex Volkov
Alex Volkov 1:07:47
For this, can I tell you guys a story about the
1:07:50
X search that happened? So basically I'm using this in my agent to bring me some more information about any release, right? So I basically grab a tweet and then I expand the tweet and I was like, hey, go search for folks who are talking about this release, maybe not mentioning the exact name, et cetera. And what happens is it brings me a bunch of AI influencer threads that have no more information than the actual release. And I'm looking for the researchers. I'm looking for folks who did the training. I'm looking for these things. And unfortunately I have to mute some folks who overuse the word "wild". Like, literally, if it uses the word "wild", I don't want it in my extra summary for X. All right, folks. So we are back to, I think we've covered most of the main releases.
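The "mute anything that says wild" rule described here could be approximated as a post-filter on whatever the search tool returns. This is a minimal sketch, not the actual agent from the show: `HYPE_WORDS`, `is_hype`, and `filter_posts` are hypothetical names, and the posts are assumed to arrive as plain strings.

```python
import re

# ASSUMPTION: a hand-picked block list; the episode only mentions the word "wild".
HYPE_WORDS = ("wild",)


def is_hype(text: str, hype_words=HYPE_WORDS) -> bool:
    """True if the text contains any block-listed word as a whole word."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
        for word in hype_words
    )


def filter_posts(posts: list[str]) -> list[str]:
    """Keep only posts that do not trip the hype filter."""
    return [post for post in posts if not is_hype(post)]


sample_posts = [
    "This release is WILD. Thread below",
    "Training used a staged curriculum; details are in the model card.",
    "Wildcard tokenizer change noted in the release notes.",
]

kept_posts = filter_posts(sample_posts)
for post in kept_posts:
    print(post)
```

Matching on whole words keeps technical terms such as "wildcard" from being dropped along with the hype posts.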
Ryan Carson
Ryan Carson 1:08:30
Yep.
Alex Volkov
Alex Volkov 1:08:30
Let's just take a beat here and think about, do you guys
1:08:33
remember ever this insane of a week? I can't breathe. What the fuck? Every major lab, and we haven't even covered Meta yet, and we're gonna talk about SAM 3 and SAM 3D in a second. And Google decided to say, hey, we know how this game is played, so we're gonna announce something, it's gonna leak. All of those people are gonna leak this, OpenAI's gonna release something before or after to try to capture the media attention, and then we're gonna release something again. So they came today with Nano Banana Pro, which was insane. We're gonna take a look at this as well. And that's busy,
Ryan Carson
Ryan Carson 1:09:09
and we all have to build on top of this stuff.
Yam Peleg
Yam Peleg 1:09:11
What did you all think accelerating means?
1:09:15
Yes. If not that, what did you all think it means?
Alex Volkov
Alex Volkov 1:09:19
Wolfram was wearing his Christmas hat because this
1:09:22
is the AI Christmas, although I think the actual AI Christmas is coming soon as well. So I think that, you know, we're gonna have that as well. Alright folks, what's next on our agenda? A short This Week's Buzz. Let me do a word from our sponsor super quick, and then, don't go anywhere, because we're gonna talk about SAM, we're gonna show you a visual example of this, and we're gonna play with Nano Banana as well. Alright, This Week's Buzz and we'll be back to it. You all.
1:10:05
Alrighty folks, welcome to This Week's Buzz. My name is Alex Volkov and I'm an AI evangelist with Weights & Biases, who is the sole sponsor of the ThursdAI podcast. As always, we release a bunch of stuff every week, and I think I already told you about Marimo notebooks. Marimo is a company that we've acquired under the Weights & Biases umbrella. And this week Marimo had an update, so I wanted to briefly tell you about this update as well. Marimo is a company that builds Python notebooks for AI specifically. They're executable as scripts, you can deploy them as apps, and you can version them with Git. These are like the three things that Python notebooks don't usually have, and Marimo is kind of the future Python notebook. So shout out to Marimo for this announcement. And for this week, the update is we have native VS Code and Cursor extensions with reactive notebooks and uv-powered reproducible environments. If you've done any Python development and you haven't switched to uv, you should know that uv is how to live your life as a Python developer. And this brings a native integration into Cursor or VS Code, and also uv. So this is your small update from us. And then also there's native VS Code GitHub Copilot working inside the notebook. If you ever tried to do a Python notebook, you know that until something like Cursor brings support for this, the AIs, they lose their shit. They don't know which cell to execute. They don't know if the cells are executable. They try to use the edit tool. The edit tool doesn't work. Like, it's a mess.
Ryan Carson
Ryan Carson 1:11:31
this
Alex Volkov
Alex Volkov 1:11:31
is a really big deal for somebody who wants to like,
1:11:33
use notebooks, smart notebooks like Marimo, but also, like, work with AI. So this is a shout out for the Marimo team. If you are here at AI Engineer, gimme one second. Speaking of, let me just turn around. So you see the logo of Weights & Biases, representing Weights & Biases. We are here at AI Engineer. We're a gold sponsor of this event, a gold sponsor of the AI Engineer event, if you are listening to this. Alright folks, what is the next important thing that we have to talk about? I think everybody's interested in Nano Banana, right?
Ryan Carson
Ryan Carson 1:12:04
Let's do it.
1:12:04
it's visual, it's fun. Let's do it.
Alex Volkov
Alex Volkov 1:12:06
At the end of the podcast, we're almost two hours in.
1:12:09
How, Wolfram, how are we doing? How much of a score would you give us in terms of covering everything? Yamas, Jack Red Bull, because he's gonna go all amazing that we covered so much. Yes. Including three guests on the show. Yeah, we haven't given you all the details like we usually do, and we haven't covered any open source, LDJ.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:12:27
While you were gone, we covered the Olmo release, the new
LDJ
LDJ 1:12:30
Yeah.
1:12:31
Did you guys hear about Olmo 3?
Yam Peleg
Yam Peleg 1:12:33
Oh yeah.
Nisten
Nisten 1:12:35
It is the biggest 32 B dataset tutorial training.
1:12:41
And the model's not bad either.
LDJ
LDJ 1:12:43
Yeah, it looks to me right now to be around the 32B thinking
1:12:48
performance of the Qwen 32B models, but simultaneously it's completely open. I don't think Qwen or DeepSeek, although they are doing great work, have ever actually put out a fully open recipe: dataset, full training recipe, hyperparameters, everything a hundred percent open. And that's what the Allen Institute for AI is striving to do with models like Olmo 3. So it's really cool that we have a model of that level of quality that's fully open like that.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:13:15
Yeah.
1:13:15
Big round of applause for being really open source and not just open-weights models. And yeah, this is a big release. For one thing, it's a fully open release, and for another, we finally get some sizes that we can run locally again, after all the good models that have been much, much bigger. This is not MoE, and it's a good size, I think, to run locally.
LDJ
LDJ 1:13:38
Yeah.
1:13:38
I think it's inevitable that they will start putting out MoE versions, you know, to compete with the, I think, Qwen version. It's like 3B active, 30B total. Right.
Nisten
Nisten 1:13:50
Wait, this is dense.
1:13:51
I'm looking at the,
LDJ
LDJ 1:13:53
Yeah, Olmo is dense, so I'm saying in the future, 'cause Qwen's
1:13:56
current most popular MoE, it's like 3B active, 30B total. Right.
Nisten
Nisten 1:14:02
it's a big deal.
1:14:03
Also, the medical performance was not bad. It was not like the leading edge, but it was still pretty good. I think it will be quite interesting to try and match a dataset against the distribution of the pre-training data, now that you finally have it for such a big model.
Alex Volkov
Alex Volkov 1:14:22
That's great.
1:14:22
Okay. Folks, in breaking news from this morning, let's do some Nano Banana stuff. So Nano Banana Pro came out. Nano Banana is Gemini's image editing tool, and it has been upgraded with thinking capabilities, which means that you can talk to it and it will think about the stuff that you said and then will generate incredible, incredible graphics. This one supports resolution of up to 4K. So it's available in AI Studio. It's available inside Gemini as well. And I think just the demos will be what we want. Right? So previously on the show, we took a screenshot of all of us and then we asked the GPT model to give us a, let me find the copy. I remember this.
Ryan Carson
Ryan Carson 1:15:09
I remember watching you guys.
Alex Volkov
Alex Volkov 1:15:10
Let's go to ai.dev.
1:15:11
Hopefully I'm logged in on this browser. This is the new AI Studio, by the way. And you can see that in this AI Studio, the playground is just the playground, but now it's, like, agentic and you can build stuff, whatever. Oh wow. So we have Nano Banana Pro. And you need to link your API key 'cause it's paid. You can upload an image and say, lemme just look at the editing capabilities of this model: remove the live streaming stuff and keep just the faces of the people and even remove their names. We're gonna try this out live. You can see this model starts and then it shows the thinking traces here, which I think is super cool.
Ryan Carson
Ryan Carson 1:15:47
That's great.
1:15:47
It's not a,
Alex Volkov
Alex Volkov 1:15:47
it's not just a model that's, like, just trying to generate.
1:15:50
And I will show you like another thing that I generated. Wow. I mean, look at this folks. Oh, wow. I just want you to like see that this is the exact same image. What the heck Without the, let me pin the, the wow. The other one super quick.
Ryan Carson
Ryan Carson 1:16:06
I think the thinking traces are cool.
1:16:07
Yeah. Like we've never seen that with, with, image generation before.
Alex Volkov
Alex Volkov 1:16:10
here, here is the original image and here
1:16:13
is the image that it gave us. Right. And you can see, like, it removed the logo for ThursdAI. Ryan is smiling, like, Ryan is smiling. It removed "Alex Volkov", "Yam". It removed all of the live stuff, all of the view. It's just, like, incredible. Jesus, please.
Ryan Carson
Ryan Carson 1:16:28
That's amazing.
Alex Volkov
Alex Volkov 1:16:29
But this is not all it can do.
1:16:31
So apparently this model has support for up to six people that it can generate their faces and then up to
Yam Peleg
Yam Peleg 1:16:37
put all of us in with you in the context.
1:16:40
Oh, yeah.
Alex Volkov
Alex Volkov 1:16:42
That's a good prompt.
1:16:44
yeah. Wolfram, to the.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:16:48
been doing just like that and it's just
1:16:50
pasted the people on top.
Alex Volkov
Alex Volkov 1:16:51
So folks who are just listening,
1:16:53
basically it's a picture of a four by four square of all of us, me and Ryan sitting in the top right square. And the prompt that I gave it is not really a simple prompt for image models. I said, add Yam and Wolfram to the rectangle with Ryan and Alex. And hopefully this will work. Let's see.
Ryan Carson
Ryan Carson 1:17:08
The thinking traces.
Alex Volkov
Alex Volkov 1:17:09
All right.
1:17:10
Oh, it added Yam, but not Wolfram. So it says, I'm currently working on combining different panels to create a single image. My initial thought is to use image one, blah, blah, blah. It did something, it took Yam and it added Yam with us. And this kind of works, it's not bad. It's not bad. Not bad. But it also replaced LDJ with another image of us, and also Wolfram.
Ryan Carson
Ryan Carson 1:17:32
like
Alex Volkov
Alex Volkov 1:17:32
Yeah.
1:17:32
And this is the resolution, 1K. One other thing that this does, let's actually try something else. It has access to Google Search. It has grounding. So if you, wow, open a new chat, you can ask it for, like, infographics for very specific things. And here is an insane prompt that ChatGPT came up with based on the notes for this week's show. Wow. So design, I'm not gonna read all of this. I'm gonna scroll through this super quick. It says: design a high-end vertical promo infographic poster for a tech podcast episode. The episode is titled ThursdAI, and blah, blah, blah, blah. Style: polished. And then composition: at the top center, place the main text in bold. Then, beneath that, a tiny line. And then in the center of the poster, okay, a large glowing circular AI core symbol. And then include those panels: Google panel, xAI panel. The prompt adherence is absolutely crazy. So the OpenAI panel and the Meta panel, and then the robotics panel as well. This prompt was given to me by GPT-5 Pro, I think.
Ryan Carson
Ryan Carson 1:18:31
Okay.
Alex Volkov
Alex Volkov 1:18:32
And when I hit run, and you can see this prompt is like insane.
1:18:35
it has a bunch of stuff. This model will start thinking through all of this. The last time that I did this, this took 80 seconds. And the result of this was just like absolutely incredible.
Ryan Carson
Ryan Carson 1:18:46
What does temperature, like, I know what it does, but what
1:18:49
effect does it have on image generation?
Alex Volkov
Alex Volkov 1:18:51
I think It already generated for 25 seconds.
1:18:53
I wanted more. So let's take a look. Okay, so this is what it generated. This was fairly quick. "The Week the Labs Blinked." Here's the Nano Banana Pro, the Google Gemini, and Antigravity. And you can see, like, the text is not super, super sharp. And we can go and up the resolution to 4K. Got another prompt.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:19:09
But do you realize what's going on here?
1:19:11
We have an image model that is thinking, you see it in the top right corner. Nano Banana Pro is actually Gemini 3 Pro Image Preview. So they put Nano Banana on top of Gemini: all the text, that is Gemini, and the images, that is Nano Banana.
Alex Volkov
Alex Volkov 1:19:26
Yeah.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:19:26
Yeah.
Alex Volkov
Alex Volkov 1:19:27
That's, I mean, people are gonna have so much fun with this.
1:19:30
I think the 4K resolution is the biggest thing.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:19:33
actually put a picture in and have it analyze the
1:19:36
picture and Gemini will output this. So for instance, you don't even have to output images. You can talk to it about a picture. Gemini is there,
Alex Volkov
Alex Volkov 1:19:44
the thing that I noticed as well is that because it's like native
1:19:47
4K resolution, this is an upscaler. You can just, like, give it another picture that's not 4K and say, output this exact picture in 4K, and it will just, like, upscale it with no loss of detail. It's crazy.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:00
One thing we should show, now that we are in AI Studio: in the
1:20:03
Gemini app, they have this new dynamic view that is just being released. That is also super impressive.
Alex Volkov
Alex Volkov 1:20:10
try
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:11
Go to the Gemini app.
Alex Volkov
Alex Volkov 1:20:13
Yeah.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:13
Then, just go to, the plus at the bottom,
1:20:17
where you can pick, create images. There should be a dynamic view.
Alex Volkov
Alex Volkov 1:20:21
Oh, I don't think, do I have creative?
1:20:23
Yeah, I have creative images. and dynamic view
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:24
I don't have
Alex Volkov
Alex Volkov 1:20:25
a dynamic view.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:27
You don't have it No.
Alex Volkov
Alex Volkov 1:20:29
Below
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:29
canvas and so on.
Alex Volkov
Alex Volkov 1:20:31
You wanna show this?
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:33
that has been the most impressive thing I've seen in a while.
Alex Volkov
Alex Volkov 1:20:36
Sure.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:20:36
Okay.
1:20:37
Let me just go to Gemini. I will share it. I will tell you when it's, ready to be shown.
Alex Volkov
Alex Volkov 1:20:44
Alright.
1:20:44
We're gonna wait for Wolfram. I will just shout out that this Nano Banana Pro is available across everything. It's available in AI Studio. You do have to pay for this. This is not, like, completely free. Right. The Gemini Ultra subscriptions, I think, and some other folks. Let's see, what else am I missing here from my notes about Nano Banana Pro? It is absolutely insane, but I have notes for this. I can't wait to go home and play with that. Yes. Oh, oh, oh. The most important thing is SynthID. This model's output is now watermarked, and they have a tool where you can upload images to their Nano Banana-like interface, yeah, somewhere, and ask whether or not this image was altered or generated with Gemini's thing. It's called SynthID. It's basically a safety thing, and it's invisible.
Ryan Carson
Ryan Carson 1:21:27
human eye, basically.
Alex Volkov
Alex Volkov 1:21:28
completely invisible.
1:21:29
And then the Gemini app will verify if those images are Google-generated and AI-generated. Wow. Wolfram, you ready for a demo? Yeah, I'm
Wolfram Ravenwolf
Wolfram Ravenwolf 1:21:36
ready.
1:21:37
yeah, what I did is just enable in here the dynamic view. That it's a new feature and I can say, is this a generative
Alex Volkov
Alex Volkov 1:21:44
ui?
Wolfram Ravenwolf
Wolfram Ravenwolf 1:21:45
Yeah.
1:21:46
It generates. Now, what do you want to see? Something about ThursdAI? Something about the latest AI news?
Alex Volkov
Alex Volkov 1:21:52
let's do four panels of the top AI news from last week.
1:21:56
Let's do that.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:21:59
So let's just do it
Alex Volkov
Alex Volkov 1:22:00
Okay.
1:22:01
So folks who are just listening, we're seeing Wolfram's Gemini screen, and he has a dynamic view, and I'm pretty sure the dynamic view is this generative UI thing that they released. For some reason I don't have access. Basically, if I'm not mistaken, this is what it is: Gemini will generate mini apps to show you the responses of its search inside Google. Is that it, Wolfram? Is this the thing? Yeah.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:22:22
Yeah.
1:22:22
That's what display Google. And it's
Alex Volkov
Alex Volkov 1:22:23
crazy because we're in the era of, like, generative UIs.
1:22:26
Every UI that you have seen until today was created by a designer and crafted by a user somewhere. And this is like on the fly, just in time UIs.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:22:36
Yeah.
1:22:37
We're
Alex Volkov
Alex Volkov 1:22:37
waiting for this.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:22:38
It's building the AI Nexus.
Alex Volkov
Alex Volkov 1:22:40
AI Nexus.
1:22:41
I like it.
Yam Peleg
Yam Peleg 1:22:42
always call it Nexus, all the models.
Alex Volkov
Alex Volkov 1:22:44
So
1:22:45
folks who are listening, this is basically the model vibe coding.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:22:49
wow.
Alex Volkov
Alex Volkov 1:22:50
every request that we have, it builds like a website.
1:22:52
Yeah. It's a live UI with a live ui. This is not a picture. Everybody with the leaderboard and market reaction, what the, what the hell are we looking at? It's insane.
Ryan Carson
Ryan Carson 1:23:02
and what is the circling purple bar?
1:23:04
Is that it still generating or is it saying this is, it's still generating,
Wolfram Ravenwolf
Wolfram Ravenwolf 1:23:07
so I can't click yet on these, but when it's done,
1:23:10
it should be fully interactive.
LDJ
LDJ 1:23:12
This is actually really curious where it says market reactions.
1:23:15
Is that real data? Like, let's fact check that real quick. keep going down.
Alex Volkov
Alex Volkov 1:23:18
Scroll down a little bit.
1:23:19
Scroll down. So folks who are just listening, we're seeing basically a website that was generated, with the top menu bar and models and the market reaction, and then it put, like, "Nvidia earnings beat".
Wolfram Ravenwolf
Wolfram Ravenwolf 1:23:31
dynamic.
1:23:31
It should pull the actual data. I don't think it is hard-coded in the page.
Alex Volkov
Alex Volkov 1:23:35
you open the prioritizing development strategy?
1:23:38
Oh, it's finished.
LDJ
LDJ 1:23:39
Look at this.
1:23:39
This is real. Listen. So probably four hours ago it was reported that Nvidia was up 5.2% in pre-market trading. And if you scroll down on this website, it literally said 5.2% on Nvidia. Wow. That's
Alex Volkov
Alex Volkov 1:23:53
amazing.
1:23:54
This is insane.
LDJ
LDJ 1:23:55
what, sorry, bottle.
1:23:57
What
Alex Volkov
Alex Volkov 1:23:57
happened?
1:23:57
It has a running thing on top with, like, the news. It talked about GPT-5.1, Gemini 3, and 4.5, which was recent, in September, which is not exactly last week, but that's fine. Can you click the policy and business tabs as well? Oh, it has a full article about every model.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:24:16
Oh, it's,
Alex Volkov
Alex Volkov 1:24:16
analysis.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:24:16
when Click stor, click in a new tab.
Ryan Carson
Ryan Carson 1:24:19
Yeah.
1:24:20
It has really
Wolfram Ravenwolf
Wolfram Ravenwolf 1:24:21
links spec.
Ryan Carson
Ryan Carson 1:24:22
It's like, is the data in a JS file?
1:24:25
Or like, what, where are these posts coming from? So I'm like, my product model, this is.
Alex Volkov
Alex Volkov 1:24:31
This is insane.
1:24:32
Wow. This is insane. This is the dynamic UI. Like, it builds UI on the fly. Wolfram, I agree with you, this could be like a life-changing thing where we're gonna move into the era of just-in-time UIs and images. But yeah, definitely Google, despite everybody else's attempts, I think Google is the winner this week. Wow. With all the releases. Oh, yeah. Folks, we're about two and something hours in. I really wanna show you SAM before we go. Do it. So let's take a look at SAM before we go. So here's the extent of how good this model is. There's a viral video from Sora where a cat is playing, like, different instruments on the porch, and then the woman comes out and says, don't play this. They just segmented the cat. And you can see how insane this is, because this is a fully AI-generated video, and only the cat is segmented, and the woman comes out, and you can see the label here: "cat playing piano". Yeah. This is like a live text label that people just type: "cat playing the pipes". You can see only the cat is segmented. That's insane. It's so good. Okay, so I wanna show one more example, but I want us to go to the actual playground.
Ryan Carson
Ryan Carson 1:25:34
Okay.
1:25:34
Do it.
Alex Volkov
Alex Volkov 1:25:35
SAM 3 stands for Segment Anything Model.
1:25:37
And the playground is absolutely bonkers. Meta released Segment Anything Model and Segment Anything Model 3D. So let's take a look at, oh, the playground. Yes. Let's take a look at the playground. Try it. It's just absolutely mind-blowing. So you can do video cutouts. So you see a video of dogs, and you can see there are some white dogs and some Labradors. We can do it. They're cute dogs. Yes. "Golden retriever." So now we got only the golden retriever, but the trick is, me and Ryan already know there are multiple other golden retrievers that are gonna come into the scene. Watch this. Segment Anything 3 is a video segmentation model and it's gonna search. Are all of them golden retrievers? Man, our previous demo worked better, Ryan. Yeah. Darn it. But you can see, as new dogs come into the scene, it segments the new dogs in the video. That's so great. Previously we had white dogs, and white dogs work way better. So you can see now it's only the white dogs, and we're gonna search the entire video. You can see the little dog coming in behind the scenes there. And that dog, astonishing dog, is also segmented.
Ryan Carson
Ryan Carson 1:26:45
I mean, think about what humans are gonna do with this model.
1:26:48
Like, there's so many cool things you can do. Yeah. Very cool.
Alex Volkov
Alex Volkov 1:26:51
I like the little white dog that comes in there
1:26:54
behind the scenes is crazy. Yeah. That's cool. That's amazing.
Ryan Carson
Ryan Carson 1:26:56
Alrighty.
1:26:57
So, do we have time to do the 3D thing? Yes.
Alex Volkov
Alex Volkov 1:26:59
3D thing and then we'll finish.
Ryan Carson
Ryan Carson 1:27:00
I have to make sure we're still sharing our screen.
1:27:02
No, we're good.
Alex Volkov
Alex Volkov 1:27:02
Can you guys see the screen?
Ryan Carson
Ryan Carson 1:27:04
Yes.
Alex Volkov
Alex Volkov 1:27:05
Alright folks.
1:27:06
Should we land this plane? Should we try to land this plane? I think so. Look, I'm not even gonna do a recap. There's, like, so much. Feel the AGI, feel the acceleration. We had three guests on the show. We had Swyx, we had Dominik Kundel from OpenAI, and we had Thor Schaeff from DeepMind and Gemini. Ryan Carson and I are at AI Engineer. We talked to you about Gemini 3 Pro and Antigravity, the agentic IDE. We covered Meta SAM 3 and SAM 3D, which we just showed you. We covered xAI's Grok 4.1 and 4.1 agent tools and the X search API as well. We covered Nano Banana at length, with SynthID, and we even talked about Marimo with native VS Code extensions. We didn't even mention the huge deal that Microsoft and Nvidia, yeah, made with Anthropic for like $14 billion, because in a week like this it doesn't really seem like huge news. But we really appreciate your time. We wanna shout out Swyx for the support at AI Engineer here as well. Yes, Ryan, thank you for joining. Thank you guys for joining, and everybody that joins us live, just on the live stream. I'm really hoping that you guys see the value of ThursdAI every week, because in weeks like this it's really hard to keep up with everything. And we're here to make sure that you don't get behind. Weeks like this are really difficult as well. And I wanna
Ryan Carson
Ryan Carson 1:28:17
thank you.
1:28:17
So, like everybody watching Alex works his tail off to do this. I wanna thank you. You really deserve it, man. Like you make this thing happen.
Alex Volkov
Alex Volkov 1:28:25
Appreciate you guys.
1:28:26
Thank you so much, and I appreciate everybody who tuned in, who commented. If you missed any part of the show, the other part that I do after this is I edit it into a podcast and a newsletter. So that's gonna come out, hopefully today. I haven't missed one Thursday, even though today is gonna be difficult. I will use AI, I'm not gonna lie to you, to help me do this job. Subscribe to ThursdAI everywhere you get your podcasts. If you missed it, there's a newsletter and a podcast, and we'll see you next week. Probably not live; we'll probably be in our own studio with a bunch of other news as well. Thank you so much. Folks, we're gonna conclude the stream for today. It was incredible. Ryan, thank you so much for joining, man. Amazing, amazing. Thank you, Mark, for helping us behind the scenes here. And thanks to AI Engineer and everybody who joined. Bye-bye everyone. Cheers. See you. Bye-bye.