Episode Summary

The ThursdAI crew reunites for their legendary annual tradition — a quarter-by-quarter, month-by-month review of every major AI release of 2025. Alex is joined by all five co-hosts plus Kwindla Hultman Kramer (Daily.co) to relive a year that started with DeepSeek crashing NVIDIA's stock and ended with Gemini 3 reclaiming the #1 benchmark spot. Across 110 minutes, they cover 50+ releases: the rise of reasoning models, vibe coding going mainstream, Chinese labs dominating open source, Claude Code launching the CLI agent era, and the jaw-dropping moment someone trained an LLM in space. This is the definitive record of AI's most acceleration-packed year ever.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Kwindla Hultman Kramer
Co-Founder & CEO · Daily.co
@kwindla
Wolfram Ravenwolf
Weekly co-host, AI model evaluator
@WolframRvnwlf
Yam Peleg
AI builder & founder
@Yampeleg
LDJ
Nous Research
@ldjconfirmed
Nisten Tahiraj
AI operator & builder
@nisten
Ryan Carson
AI educator & founder
@ryancarson

By The Numbers

Episodes in 2025
52
ThursdAI covered every week of AI in 2025 — this is episode 52, the yearly recap
NVIDIA stock loss
$560B
Largest single-company monetary loss in history when DeepSeek R1 dropped January 23rd
DeepSeek R1 training cost
$5.5M
The alleged training cost that sparked massive debate; the model matched OpenAI's o1 at 50x cheaper pricing
Project Stargate
$500B
The Manhattan Project for AI infrastructure announced in January — the numbers stopped making sense
HLE (Humanity's Last Exam)
26.6%
OpenAI Deep Research score in February, vs 10% for o1/R1 — a jaw-dropping leap in research capability
ARC-AGI 2
#1
Gemini 3 Pro Deep Think mode reclaimed the top spot on ARC-AGI 2 in November

🔥 Breaking During The Show

Gemini 3 Pro Deep Think Reclaims #1 on ARC-AGI 2
In November, Google's Gemini 3 Pro with Deep Think mode took back the top spot on ARC-AGI 2 — a major moment that validated Google's comeback narrative after years of being the 'catching up' lab.
Someone Actually Trained an LLM in Space This Year
Perhaps 2025's wildest headline: an LLM was trained in space. The numbers and scale of AI investment in 2025 stopped making sense in the best possible way.

πŸŽ™οΈ Intro & Team's Biggest Releases of the Year

Alex welcomes the full ThursdAI crew for the 52nd and final episode of 2025. Before diving into the quarter-by-quarter recap, each co-host and guest shares their personal pick for the single most impactful AI release of the year β€” Claude Code, native image generation, Opus 4.5, and paradigm shifts in image AI all get nominations.

  • Yam: Claude Code / CLI agents as the year's defining release
  • Wolfram: paradigm shift in image generation
  • Ryan: Opus 4.5 — 700+ days of daily coding with LLMs
  • Kwindla: Claude Code proving 'the harness matters as much as the model'
Kwindla H Kramer
"Claude Code, just proving, yeah, the models really matter, but sometimes it's about the harness and maybe mostly it's about the harness."
Ryan Carson
"Opus 4.5. It is unbelievable. I've been coding with an LLM for over 700 days now, like every day, and this is the best."

🌋 Q1: January — DeepSeek Shakes the World

The earthquake that shattered assumptions about who leads AI. DeepSeek R1 dropped January 23rd, crashed NVIDIA stock 17% ($560B loss — the largest single-company monetary loss in history), matched OpenAI's o1 at 50x cheaper pricing, and made even grandmothers aware of Chinese AI. OpenAI Operator launched browser-based agents, Project Stargate committed $500B to AI infrastructure, and Kokoro TTS went viral.

  • DeepSeek R1: crashed NVIDIA 17%, matched o1, allegedly cost $5.5M to train
  • OpenAI Operator: first agentic ChatGPT with browser control
  • Project Stargate: $500B infrastructure — the Manhattan Project for AI
  • NVIDIA Project Digits: $3,000 desktop running 200B-parameter models
  • Kokoro TTS: 82M-param model hit #1 on TTS Arena, Apache 2.0, runs in the browser
  • MiniMax-01: 4M context window from Hailuo
Alex Volkov
"My mom knows about DeepSeek — your grandma probably knows about it, too."
LDJ
"It [Kimi 1.5] came out within 24 hours of DeepSeek, I think it was the day of, and DeepSeek absolutely overshadowed everything."

🧠 Q1: February — Reasoning Mania & The Birth of Vibe Coding

The month that redefined how we work with AI. OpenAI Deep Research scored 26.6% on Humanity's Last Exam (vs 10% for o1/R1). Andrej Karpathy coined 'vibe coding' in early February, and it immediately reshaped the entire developer ecosystem. Claude Code launched as an internal Anthropic tool and began the CLI agent revolution. OpenAI's naming chaos continued with two separate model lines.

  • OpenAI Deep Research: 26.6% HLE score — agentic research breakthrough
  • Andrej Karpathy coins 'vibe coding' — term less than a year old, already everywhere
  • Claude Code launches — built internally at Anthropic, becomes defining release
  • OpenAI naming chaos: two parallel model lines cause mass confusion
Alex Volkov
"Can you imagine that the term Vibe Coding is less than 1 year old? That Claude Code was released at the start of THIS year?"
Yam Peleg
"They stated many times that they used this. They started this as a tool for kind of internal engineering. And it's a very great system prompt. That's pretty much it — small details making it run smoothly."

🔌 Q1: March — MCP Becomes the Universal Protocol

March saw MCP (Model Context Protocol) win the integration wars and become the de facto standard for connecting AI agents to tools. OpenAI released two voice models derived from GPT Realtime, Qwen launched speech-to-speech capabilities, and Gemini 2.5 briefly claimed the top benchmark spot. Cursor sales exploded on the back of Claude 3.7 and vibe coding mania.

  • MCP wins as universal standard for agent-tool integration
  • OpenAI's new voice models: GPT Realtime speech-to-speech derivatives
  • Qwen launches speech-to-speech model with internal emotion handling
  • Gemini 2.5 takes #1 benchmark briefly
  • Cursor sales explode — Claude 3.7 + vibe coding = perfect storm
  • OpenAI no longer the undisputed leader — inflection point
Nisten Tahiraj
"March, let's just start with the important one: MCP won as a standard, the Model Context Protocol. They took it from just Claude having it to an open standard."
Wolfram Ravenwolf
"I think that it was also the pivotal moment in history where we noticed that OpenAI is not the big leader all the time anymore."

🚀 Q2: April–June — VEO3, Claude Opus 4, and the Thinking Machines

Q2 was the quarter voice agents went mainstream beyond the AI bubble. April brought ChatGPT memory and agent-to-agent protocols. May delivered GPT-4o native image generation and Ghibli-mania. June saw Claude Opus 4 drop (the Opus line behind Ryan's year-end pick, 4.5), VEO3 with native audio stun everyone, and Thinking Machines Lab (Mira Murati + an avalanche of top researchers) launch. Daily.co's smart turn detection shipped during this period.

  • ChatGPT memory and GPT-4o native image generation
  • VEO3: native audio generation — crossed the uncanny valley for video
  • Claude Opus 4: the Opus line that led to Ryan's year-end pick (4.5) after 700 days of daily AI coding
  • Thinking Machines Lab: Mira Murati + top OpenAI researchers form new lab
  • Kwindla (Daily.co) ships smart turn detection for voice agents
  • Claude Max 24/7 agent — briefly available, then nerfed, spoiled everyone
  • Google I/O 2025: the quarter voice agents escaped the AI bubble
Kwindla H Kramer
"It felt like it was the quarter where people got really excited about voice agents outside the bubble of people who are building them."
Nisten Tahiraj
"For me, this was the start of the 24/7 AI agents era because of the Claude Max plan, before they nerfed it multiple times."
LDJ
"Around June, the news broke that Thinking Machines Lab was having their first billion-dollar round. Mira Murati and just an absolute avalanche of top-tier researchers."

🇨🇳 Q3: July — Chinese Labs Dominate, AI Browsers Emerge

July was peak Chinese lab dominance: Kimi K2 got serious recognition, Qwen 3 Coder posted insane scores, GLM 4.5 ran on Cerebras fast enough to win hackathons, and Tencent Hunyuan entered the scene. The first serious AI-native browsers started shipping. The 'fridge company making AI' joke from years prior delivered actual frontier research.

  • Kimi K2: Chinese model that earned mainstream recognition
  • Qwen 3 Coder: insane benchmark scores for the coding crown
  • GLM 4.5: ran on Cerebras fast enough to win competitive hackathons
  • Tencent Hunyuan and Huawei enter the open-weights race
  • First serious AI-native browsers start shipping
  • NVIDIA's AI research division delivers frontier-level results
Nisten Tahiraj
"GLM 4.5 also came out in July. And that was like the first one that could run on Cerebras — people at competitions using GLM just running on Cerebras and winning hackathons."
Alex Volkov
"We made fun of the fridge company making AI for so long. This was like a running joke. But they have researchers doing AI research and developing things, open source."

πŸ–ΌοΈ Q3: August β€” Flux 3, Agent Standards, and Three Years of AI

August marked three years since Stable Diffusion went public. Wolfram reflects on the distance traveled. Flux 3 dropped and immediately became the image generation gold standard. Agent-to-agent communication standards started consolidating. KAIP brought multi-agent Claude Code orchestration. The agent infrastructure layer was quietly solidifying.

  • Flux 3: new gold standard for image generation
  • Three years post-Stable Diffusion: Wolfram reflects on the distance traveled
  • KAIP: orchestrate multiple Claude Code agents in parallel
  • Agent-to-agent standards: A2A and related protocols coalescing
  • August: surprisingly dense with infrastructure-layer releases
Wolfram Ravenwolf
"Let me cover August 2025. And to put it in perspective, that has been three years after I got into AI with the Stable Diffusion moment."

💻 Q3: September — GPT-5 Codex, Vision Breakthroughs, DeepSeek V3.1

September was 'infinite money glitch' month. GPT-5 Codex dropped as OpenAI's specialized coding model and moved the stock significantly. DeepSeek V3.1 Terminus resurfaced just as the team was barely keeping up weekly. Vision and video saw major breakthroughs. RevA emerged as a four-in-one image creation and editing platform that Alex still uses daily.

  • GPT-5 Codex: OpenAI's specialized coding model — 'infinite money glitch' on the stock price
  • DeepSeek V3.1 Terminus: dropped just as everyone was overwhelmed
  • RevA: 4-in-1 image creation/editing platform
  • September vision and video: major model releases
  • The pace became almost too much — Nisten missed a week and fell behind
Yam Peleg
"We got GPT-5 Codex, which is OpenAI's specific fine-tuned model. The thing is, why it's an infinite money glitch: because the stock price moved quite a lot."
Nisten Tahiraj
"This is where it just got too much and we were barely keeping up every week. I think I missed one week and I was completely lost."

🎥 Q4: October — Sora 2, GLM 4.6, Claude Skills

October opened Q4 with Sora 2 democratizing video generation and spawning an endless wave of memes. GLM 4.6 quietly became what many businesses still use today. Claude Skills launched — largely missed at release but now picking up fast, with Alex calling it 'MCP-level if not bigger.' Cursor 2 and Composer shipped. Cognition's SWE-bench agents began showing that labs were training models specifically on agentic coding benchmarks.

  • Sora 2: video generation democratized, memes still circulating
  • GLM 4.6: quietly became a go-to for many businesses
  • Claude Skills: missed at launch, now gaining steam — Alex says 'MCP-level or bigger'
  • Cursor 2 + Composer: IDE agents level up
  • Cognition SWE-bench: labs begin training specifically for agentic coding
Nisten Tahiraj
"We got GLM 4.6, and this is picking up a lot. A lot of businesses still do use this one."
Alex Volkov
"Skills is like MCP level if not bigger, as far as Claude users go."

⚡ Q4: November — Gemini 3 Reclaims #1, GPT 5.1, Grok 4.1, and Banger Week

November delivered one of the most stacked weeks of the year: GPT 5.1, Grok 4.1, and Claude Opus 4.5 all dropped within a week and a half. Gemini 3 Pro's Deep Think mode reclaimed #1 on ARC-AGI 2. ElevenLabs Script V2 Real-Time shipped. MiniMax's Hailuo 2.3 dropped. Windsurf released Code Maps. Kwindla's personal benchmarks hit saturation. The acceleration was undeniable.

  • Gemini 3 Pro: reclaims #1 on ARC-AGI 2 with Deep Think mode
  • GPT 5.1 + Grok 4.1 + Claude Opus 4.5: banger week — a week and a half of top releases
  • ElevenLabs Script V2 Real-Time: voice synthesis milestone
  • MiniMax Hailuo 2.3: another strong Chinese release
  • Windsurf Code Maps: generate flowcharts of entire codebases
  • Kwindla: 'my personal benchmarks got saturated — that was never true before'
Yam Peleg
"It was like a week and a half, two weeks, that we got GPT 5.1, Grok 4.1 and, fast, Claude Opus 4.5."
Kwindla H Kramer
"This quarter my personal benchmarks got saturated, which is super exciting, and that was never true before. Like the tasks I use to tell whether a model is good or bad — I can't distinguish anymore."

🏁 Q4: December — Google's Incredible Month & Year Wrap-up

December was Google's month: Gemini 3 Flash, big realtime model updates, a Gemini TTS model, and a cascade of releases that cemented Google's comeback narrative. Kwindla connects Gradium and Kyutai (same founding team). The crew reflects on what 2025 meant — from LLMs in space to AGI benchmarks saturating — and sends everyone off for the holidays with a 4.9-star Apple Podcasts rating intact.

  • Google December: Gemini 3 Flash, realtime model updates, TTS model
  • Kyutai ↔ Gradium connection: same founding team (revealed by Kwindla)
  • LLMs trained in space — the year's wildest headline
  • ThursdAI ends 2025 with a 4.9-star Apple Podcasts rating
  • 52 episodes, 12 months, relentless acceleration documented
Alex Volkov
"Google is absolutely crushing it. They have the TPUs, they have the product surface — my Gmail started writing replies for me where I can't tell it's AI anymore."
Nisten Tahiraj
"And we finished the year off still 4.9-star rated on Apple Podcasts."

πŸ† The Big Picture β€” 2025: The Year AI Agents Became Real

Looking back at 51 episodes and 12 months of relentless AI progress, several mega-themes emerged:

  1. 🧠 Reasoning Models Changed Everything — From DeepSeek R1 in January to GPT-5.2 in December, reasoning became the defining capability. Models now think for hours, call tools mid-thought, and score perfectly on math olympiads.
  2. 🤖 2025 Was Actually the Year of Agents — We said it in January, and it came true. Claude Code launched the CLI revolution, MCP became the universal protocol, and by December we had ChatGPT Apps, the Atlas browser, and AgentKit.
  3. 🇨🇳 Chinese Labs Dominated Open Source — DeepSeek, Qwen, MiniMax, Kimi, ByteDance: despite chip restrictions, Chinese labs released the best open-weights models all year. Qwen 3, Kimi K2, and DeepSeek V3.2 were defining releases.
  4. 🎬 We Crossed the Uncanny Valley — VEO3's native audio, Suno V5's indistinguishable music, Sora 2's social platform: 2025 was the year AI-generated media became indistinguishable from human-created content.
  5. 💰 The Investment Scale Became Absurd — $500B Stargate, $1.4T compute obligations, $183B valuations, $100–300M researcher packages, LLMs training in space. The numbers stopped making sense.
  6. 🏆 Google Made a Comeback — After years of "catching up," Google delivered Gemini 3, Antigravity, Nano Banana Pro, and VEO3, and took the #1 spot (briefly). Don't bet against Google.
Alex Volkov 0:00
Hello
0:00
and welcome to ThursdAI, the yearly recap edition of 2025. My name is Alex Volkov. I'm an AI Evangelist with Weights & Biases, and as you can see, we're in a very festive mood right now. And I would love to welcome my co-hosts as well. If you are tuning in to our live show, no matter where you are, welcome. We're gonna have a great show today, with a lot of updates. And first of all, I'm gonna welcome the co-hosts here. Welcome Wolfram, welcome Yam, and we have LDJ. This is a very festive stream for us. This is our last regularly scheduled stream of the year, and what a year it has been. And we are here to take you through a journey of the insane amount of releases that happened this year. Some of them will probably make you go, oh, it's been only a year since then. And so I'm very excited to kick off the show. This is our 52nd show of this year, either 51st or 52nd, which means it's been a full year of ThursdAI yet again, and I'm very excited to talk about the recap of 2025. I'll just say hello to our co-host, Wolfram Ravenwolf. What's up, man? I see you have a Christmassy hat as well. And as you guys see my background, there's Christmas in the air. Wolfram, how are you doing, man?
Wolfram Ravenwolf 1:20
Ah, amazing.
1:21
What a year. What a year. And I have seen what you prepared for the show. So for everybody tuning in, I have to say: this must have been so much work that Alex did, man. Definitely check this out.
Alex Volkov 1:33
Yeah, definitely work.
1:34
Stay tuned with us. The regular setup of the show will be, again: in the first section we'll cover the week. We'll do it briefly, we're not gonna go into a whole detailed discussion about each topic, although there's a lot to discuss about even this week. And then we'll continue into the yearly recap, where we'll take turns and talk about what happened this whole year. With this, Yam Peleg. What's up, man? How are you doing?
Yam Peleg 2:03
I'm doing well.
2:04
I'm doing well. I never sleep, we never sleep. That's how it works at the moment. And yeah, it was a hell of a year, like for real. Just going through the notes of what we're gonna cover, you start to realize all of this happened this year — things that you got used to, that feel like they're only a couple of months old. You start to realize that. And yeah, we're accelerating really hard.
Alex Volkov 2:28
That's great.
2:28
The year we felt the acceleration. LDJ, how are you doing, man?
LDJ 2:32
Yep.
2:33
Everything's going good. It's been an amazing year, and it's great for us to still be doing this whole podcast thing for, what is it, about over two years now, right?
Alex Volkov 2:42
Over two and a half years.
2:43
Yeah. We're coming up on three in March. Yeah. Yeah.
LDJ 2:45
But yeah, amazing progress, and hopefully in 2026 we'll have
2:49
even more exciting updates in AI.
Alex Volkov 2:51
Alright, joining us, a friend of the show for the yearly
2:54
recap: Kwindla Hultman Kramer, CEO of Daily and maintainer of Pipecat, this incredible library for AI voice. It's been also a heck of a year in voice agents as well. Kwin, welcome to the show. Welcome back, man. It's always great to see you. How are you doing? I'm good.
Kwindla H Kramer 3:10
Congratulations
Alex Volkov 3:11
on
Kwindla H Kramer 3:11
another year.
Alex Volkov 3:12
Thank you.
3:13
It's been quite an incredible year, and so I'm very happy that you're here with us, coming through loud and clear as well. Folks, definitely give Kwin a follow if you're interested in any voice news. Okay. So every week we do one thing in AI that must be named at the beginning of the show. I'm tempted to ask you the one thing for the year, the most important thing. It's gonna be very difficult, but why not? This is a celebratory show. What was the one most important release of AI this year for you?
Yam Peleg 3:41
Oh, it's not hard to guess.
3:42
I think that the major thing this year is, in general, CLI agents, specifically Claude Code starting the thing with a huge explosion, leading to the huge explosion we see at the moment. And pretty much the year of the agents, as everyone was talking about, started because of this. And I think that's huge. Seriously, it's just a huge paradigm shift in the entire thing.
Alex Volkov 4:11
I knew it'd be Claude Code.
4:12
It was clear as day to me. But yes, I think it's definitely a very important release of this year. Let's go to Kwin. The release that surprised you this year, the most important of 2025?
Kwindla H Kramer 4:24
Claude Code, just proving, yeah, the models really matter,
4:27
but sometimes it's about the harness, and maybe mostly it's about the harness. For me, that's the transformative thing. Like Yam said, the year of the agents actually came true. We were all like, 2025 is gonna be the year of the agents, and it was actually true. Yeah.
Alex Volkov 4:41
The year of the coding agents, the year of the voice
4:43
agents, the year of maybe the browser-use tool agents as well. Alright, we'll say hello to Ryan Carson. Ryan, welcome to the show.
Ryan Carson 4:53
Good morning everybody.
Alex Volkov 4:55
We'll get to you, we'll get to you in just a second.
4:56
Prepare your most important AI release of the year, please. I will. What about you?
Wolfram Ravenwolf 5:02
I think a big paradigm shift was happening in image.
5:05
And it may not be the most important release, but it was definitely one of the most significant ones for a lot of people, which is Nano Banana and Nano Banana Pro, where you had more control than you ever had over the images. This was a major leap. So I think that has had a big impact for a lot of people.
Alex Volkov 5:23
Yep.
5:24
I absolutely agree. I agree with all of you, by the way. All right, Ryan, one thing this year that absolutely blew your mind, the most important thing that happened in the world of AI?
Ryan Carson 5:34
Absolutely.
5:34
Opus 4.5. It is unbelievable. I've been coding with an LLM for over 700 days now, like every day, and it is significantly different. You can ship a full feature on a mature code base in one day, always. It's just mind-blowing.
Alex Volkov 5:54
It's so good.
5:55
It's so good. Claude Opus is absolutely just remarkable in an unexplainable way. I agree with you. This wasn't mine though, this year. Nano Banana Pro was incredible, but it wasn't mine this year, I don't think, although I think it's very close. Mine would be the agent browsers era. This was also a year of agents in browser use. And I think it's absolutely incredible to just sit back and watch and, at first, like with an autonomous car, be wowed at, yo, this clicks things, this drags things, oh my God, this clicks the buttons. And now they're going off on 30-minute tasks, and then at the 31st minute they get stuck and you're like, come on, bro, I could have done it without you. So it's very interesting, and I think we're just gonna see more and more of these browsing agents. For me it generally feels like a year of agents across the board, and they are getting better and better because of Opus and other stuff. Now it's time to move to the yearly recap, a year of AI updates. It's been one heck of a year. We're gonna take turns and actually start from Q1 and go all the way up. We're about to try to recap 2025 in AI releases. Since the first month of January, on ThursdAI we have been covering every release in a weekly episode; for this year, we had pretty much four shows every month. And so we're gonna try and walk you through the most important releases from the shows. I will open this up with the first one. Okay? So we're focusing on Q1. Q1 was absolutely insane, and I think the most important thing about Q1, in January specifically, was DeepSeek R1. Folks, you remember DeepSeek. Well, back then the main foundation labs had already started hiding the reasoning traces, and DeepSeek R1 was fully open. This was the first reasoning model in open source after the QwQ attempts from Qwen.
And this absolutely shattered the surface area of AI geeks thinking about AI models in the privacy of their home, because everybody was talking about DeepSeek R1: your mama, your grandmama, everybody. Because when DeepSeek R1 came out, it was so performant, at least on the benchmarks, that it looked like it was beating the foundational models. And then there was a leak or something about the price to train this model, I think around $5.5 million. It wasn't clear if it was true or not, but definitely this caused such a downtrend on the NASDAQ for NVIDIA that it caused $560 billion in losses for NVIDIA for a while. I actually topped up my NVIDIA investment back then, 'cause, ah, this is stupid, and we knew this is stupid. And the quote from me back then, and I'm really happy that I have quotes here: open source AI has never been as hot as this quarter, and it's only the beginning; hold on to your butts, because we're just starting here with DeepSeek. It was 50x cheaper than o1 back then. o1, from OpenAI. Remember o1, so long ago? It's just absolutely incredible. Other notable releases from January: OpenAI gave us Operator. This was the first agentic ChatGPT; OpenAI was finally able to click buttons, et cetera. Operator was sitting somewhere locked inside ChatGPT; I think it was only accessible to Pro accounts in the beginning. And then, supposedly, you could book restaurants and order groceries, and I think we tried and it worked. But it was based on a new computer-use action model, the CUA model. And it still failed captchas. Also, there was an announcement from Sam Altman, SoftBank and Oracle. They all stood in the office of the president and announced Project Stargate, targeting a half a trillion dollar investment, around 2% of US GDP, the Manhattan Project of AI. Since then, I think they tripled down on some of the investments. And also at NVIDIA's CES keynote, they showed a desktop that runs 200-billion-parameter models, DGX Spark.
This is from the infographic. Let me pull up the actual review of this month and then see if we missed anything.
Wolfram Ravenwolf 10:06
Quick question, because what would be interesting is:
10:09
which of these news items is relevant today for some of you, or at least one of you? So, is anyone using Operator, for instance?
Alex Volkov 10:17
I think what happened to Operator is it turned into Agent
10:22
inside ChatGPT, and then it turned into Atlas. So Operator is basically the granddaddy of Atlas. If we look at this computer-use model and all of what they learned, folks will probably use Atlas; it has to do with that.
Nisten Tahiraj 10:35
Oh, just really quickly, Kimi 1.5 also came out
10:38
in January, but that was just,
LDJ 10:41
yeah.
10:41
Oh, that's true. Yeah. Actually, I remember it came out within 24 hours of DeepSeek, I think it was the day of, or like a few hours before even. And I recall they actually were even more open about their methods in that paper. But DeepSeek overshadowed everything. Yeah.
Alex Volkov 10:57
Yep.
10:58
DeepSeek absolutely overshadowed everything. Yeah. I'm gonna let you take over February, Yam. I think you really want to. Go ahead.
Yam Peleg 11:05
Yeah, let's go.
11:06
Alright. So February, we started with the deep research war, you can say. Today everyone has got a deep research, but we only got it in February. You can say that this is the first agent that was widely released to the public. Agent-ish, that is: really researching the web for whatever you want, and it gets you, built step by step, one giant report at the end. Around the same time, we got Claude 3.7 Sonnet, which for many people, including myself, was a step change in what we could do with the coding models to this moment. It was just mind-blowing for nearly anyone that used it to write any code whatsoever. Most importantly, it could do front end and design much better than any other model at the time. It was clearly mind-blowing to watch. At the same time, we got an answer from OpenAI in the shape of GPT 4.5, which is an odd model, to this moment rumored to be much larger than all other models released by OpenAI. But it was also not a successful release by them, even telegraphed as not a successful release, not something that we should get too excited about. To this moment, the opinions about GPT 4.5 are all over the place. Grok 3 was released by xAI back in February. And at the same time, we don't have it on the infographic, but Claude Code was released in February, marking the start of the year of the agents, the start of the CLI tools explosion that we went through throughout the entire year.
Nisten Tahiraj 13:06
We need a gong for that one.
Yam Peleg 13:08
Oh yeah.
13:09
Oh
Alex Volkov 13:10
yeah.
13:10
We don't do gongs here, we do air horns. So again, I wanna stress: at the beginning of the show, we took a summary of the most important thing in AI for this year, and for Yam the most important thing was released in February, the beginning of the agent coding CLI revolution with Claude Code. I think we already had CLI tools back then, but Claude Code was absolutely just a smashing success. And Anthropic, to their credit, have been knocking it out of the park with these side releases, because MCP was released just, what, like four months before Claude Code, also as an internal project within Anthropic. We're gonna get to MCP in a second. And this is also a side project from Anthropic that brought billions of dollars of revenue to Anthropic, because of just this one thing, because of, hey, let's take this intelligence, put this in this format. Many people, most of us here, use Claude Code, if not on a daily then on a weekly basis. It's just that good. And when it came out, we talked about this just briefly, because it wasn't that clear that it was gonna be that good, and it's unclear to me now. That's a question I'll turn to Ryan Carson, actually, on the show: why is Claude Code specifically so successful and good compared to many other offerings out there? Why do people get excited about Claude Code? We'll get to Ryan and then Yam, and then we'll continue with the recap.
Ryan Carson 14:29
I think it's because it was used internally.
14:32
So Boris Cherny built it as an internal tool inside Anthropic. So they were dogfooding it heavily. And when you dogfood your own tool, it gets really good. And so when it came out, it was a clear contender, and a lot of people love Claude Code. Obviously I'm on the Amp team and I love Amp, and we work really hard on what we do as well. But a lot of people love Claude Code. A lot of people wanna use the TUI. They don't want a whole IDE, they don't want all the cruft. I'm the same. I'm a hundred percent in the TUI or CLI.
Alex Volkov
Alex Volkov 15:02
Yeah.
15:05
Yam, you're raising your hand and clapping and getting excited. Why is this the number one release for you? First,
Yam Peleg
Yam Peleg 15:11
absolutely agree with Ryan.
15:13
they've said many times that they started this as a "let's see what's gonna happen" kind of tool, and it exploded inside Anthropic. You can find interviews with Boris explaining exactly how it started. Yeah. Claude Code is just in exactly the right place in the stack, in my opinion. We had agents, we had Cursor, at this point we had multiple agents. And a month before we had Operator, which is an agent that controlled the computer. However, Claude Code is: control your computer through text, through the terminal. You can do anything you want with a computer through the terminal. Even as a human, you can just do it if you're good with the terminal. And this is a completely different paradigm when you think about it, because you can connect your user-facing software to an engine running in the terminal that can control the entire computer. So you basically have an AI engine that connects to the UI, to the different UIs that you're using. And it's completely different from what we had before, which is, like Cursor: you have a UI, and embedded inside of it, the agent. Anyway, I think I can reveal at this point that I do have extensive knowledge of the first versions of Claude Code, let's say it this way. Very extensive. And it was rewritten from scratch multiple times. You can see it in the code, and it's a thin wrapper. I searched for the secret of what's running it for quite a while at the beginning. There is no secret. The secret is that it's just a thin wrapper done very well. It's top
Alex Volkov
Alex Volkov 17:04
engineering.
17:04
And a very good system prompt. That's pretty much
Yam Peleg
Yam Peleg 17:06
absolutely small details, making it run smoothly.
17:10
But there is no big magic. It's just small, tiny details, making it run very smoothly, that they discovered by using it themselves. And it's transparent, showing you what it's doing, and it can control everything about your computer. This is why it's popular: because it's just good. That's the entire thing, pretty much.
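The "thin wrapper" pattern Yam describes can be sketched in a few lines: the model proposes shell commands, the harness runs them in the terminal and feeds the output back, and the loop repeats. This is a toy illustration only, not Claude Code's actual design; the `RUN:`/`DONE` protocol and the stub model are made up for the example.

```python
import subprocess

def run_tool(command: str) -> str:
    """Execute a shell command and capture its output -- the only 'tool'
    a terminal-first agent strictly needs."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

def agent_loop(model, task: str, max_steps: int = 10) -> list[str]:
    """Thin harness: ask the model for the next shell command, run it,
    append the output to the transcript, repeat until the model says DONE."""
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = model(transcript)          # e.g. "RUN: ls" or "DONE"
        if action == "DONE":
            break
        command = action.removeprefix("RUN: ")
        transcript.append(f"$ {command}\n{run_tool(command)}")
    return transcript

# A stub standing in for a real LLM call, for illustration only.
def stub_model(transcript):
    return "RUN: echo hello" if len(transcript) == 1 else "DONE"
```

The point Yam makes is that everything else is small details layered on this loop; the "engine" is the loop plus the terminal, and any UI can sit in front of it.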
Alex Volkov
Alex Volkov 17:29
Let's continue to the rest of, so Claude Code definitely
17:33
marks the main release there. Let's continue to the rest of February. Yam, if you wanna continue covering this.
Yam Peleg
Yam Peleg 17:41
Sure.
17:43
What didn't we cover? I think I,
Alex Volkov
Alex Volkov 17:45
the OpenAI roadmap. I wanted to highlight it because we're gonna get
17:48
to chat about some other OpenAI releases. And I think back then it was clear that OpenAI was changing tune. There was a whole thing about o1 and o2: they skipped o2 because of the O2 Arena, if you guys remember, and jumped straight to o3. And then they mentioned, yeah,
Yam Peleg
Yam Peleg 18:01
look, at the time the naming was, and it still is,
18:05
all over the place, but we basically had two lines of models. We had the GPT line: GPT-4, 4.5, and two months later we got GPT-4.1. And we have the o-series, which are the reasoning models: we had o1, then we got o3 and o1 Pro, and we were about to get o4-mini and so on. And we weren't sure exactly where this whole thing was going. And we got a release of a roadmap, I'm not sure if from OpenAI or from a third party, that OpenAI was preparing one model to rule them all. Basically spoiling GPT-5, which came a couple of months later this year. Anything else you wanna highlight, Alex?
Alex Volkov
Alex Volkov 18:54
Okay.
18:55
folks, it's only February, the beginning of February, when Andrej Karpathy coined the term vibe coding. I think this coincides amazingly well with the release of Claude Code just a little bit afterwards. Karpathy famously posted a tweet, I think it has 5 million views since then. Vibe coding basically refers to no longer doing traditional software engineering. It then grew and became a term for everything AI-related in coding, and then people said, no, vibe coding is really bad. But generally, what Karpathy meant back then is that for side projects, for offshoot projects, he basically doesn't read the code anymore. He just talks to an AI agent, hits send, and it keeps going, just vibing with the thing. The term became absolutely viral. And so I wanted to call out that it's been less than a year of vibe coding and it's now everywhere. Everybody's vibe coding. Folks, anything else important from February that we missed? Any big release that's not on here?
Nisten Tahiraj
Nisten Tahiraj 19:51
Yeah.
19:51
Grok 3, and also the Alibaba WanX model, and the StepFun video one. People used those a lot.
Alex Volkov
Alex Volkov 20:03
Yeah.
20:03
So Alibaba's WanX, no, it switched to just Wan, I think, since then. They've been great at video generation, although this is not the year's big video breakthrough yet. But yeah. And since you spoke up, how about you cover the third month in Q1, March 2025, for us? For the listeners of the show, by the way, the March episode is the birthday episode of both GPT-4, 'cause GPT-4 came out in March, and of ThursdAI. And this was the biggest episode that we had this year, absolutely the best one, the one with the MCP stuff as well. Nisten, all yours, please take it away for March 2025. Yeah,
Nisten Tahiraj
Nisten Tahiraj 20:45
so March, we got, let's just start with
20:50
the important one: MCP won as a standard, the Model Context Protocol. Others tried to make another one and sponsored a bunch of hackathons and stuff, but it wasn't happening, because once OpenAI adopted Anthropic's MCP, that was over. And we got Gemini 2.5 Pro, so we got the first trillion-parameter model from Google that people could code with. I personally didn't like it, but a lot of people liked it, and it had thinking built in as well. On the open source side, we got the DeepSeek V3 update. This one had extremely high scores, and also, I tested it on the medical side: a non-thinking model was still one of the best open source medical models. But yeah, we got GPT-4o mini TTS, and we got GPT-4o transcribe, and we started getting image generation from Gemini too, the Gemini Flash image generation. It wasn't that good. And on the open source side we saw a whole bunch of stuff. We got Gemma 3. We got QwQ from Qwen, which was pretty good. We got Qwen 2.5 Omni 7B, which I think some people still use today, so that's worth noting. Reka Flash 3 was also pretty good. And NVIDIA started their entire Nemotron thing, and they're getting a lot better now. And I think we got the largest open source model from Hir as well. And most importantly, I think we got the biggest fully open release, open data, open weights, everything, from OLMo 2. Actually, I think that was a bit later, and it might've been a fine-tune, but the one that's out now is still the biggest. Other things: yeah, MCP started getting adopted everywhere, with some biases implemented in MCP servers too. But some very important things on the audio side, because these are things that I use and I know a lot of people use: MLX Audio.
Shout out to Prince Canuma, because he did a terrific job on that, especially with the Kokoro models, which everyone could run on any Mac, even a five-year-old one. I think that is still in use today. And we got Roboflow's RF-DETR, which people are still using today, and we got the benchmark. I don't think a whole lot of people use the benchmark, but these two, the Roboflow detector and MLX Audio, were some of the most used ones, very good tools that you can still use today. So those were actually extremely impactful. We got a whole bunch of image ones: we started getting the Ian 3 models, and the Ian VAE, the variational encoder, is actually still very good and used today. And we got Ideogram and Reve Image, but I don't know a lot of people that use those. So yeah, that is the March. Yeah,
Alex Volkov
Alex Volkov 24:08
it was a very big month as well.
24:09
Kwindla, would you join us to discuss the voice AI implications of March? Because I think GPT-4o TTS was a very impactful one for sure.
Kwindla H Kramer
Kwindla H Kramer 24:19
Yeah.
24:19
At the end of March, OpenAI released two models that were derived from the same work as the GPT realtime speech-to-speech model, but separated out the transcription side (audio input understanding) and the voice generation output side into separate models you could use through the APIs. I think a couple of important things about that. One is these were truly LLMs, they're steerable models. Before that, most of the dedicated transcription and voice models we all had access to were really focused on just taking a little bit of text input and literally outputting voice, or taking a little bit of audio input and literally outputting transcription. But you can prompt these two models like you can prompt an LLM: you give them a system instruction and they'll do something like translate on the fly, or modify what they're giving you, or you steer them to understand words better. A really interesting future direction. The other thing is this was a recognition from a foundation lab that we're gonna live in a multi-model world for these agents we're constructing. It's not always gonna be one big, huge speech-to-speech model that does everything for us. We're gluing together multiple models because we have a ton of different kinds of things we're doing, and having these individual models available is really powerful. Think of something like how Amp's architecture for coding agents uses lots of models. I think everything we're doing is multi-model. I think that's the way the world's gonna be. And this was OpenAI breaking apart their big speech-to-speech model and giving us the components.
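The "gluing together multiple models" architecture Kwindla describes can be sketched as a three-stage pipeline where each stage is independently promptable. Everything here is a stub invented for illustration; a real voice agent would replace each function with a call to an actual STT, LLM, or TTS API.

```python
def transcribe(audio: dict, prompt: str = "") -> str:
    """Steerable STT stub: a real model could use the prompt to bias
    recognition toward expected names and jargon."""
    return audio["spoken_text"]

def respond(text: str) -> str:
    """Text-LLM stub that produces the agent's reply."""
    return f"You said: {text}"

def speak(text: str, style: str = "warm podcast host") -> dict:
    """Steerable TTS stub: a real model could take a style instruction."""
    return {"spoken_text": text, "style": style}

def voice_turn(audio: dict) -> dict:
    """One conversational turn: audio in -> text -> reply -> audio out.
    Each stage can be swapped or re-prompted independently."""
    text = transcribe(audio, prompt="Expect names like ThursdAI and MCP")
    return speak(respond(text))
```

The design point is the seams: because transcription, reasoning, and speech are separate components, you can upgrade or steer any one of them without touching the others, which is exactly what separate API models enable over a monolithic speech-to-speech model.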
Alex Volkov
Alex Volkov 25:47
Yep.
25:48
and we'll cover the updates to that one as well. And I think they were one of the first ones. Kwindla, a super quick recap on the semantic VAD that they launched? I think they were the first to actually productionize semantic VAD, and we talked about this back on the show as well.
Kwindla H Kramer
Kwindla H Kramer 25:59
Oh yeah.
25:59
Inside a speech-to-speech model. So they launched a really nice component of that speech-to-speech model that could tell when a user is done talking and the voice agent should start talking. This is one of the big challenges in building voice agents that feel natural: how does the voice agent wait long enough, but not too long, so that it doesn't interrupt you but still feels like it responds very quickly? Humans expect it: we talk over each other, we sometimes even have negative voice-to-voice latency in a natural human conversation, or we respond very quickly, or we give a little back-channel thing to show we're listening. We're still missing that level of naturalness in these speech-to-speech models, that level of give and take, super quick response. So having a really good semantic (meaning it understands the content) VAD (meaning voice activity detection) model in the GPT realtime speech-to-speech API was a real step forward. All of the transcription models are now building VAD and semantic turn-taking into their transcription models. There are open source models to do it. I work on an open source, open data, open training code semantic VAD model called Smart Turn. There's lots of interesting research here, but this is gonna be a 2026 thing we're all continuing to work on.
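To see what semantic VAD improves on, here is the classic baseline it replaces: a naive energy-based VAD with a hangover timer. The threshold and hangover values are arbitrary illustrative numbers; the point is that this approach only looks at signal energy, whereas a semantic VAD model considers what was said to decide if the turn is really over.

```python
def energy_vad(frames, threshold=0.01, hangover=3):
    """Naive energy-based VAD: a frame is 'speech' if its mean-square energy
    exceeds a threshold; a hangover counter keeps the speech state alive
    through a few quiet frames so short pauses don't end the turn."""
    states, quiet, speaking = [], 0, False
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            speaking, quiet = True, 0
        elif speaking:
            quiet += 1
            if quiet > hangover:
                speaking = False
        states.append(speaking)
    return states
```

The hangover is exactly the "wait long enough, but not too long" dial Kwindla mentions: too short and the agent interrupts mid-sentence, too long and it feels laggy. Energy alone can't tell a thoughtful pause from a finished turn, which is the gap semantic turn-taking models like Smart Turn aim to close.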
Alex Volkov
Alex Volkov 27:09
Yeah, a hundred percent.
27:10
And so this concludes, I think this concludes, folks, super quick reactions on the whole quarter. I know folks in the audience are saying the infographic is small, I apologize, this is what we're working with. There's a lot of detail here. Anything else, any major theme of the first quarter that we missed? As far as the theme outline here: January was the earthquake for AI, February was the arms race, Deep Research was dope, absolutely, and Grok 3. And then March, Gemini 2.5 Pro became number one for a very brief period of time. Maybe a little bit, Yam, you mentioned 4.5 as well, as a release that wasn't very clear, in February, but I think a very big model from OpenAI to call out. The thing that I remember from then is that at some point on the show we had to explain why the 4.1 model is better than the 4.5 model. The versioning and the phrasing just didn't make sense to any of us. We're sitting here like, hey, I had a conversation with a person explaining to them that GPT-4.1 is better than GPT-4.5, and this makes no sense in any other place in the world.
Yam Peleg
Yam Peleg 28:11
And the difference between 4o and o4
Alex Volkov
Alex Volkov 28:14
oh, it's, yeah,
Wolfram Ravenwolf
Wolfram Ravenwolf 28:17
Oh, that was brutal.
Nisten Tahiraj
Nisten Tahiraj 28:19
Oh yeah.
28:19
I want to say quickly that Cursor sales exploded during this time because of Sonnet 3.7, and everybody was vibe coding because it could actually make web apps pretty well. This blew up in corporate and stuff.
Alex Volkov
Alex Volkov 28:36
I wonder how much Karpathy and vibe coding as a concept
28:39
had to do with the Cursor explosion, but yes, let's call out Cursor. I don't know at which point Windsurf came to the stage, but also Windsurf, the agent editors, because of Sonnet, and then Gemini 2.5 Pro, but mainly Sonnet 3.7, I think, were a huge theme of this quarter. So absolutely the quarter that broke the market, changed a bunch of stuff, and opened up a bunch of new voice and TTS stuff. Huge Q1 for AI in the world.
Wolfram Ravenwolf
Wolfram Ravenwolf 29:03
And,
29:04
I think it was also the pivotal moment in history where we noticed that OpenAI is not the big leader all the time, when DeepSeek happened. We saw that open source AI, open weights AI, can actually get very close. That was a moment where people realized, okay, it's not that big of a difference between the models. Proprietary AI was still leading, but it was now much closer than it used to be.
Alex Volkov
Alex Volkov 29:34
I wanna also shout out super quick, like we mentioned this a little
29:38
bit, but the GPT native image generation was a huge thing, specifically because you could talk to an image model. You could talk to it, and it would reason, understand you, have a multi-turn conversation with you. This was the huge big thing. You didn't have to go to a dedicated website. There was a bunch of stuff before, Flux I think was already there, and obviously Stable Diffusion stuff, where we could do Ghibli things. We talked about this on the show: I could send you to a model with a finetune and so on, and we could do Ghibli stuff. But for many people the unlock was: hey, I can chat with this thing. I can tell it what I want and it gives me what I want back. And I can tell it, hey, no, I actually want this instead. That conversational back and forth, where the model is multimodal, receives back its own output, and can iterate on it, this was the big unlock for many people. And for copyright reasons, I think, we'll talk about this when Sora 2 comes up in Q3 of this year, where copyright was a big issue as well. Studio Ghibli obviously wasn't very happy that anybody could create their style now, but all of us were Ghiblified. I think our thumbnail for the show that week was all of us in Ghibli style as well, if you guys remember. It was a huge thing, like a world-shattering thing. We often talk about the surface-area bubble that we are in, the AI bubble: DeepSeek, we knew about way before R1, and then R1 broke the bubble. Ghibli absolutely broke the bubble. OpenAI recently released a video where they say they need more GPUs, they're about to fundraise, whatever. And the one thing Greg Brockman talks about there, in terms of needing more GPUs, is that when the Ghibli thing happened, they had to take GPUs back from research into production.
And he said, we sacrificed the future for the present, because they literally had so many people just doing things. Very interesting for later this year. Alrighty, let's continue. Q2. Yeah, Q2, I think, is also, man, this whole year is crazy, but Q2 is also very meaningful. Wolf, you're gonna take this one? Yeah, sure. You gonna start with,
Wolfram Ravenwolf
Wolfram Ravenwolf 31:36
April, back in April.
31:38
That was a month with ups and downs, with some highs and lows, and I have been pretty critical recently, you remember. But o3 and o4-mini released in April, on April 17th. And yeah, those were amazing moments, because these reasoning models had the ability to use tool calling during reasoning, like web search, Python, image generation, and to chain hundreds of consecutive tool calls to do really complex stuff, and even manipulate images within their thoughts, like cropping, zooming, rotating. Quite amazing advances. And I can tell a personal anecdote, because o3 has been one of my favorite models of all time, for one reason: in that month my AI workstation broke and I used o3 to fix it. It was really taking me through it with multimodal reasoning. It was amazing, one of the moments where I really realized how powerful AI is in daily use cases. So that was a really big moment for me, and big kudos to OpenAI for doing this. And GPT-4.1 was also released, with a million tokens of context, which was pretty new for OpenAI. They had the 4.1, 4.1 mini and nano versions, and deprecated 4.5, which was much bigger and probably not as efficient for them to run. It performed well throughout the whole context, achieved 72% on Video-MME, so multimodality was strong with it. And yeah, it was a great model. Those were the good parts; a really big month for OpenAI in my book. But Meta totally fumbled with the Llama 4 release. They were the big ones for open source, open weights AI. Even the LocalLLaMA subreddit where we all hang out about open stuff was named after Llama. And now with the Llama 4 release they really lost it, because for once these were big models: Maverick was 400 billion total parameters, and the Behemoth that never got released was 288 billion parameters active of two trillion total parameters. That was the moment where they went MoE, like DeepSeek
before, though a lot of people speculated that it was a total "we have to redo this to get better" situation. And there was also some drama when the model tested in LM Arena was not the one they released, so, yeah, there were a lot of strange things happening here. We never got the big model, there was a controversy, and the models themselves weren't leading, unlike Llama 2 and 3 before, which were the big open source releases. So now other models basically took over, or Meta made it possible for that to happen. In other news, Gemini 2.5 Flash came out, which we were all using until Gemini 3 Flash came out now; it was still 2.5 Flash even after the 3 Pro release. It was a counter to o3 and o4-mini, in that you could set a thinking budget, how many tokens per API call it should think. It has a 1 million token context window, which was always a big strength of Google, that huge context, and it was super cheap: the old Flash model cost 15 cents per million input tokens and 60 cents per million output tokens. A fast model, not the smartest of course, but with the thinking it could at least get up to bigger levels. So it was a definite balance between speed and cost versus reasoning depth. We also saw a lot of open source LLMs: Nemotron Ultra 253B, for instance, a pruned Llama 405B that beat the actual Llama 4; the Kimi visual language model, 3 billion active parameters, MIT licensed; DeepCoder; HiDream; the GLM-4 family. Lots of models. And especially noteworthy was also that Google celebrated MCP, officially supporting it and joining Microsoft and AWS in MCP support. At the same time they had the agent-to-agent protocol, if you remember that. I don't think it got much traction. I'm not sure. Oh, yeah, we had a chat
Alex Volkov
Alex Volkov 35:54
about agent-to-agent here on the show with
35:56
the guy leading this in Google. Yeah. Yeah.
Wolfram Ravenwolf
Wolfram Ravenwolf 35:59
So I wonder where it's going right now, if it's still relevant.
36:02
But MCP definitely ruled for integrations into the agents. The o3 API was opened up for access, so that was also interesting. And the ChatGPT memory upgrade, so you could enable memory across all of your chats. That was also a big feature from OpenAI.
Alex Volkov
Alex Volkov 36:20
Oh, memory was huge.
36:21
Yeah.
Wolfram Ravenwolf
Wolfram Ravenwolf 36:23
Yeah.
36:23
Although I personally think it's very important, and a big feature, that we can control the memories. Giving it access to everything can also be very distracting, but having the option is great. And yeah, that was a good step ahead. We also got Veo 2, Google's video model, becoming generally available. That was also big; a lot of people now use it to create videos. We had the Kling 2.0 Creative Suite, Runway Gen-4, lots of stuff happening in the vision and video world, and also in voice and audio: the Dia 1.6B TTS, Pipecat (maybe Kwindla wants to add something in that segment), DolphinGemma. And noteworthy is also that OpenAI brought out its open source Codex CLI, basically a competitor to Claude Code. Considering that it is from OpenAI and that it is actually open source, I would have expected it to be much more popular, but for some reason it never took off that much. Maybe the model is more important, or at least as important, as the harness itself. Yeah, we had Firebase Studio from Google, not to be confused with the AI Studio that they have; that was their vibe coding platform where you can build complete apps, database integrations and hosting. And Git MCP was very popular as an MCP to connect your GitHub repository to the agent so it can access it and do stuff with it. So yeah, big month. I wanna
Alex Volkov
Alex Volkov 37:51
pull out a few of the launches that
37:52
happened actually on the show. Kwindla, you guys launched Smart Turn, and worked on Smart Turn back then. Anything else, any additions to this very intensive April? Yeah, real
Kwindla H Kramer
Kwindla H Kramer 38:03
quick on audio, 'cause it just felt like
38:04
a big month for voice agents. The Gemini Flash text mode update was really compelling, and that was a moment when a lot of people in the voice agent space switched from GPT-4o to Gemini 2.5 Flash for production voice agents. There was also a Live model version of that update; they released the first one in December, competing with GPT realtime, and they did a big update in April. That Dia model that a couple of Korean students did got people really excited about open source voice. I think it made people realize you could build open source voice models on one GPU at home with data that you curated. And so there was just a ton of stuff that felt like it unlocked voice in people's minds in April.
Alex Volkov
Alex Volkov 38:50
Yep.
38:51
So this is April. We will continue. Ryan, you wanna do June? I can do May. I wanted to do May. Go for it, you do May, I'll do June. All right, folks. I think the reason why I wanna take this one is because of this insane release of Veo 3. May of 2025 was the highlight of the year in terms of the uncanny valley being basically dead, as you can see here on the infographic, and probably everywhere else. Google released Veo 3, and then later some updates as well. I think it was Google I/O back in May too, Google I/O being the yearly announcement event for Google. We covered it live, by the way; I had interviews live from there. And Veo 3 was absolutely earth-shattering in the quality of video generation. Veo 3 came out as, I think, one of the first ones with native audio, one of the first good ones with native audio, still one of the top video models to date, with native audio and perfect lip sync. And it looked like a world simulator. Stuff like "prompt theory" started going viral everywhere, where people were creating these AI videos of folks interviewing each other on the streets saying, hey, are we prompts? Are we all prompts? And the absolute realism, with the lip sync and voice, was the thing that shattered the AI bubble again, and many people started freaking out. I basically said back then that we've crossed the uncanny valley. So Veo 3 was definitely a huge release of this May. But also, on the LLM side, Claude 4 Opus was released, and Sonnet as well. They actually live-dropped during ThursdAI, if you guys remember, it was breaking news: 80% on SWE-bench Verified. It was a historic moment, Claude 4 Opus and Sonnet. They claimed to handle six and seven hour tasks, which is very impressive as well. Definitely Sonnet 4, and Opus 4 specifically, made a dent.
They're the first models to cross 80% on SWE-bench Verified. They did hybrid reasoning and an instant response mode, both; I think those were the first Claudes with reasoning that we could actually get access to. And this is also a theme of this year: reasoning propagated across all major models. Just before this, in April, we covered that Gemini Flash got thinking and thinking budgets, and now in May, Opus and Sonnet were also hybrid models with reasoning. And the knowledge cutoff was very fresh for them as well. As I mentioned, oh, and also, yeah, Qwen 3 released at the start of May. So in open source, Alibaba dropped the most comprehensive open source release ever. It was a ton of models, eight models: two big MoEs, a 235 billion parameter one with 22 billion active, and a 30 billion parameter one with only 3 billion active, and then six dense ones ranging from 0.6B to 32B, I believe. We had Junyang Lin from Alibaba Qwen on the show. Apache 2.0 license on everything. This was an incredible thing for the open source community, just incredible models overall. They got used, and I think the small Qwen 3 models are still used everywhere for fine-tuning. They had this very interesting way to toggle thinking on the fly, a runtime toggle for chain of thought on demand. Before this, you had to choose upfront whether you were gonna talk with a reasoning model or a non-reasoning model. They had a feature where you can mention in chat, hey, I want you to think, with a /think tag, and the model will switch to reasoning-model behavior. I think that was super, super cool. The 4 billion parameter dense Qwen 3 beat the 72 billion parameter Qwen 2.5 from the generation before on multiple benchmarks.
So again, I'll say this very slowly: the 4 billion parameter model of this new generation beat the 72 billion parameter model from the generation before. Folks, it was an insane release. Qwen 3 was just an absolutely insane release, maybe one of the top open source releases this year: 36 trillion training tokens, 120 languages, and support across everything. Nisten back then said the 30B MoE is "Sonnet 3.5 at home", and it runs over a hundred tokens per second on MacBooks. It was just absolutely mind-melting. And then, yeah, GPT-4o native image generation, I don't have this on my infographic, but it was Ghibli mania 2.0, because they opened it up in an API. If you guys remember when Ghibli mania happened, it happened in ChatGPT, so you had to go to ChatGPT to generate those images. Then they released GPT Image 1 in the API, which again had a bunch of people playing with this. It supports generation, edits, and masking, and "excellent" text rendering, excellent in quotes, right, because we later got Nano Banana and realized what excellent text actually means. Google I/O had an avalanche of releases, a ton of releases. Gemini 2.5 Pro Deep Think was released back then. Gemini 2.5 Flash went to GA with thinking budgets, which we mentioned a little before. Gemini Diffusion was released; it was a cool thing, and we haven't seen a lot of diffusion since then. But basically, if you guys remember, we talked about it, Yam, I think you helped me cover this, diffusion language models, where they fill in tokens all over the place rather than sequentially. And Gemini Diffusion talked about 2,000 tokens per second for code and math editing. We haven't seen tons of stuff since. Back then there was also Jules, the AI coding agent at jules.google.
It was very interesting because they keep releasing AI agents and editors at Google. They showed off, finally, Project Mariner, which is browser control via API. Basically it started as a Chrome extension, but they didn't release Project Mariner itself; it became an internal part of things, so you couldn't use it. And then they also announced the Gemini Ultra tier and AI Mode in search. Going to open source, there's a bunch of open source stuff, specifically Devstral, Mistral's developer version, I think the first Devstral, 24B. It was state of the art on SWE-bench Verified. SWE-bench Verified continued to be the benchmark that everybody just kept beating, from the fifties to the sixties to the seventies to the eighties; this whole year has been just incredible to see. And NVIDIA kept releasing Nemotron stuff. Gemma 3n was released. And I think that's mostly it for the big companies. So we covered the Codex agent. Oh yeah, OpenAI hired Jony Ive, if you guys remember, this was also back then: a $6.5 billion deal for io, his company, publicly announced, and they're doing stuff with OpenAI. GitHub Copilot was open sourced fully, Microsoft added MCP in Windows, and LMArena, the folks we used to look at for every vibe comparison, raised a hundred million dollars. In vision and video, I think the highlight, the super cool thing, was Odyssey. If you guys remember, Odyssey was super, super cool: walkthrough AI-generated worlds that you could control with the WASD keys. Alibaba released the open source diffusion transformer Wan 2.1, this was the WanX from before. And in voice and audio, Kwindla, maybe help me see if there's anything interesting here: ElevenLabs V3 was released, and GPT-4o transcribe with semantic VAD,
Kwindla H Kramer
Kwindla H Kramer 45:53
I think.
45:53
Yeah, it felt like a much more incremental month than the month before.
Alex Volkov
Alex Volkov 45:56
Yeah, absolutely.
45:57
and there was also the Unmute release from Kyutai, which adds a voice to any LLM. I think in tools, Claude Code went GA in April, sorry, in May. Claude Code was a beta thing, and it finally went GA in May of this year. I think that's it for May. Let me see, yeah: AlphaEvolve, Claude Code GA, and Cursor v1. Yeah, Cursor v1 was released and introduced MCP support and BugBot, which reviews PRs in Cursor.
Nisten Tahiraj
Nisten Tahiraj 46:28
I still use Opus 4 today.
46:31
So that's, I think that's gonna be one of my all time favorite models.
Alex Volkov
Alex Volkov 46:36
Yeah.
46:38
Opus 4, definitely a big one. Let's not wait too much, let's move to June. Ryan Carson is with us. June is "the new normal," that's what I decided to call it. It's
Ryan Carson
Ryan Carson 46:50
clever.
46:51
It's
Alex Volkov
Alex Volkov 46:51
great.
Ryan Carson
Ryan Carson 46:52
All right, everybody.
46:54
A lot happened in June. I'm gonna go over the high level quickly and then zoom in primarily on two stories. So we saw o3 drop prices massively, right? This is a huge deal. We saw a big drop in token prices, which means that startups especially could get access to these hyper-intelligent models a lot cheaper. I'll dig into that a little bit more, but let's go to Meta and Scale. This is where we started to see Zuck get out his checkbook, and he just started buying everybody. My opinion about this, with Alexandr Wang coming over to Facebook, where it was basically an acquihire, but a huge one, was: in the end, it's hard to say no to absolute bonkers money, right? Probably Alex also saw that this was a chance to really impact one of the potentially largest labs in the world. So that was kind of interesting to see, Meta just showering money on various teams. We saw MiniMax M1 beat R1, just again, this team delivering amazing value, coming out of the East, disrupting everybody in the West. And then we saw another CLI hit the market: Gemini released the CLI and open sourced it, which was really cool. We saw a bunch of open source news as well: Sal 3.2, their image gen; we saw Magistral come out as well; DeepSeek R1-0528; and INTELLECT-2. So a bunch of open source news. I wanna double-click real quick on the two top stories, 'cause I think they're the most important ones: o3 Pro, and just the general drop in price for intelligence. Here was the beginning, I think, of the unlock for a lot of startups. I run a startup; I don't have billions of dollars of funding. So the more intelligence that all of us startups can get for a cheaper price really does unlock new businesses. So it's fun to see this price war really heating up between the big model labs, and this continues, right? I was excited to see this. June was pretty exciting as always.
I hope you enjoyed it.
Alex Volkov
Alex Volkov 48:56
Awesome.
48:57
Folks, okay. So we are gonna zoom out on this whole quarter, Q2 of 2025; I counted three agentic coding models in it. Let's take a look altogether at some of the stuff. Ryan, thank you so much for covering June and the Meta-Scale thing. I'm inviting a discussion about this whole quarter and the main themes raised here. The Meta-Scale thing was a big one. The numbers being thrown out were just absolutely insane. Apparently there was news afterward that Zuck made soup and hand-delivered it to prospective candidates. I didn't get no soup from Zuck. But apparently he was bringing soup to some folks, which is very funny, as just like a joke. But yeah, we saw publicly some of the biggest paychecks on record for some of these folks, and the world realized that some of these super hot researchers are in absolutely high demand. And this was a response to the very lackluster, if not disastrous, Llama 4 release. And since then, nothing, absolute silence. SAM, an amazing model, SAM 3, SAM 3D, the audio and world stuff from Meta, but those are not from the Meta Superintelligence Labs. No superintelligence. The one thing I will say: since this move, Zuck turned around and walked back the commitment to open source. They did release a bunch of Meta AI stuff, and there's a bunch we can talk about later, but nothing very significant from this very strong investment. Hopefully there's a Google-like narrative here, that it's taking them time to start, but then they're gonna start releasing everywhere. But I haven't seen it yet.
Ryan Carson
Ryan Carson 50:36
Yeah, we all wanna see Meta be a real player here.
50:38
We need to see more competition, yeah, at the high end and at the open source level. So I think we're all rooting for them to come back.
Alex Volkov
Alex Volkov 50:45
What do you think about this quarter overall, Kwindla?
50:48
You wanna comment from your, like, voice-AI angle?
Kwindla H Kramer
Kwindla H Kramer 50:52
It felt like it was the quarter where people got
50:54
really excited about voice agents, outside the bubble of people who were already excited about voice agents. People were talking to ChatGPT Advanced Voice all the time. Google dropped these amazing releases. Amazon was like, on the enterprise side, we've gotta have Nova models supporting it. And the disappointment was the Llama launches, 'cause we really felt like, okay, Llama 4 is gonna be this great open weights option for what we do, for multi-turn conversation. It had great benchmarks, but you just couldn't use it in practice.
Alex Volkov
Alex Volkov 51:21
Yep.
Nisten Tahiraj
Nisten Tahiraj 51:23
For me, this was the start of the 24/7 AI agents
51:28
era, because of the Claude Max plan, before they nerfed it multiple times. There were a whole bunch of people just running those things 24/7, and that was great. So that's all I wanna say.
Alex Volkov
Alex Volkov 51:44
This was a brief period, right?
51:45
Like day afterwards? Yeah, they took it down.
Nisten Tahiraj
Nisten Tahiraj 51:47
Yeah.
51:47
That spoiled a lot of people, because of what you could do if you always had something running in the background.
Wolfram Ravenwolf
Wolfram Ravenwolf 51:54
Like I said when I covered April: Meta
51:56
dropped the ball, and Qwen picked it up and ran with it. So yeah, these models and what Qwen did, they became the new base models for a lot of open source projects, and they're still being used today. So this was a release back then, which is ages ago in AI time, even if it's just half a year ago or something. In the end, this was one of the most important AI releases of all time, I would say. So far, definitely a major milestone and a big moment. And, as has been said, I would like to see Meta pick up the ball again and come back to play with us, and do something in the open source space especially.
Alex Volkov
Alex Volkov 52:35
LDJ comments on this, and then we'll move on to the next queue.
LDJ
LDJ 52:39
Yeah.
52:40
So I believe it was around June when the news broke that Thinking Machines Lab was having their first big funding round, and the existence of them, trying to come onto the scene as this new lab. It was really interesting and exciting, and there were a lot of questions and mystery around the exodus and what these people would create. And now we have some answers: there's the Tinker platform, as well as some open models that they said they're going to release in 2026.
Alex Volkov
Alex Volkov 53:09
The Thinking Machines.
53:11
aka Thinky, is the lab formed by Mira Murati and just an absolute avalanche of top-tier researchers. When Zuck started poaching people from Anthropic, OpenAI, different places, and Scale obviously, it didn't seem like he was able to get any Thinking Machines people. If anything, I saw the opposite: I saw MSL folks who didn't get the top scores and top paychecks go to Thinking Machines and find their home there as well. Back to the quarter: specifically the main themes, I'm gonna run through them 'cause I have them noted. AI's economic impact accelerated. Obviously OpenAI raised at that $300 billion valuation, and since then it rose again. o3's price dropped 80% in four months. The Windsurf $3 billion acquisition was back then as well, to Google, if you guys remember. And then there was the whole thing where Windsurf went to Google, but only the top folks in Windsurf went to Google, and the rest were left adrift, and then Cognition picked them up, and Windsurf as well. It was a cool thing to see that Windsurf is still around even after the Google buyout; that was the team that became Antigravity. And then of course there's the $9 billion valuation, and I think it still kept rising since then. Jules, Codex, GitHub Copilot agents, all of these agents were released one after another this quarter, just absolute bangers after bangers, and Gemini CLI in open source as well. And MCP became the universal standard for all of it; people tried to undercut it before and after, and they weren't really successful. And then Chinese open source was just absolutely everywhere: Qwen, and DeepSeek R1 updated with 0528, and then MiniMax M1 as well. We're gonna go to Q3, and I have LDJ queued up to take us there. Oh, this is a dense one. I'm very happy that we chose this.
LDJ
LDJ 54:55
July.
54:56
Okay, yes. So Kimi K2 released. Now they started getting some more popularity. Some of us recognized them prior, back when they released something very similar to R1, when that was Kimi K1.5. But now they had their own week where they had some popularity. They had really high scores on SWE-bench. It's 32 billion active parameters, about a trillion total. So roughly similar active parameters to DeepSeek, but larger in total, so maybe more world knowledge ends up fitting in there. And I believe this was maybe one of the first actually good open source models at that 1 trillion parameter mark and beyond, with 128K context and a modified MIT license. So it's actually open. State of the art on EQ-Bench, which a lot of people like, 'cause it could actually be creative; it was good at writing, things like that. The Muon optimizer and different types of modern tweaks like that that people were excited about. Then you have Grok 4 and Grok 4 Heavy. I think this was maybe one of the first models where people actually started taking xAI very seriously, and it was arguably a contender for number one for a brief period of time, and definitely at least trading blows with the other players. Insane scores on ARC-AGI v2 at the time, a hundred percent on AIME 2025, that's a qualifying exam for the International Math Olympiad. Then ChatGPT Agent. So that combines a bunch of things, like the browser, the terminal, the deep research features, and insane scores on Humanity's Last Exam. Then we have Chinese open source, we have Baidu, there's so much stuff, it's just insane. Sorry, I'm just thinking high level. Yeah, so Baidu Ernie 4.5, that was multimodal, a bunch of different sizes. It says 10 models there; I forget what all the different configurations were, but yeah, different mixture of experts models, I believe, from
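The total-versus-active distinction LDJ draws here is worth making concrete. A back-of-envelope sketch, where the bytes-per-parameter and FLOPs-per-parameter constants are rough rules of thumb rather than measurements of Kimi K2:

```python
def moe_footprint(total_params_b, active_params_b, bytes_per_param=2):
    """Rule-of-thumb MoE sizing: memory scales with TOTAL parameters
    (every expert must be resident in memory), while per-token compute
    scales with ACTIVE parameters (only the routed experts run)."""
    memory_gb = total_params_b * bytes_per_param   # billions of params x bytes each
    flops_per_token = 2 * active_params_b * 1e9    # ~2 FLOPs per active parameter
    return memory_gb, flops_per_token

# Figures quoted in the episode for Kimi K2: ~1T total, 32B active, bf16 weights.
mem_gb, flops = moe_footprint(total_params_b=1000, active_params_b=32)
# ~2 TB of weights, but per-token compute closer to a 32B dense model.
```

That asymmetry is why a trillion-parameter MoE can be cheap per token on an API yet impossible to run locally: you pay the 32B-class compute bill, but you still need the full ~2 TB of weights resident somewhere.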
Alex Volkov
Alex Volkov 57:08
424B to 0.3B. Yeah, just an insane range.
57:13
Yeah. Yep.
LDJ
LDJ 57:15
And then Tencent Hunyuan, Tencent being one of the other mega corporations
57:20
in China, similar to Alibaba. Then we have Huawei Pangu Pro MoE, which apparently didn't use any NVIDIA chips in its development.
Nisten Tahiraj
Nisten Tahiraj 57:33
The coder?
Alex Volkov
Alex Volkov 57:35
Yes.
57:36
Qwen 3 Coder, the
LDJ
LDJ 57:38
big
Nisten Tahiraj
Nisten Tahiraj 57:38
one.
LDJ
LDJ 57:39
That was Qwen 3 Coder.
57:40
Sorry, I'm opening up another part of the notes here. Okay. So Qwen 3 Coder had insane scores on SWE-bench Verified at the time, nearly 70%, as well as using fewer training tokens than a lot of other models at the time; a lot of them were using like 15 trillion plus, and this only used 7.5 trillion tokens. Then we have Qwen3 235B total and 22 billion active parameters, and that's an instruct model, not reasoning, but also just very good for its size; a lot of people were saying it's competing with DeepSeek R1. Then DeepSWE preview, that's just pure RL on Qwen 3, some further improvements on SWE-bench Verified for its size.
Nisten Tahiraj
Nisten Tahiraj 58:28
Oh, we missed one from the Chinese ones.
58:31
GLM 4.5 also came out in July. And that was, yeah, 4.5 was the first one that was really good for web dev, but also agentic work.
Alex Volkov
Alex Volkov 58:46
Yeah.
Nisten Tahiraj
Nisten Tahiraj 58:46
yeah, on hackathons.
58:47
People actually went to competitions and stuff using, like, Qwen and GLM just running on Cerebras. Oh, that was also big, because you got, like, 500 tokens a second for coding. Yeah, I think that was a big change.
LDJ
LDJ 58:58
Yeah.
58:58
Yeah. So that was impressive. Like Nisten mentioned, GLM has been doing a lot of cool things this year. But, at least what we have in the notes here: SmolLM3. I think this was the beginning of the American labs starting to come back; more recently we have Prime Intellect and Allen AI also doing more impressive things. But yeah, the SmolLM models are cool, and I believe those are fully open as well: the training code, the pre-training datasets, everything. And you have the reasoning model variants of that too, right below that. And then out of nowhere, LG, like you have in parentheses, the fridge company, which also makes TVs and things like that.
Alex Volkov
Alex Volkov 59:43
Oh, we made fun of the fridge.
59:45
The fridge company, yeah, making AI. This was a running joke for so long. Yeah.
LDJ
LDJ 59:50
Yeah.
59:50
But they have a set of researchers doing AI research and developing things open source. And so they came out with something impressive: a really high score on MMLU-Pro for its size, a 32 billion parameter model there. Then, when it comes to big companies and LLMs and APIs, we have what I was mentioning earlier, ChatGPT Agent and Grok 4 Heavy. Also private models that I think still haven't been released publicly, even in API: the OpenAI and Google IMO gold models. And here we have the White House AI Action Plan, 90 policy proposals for US AI dominance. That's a whole other thing that would probably take a while to go through, but basically it's trying to help infrastructure advancements, infrastructure scaling, energy scaling, and so on and so forth. For vision and video, this was a big year and a big quarter for multimodality. Wan 2.2, an open source model, could run on a single 4090 and create pretty decent five-second videos at 720p, and it was one of the first mixture-of-experts-based video models, which also means it usually runs more efficiently than the typical models that would fit on a 4090. Then closed source video models: Runway Aleph and Runway Act-Two. Both of these were really insane; these are some of the first I saw that could actually do things like changing the clothes of a person in a video, changing the background, and inserting moving objects and realistic physics into scenes, and it actually looked pretty decent. I wouldn't be able to tell it was AI if you didn't tell me or if I was looking at it quickly. Then voice and audio. I think this was the first time Mistral got involved in audio at all: they released Voxtral, state-of-the-art speech recognition, beating Whisper v3. Higgs Audio v2 also ended up beating GPT-4o mini and ElevenLabs on some metrics like prosody. 
And Producer AI, they're lesser known, but they released a chattable studio producer.
Alex Volkov
Alex Volkov 1:02:19
I don't know if you guys remember this, but this was
1:02:21
like, one of the crazier things that I had played around with on the show, Producer AI specifically. You could just, like, upload... Before this we had Suno. Ryan, thank you so much for joining us; Ryan Carson just had to drop, thank you for joining us as well. Alright, do we have tools left in here? And then we're gonna continue to August. Yeah.
LDJ
LDJ 1:02:36
Yeah.
1:02:37
So for tools, this was really the year when I think the first serious AI-based browsers started releasing. I think the first example of that this year was Perplexity Comet, and then later on in the year, of course, you had the OpenAI browser. Then Amazon Kiro, a spec-driven AI IDE from AWS; I'm not sure if that really ended up gaining much traction.
Alex Volkov
Alex Volkov 1:02:58
No, we haven't talked about it since we first saw it.
LDJ
LDJ 1:03:00
Yeah.
Alex Volkov
Alex Volkov 1:03:01
But that's also the point of the recap, right?
1:03:02
When things release, we don't know what the impact is gonna be. With Opus 4 we had a feeling; with Claude Code, none of us could immediately say back in February, hey, this is gonna change the industry. We actually don't know. So it's really interesting to recap and see that some of the stuff fizzled, and much of the stuff got upgraded to a newer model.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:03:20
Okay.
1:03:21
Let's cover August 2025. And to put it in perspective, that was three years after I got into AI with the Stable Diffusion moment, just like you, Yam, I think that's the same starting point for you. And we went all in on AI. So within this short span of time, we are now at GPT-5 already. It's actually been 32 months since the GPT-4 release, which is also 32 months of ThursdAI, which, if you did an episode every week, would be about 128 episodes, I think. A lot of AI news coverage here. And in that month GPT-5 launched, so it was exactly 32 months after GPT-4. It had a 400K context window, at $1.25 per million input tokens and $10 per million output tokens, which makes it significantly cheaper than Opus, which was $15 and $75. The special thing about GPT-5 was that it was a unified model with a router. So it decided by itself if it should be thinking or non-thinking, instead of the user selecting a specific model for that. And I guess one of the big benefits is for users who don't know what all these models in the model selector are: they just use the default model, and it decides what it does. But as is always the case with these releases, or most often, there were bugs when it launched, and the router didn't even select the right model at times. So it wasn't thinking when it should have been, and the quality was subpar in that situation. But it was fixed later. It got more features around the model that improved it, like Gmail integration and memory. But the writing quality decreased, which I noticed with every release from then on; they probably used more generated data and less, probably, copyrighted data than they had before. That was noticeable anyway. Another release by OpenAI, and one that had been awaited for so long, was that they actually released an open source LLM once more, GPT-2 being the last one before that, as far as I remember. Yeah.
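Wolfram's price comparison works out like this. A quick sketch: the per-million-token prices are the ones quoted in the episode, while the 100K-input/10K-output request shape is just an illustrative assumption.

```python
def request_cost(input_toks, output_toks, in_price, out_price):
    """Dollar cost of one request, given prices in USD per million tokens."""
    return (input_toks * in_price + output_toks * out_price) / 1_000_000

# Prices quoted in the episode (USD per million tokens):
GPT5_IN, GPT5_OUT = 1.25, 10.0   # GPT-5
OPUS_IN, OPUS_OUT = 15.0, 75.0   # Opus, previous-generation pricing

# A hypothetical request: 100K input tokens, 10K output tokens.
gpt5_cost = request_cost(100_000, 10_000, GPT5_IN, GPT5_OUT)
opus_cost = request_cost(100_000, 10_000, OPUS_IN, OPUS_OUT)
print(gpt5_cost, opus_cost, opus_cost / gpt5_cost)
```

For that request shape, GPT-5 comes out around $0.23 versus $2.25 for Opus, roughly a 10x gap, which is the "significantly cheaper" being pointed at here.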
It was GPT-OSS 120B and 20B, so two different models, two sizes, with configurable reasoning via system prompt, so you could set the reasoning level. It did function calling, web search, Python execution, all you would expect. It gave you the full chain of thought, which you did not get with the online models. And yeah, it was great at coding and math, but very weak at creative writing. So if you need a reasoner, an agentic model, it would be a great thing; it was very smart and very fast. But if you want something for creative writing, it would not work very well, so you would have to look somewhere else. Google also did something: their Genie 3 world model, which we showed on the podcast. This is something you really have to see to appreciate how impressive it is. You could move around in an AI-generated room, and you even had a paint roller, and you could paint the wall and then turn away and turn back, and it was still in the same spot. Before that, it was always changing: when you looked away and looked back, it was different. So it had persistency. That was very interesting.
Alex Volkov
Alex Volkov 1:06:39
Good.
1:06:39
We still haven't gotten a chance to play with Genie 3. Yeah, I'm so salty. Six months after, we still can't play with this model to see if what they showed us was actually real or some, like, cherry-picked few examples. And apparently it costs so much that it's not for regular humans. But I really wanna play with it. Genie 3 was, like, one of the craziest releases of this year, I think.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:07:00
Yeah, it definitely was very impressive
1:07:02
and yeah, we didn't get it; we only got to look at it. What we did get was DeepSeek V3.1, which was a hybrid reasoner. This came up before: Qwen's release was hybrid, then they discussed it and decided they should not do hybrid models anymore. But DeepSeek showed that you can do a hybrid model, which is reasoning or non-reasoning depending on how you prompt it. So yeah, I think that was a big moment, showing that it does work very well. And it matched R1 with fewer thinking tokens, so you see the progress with the intelligence: the models get more intelligent and more efficient. And tool calls inside the thinking process. It did very well on SWE-bench Verified, with 66% versus R1's 44%, and 128K context, MIT license. So a very good local model. Yeah, there were other releases, like Seed-OSS 36B from ByteDance, which was Apache 2.0 licensed, with 512K context, a very big context for local models, and thinking budget control. We had Nemotron Nano 9B v2, which was a mixed Mamba and transformer architecture. Cohere Command A Reasoning came out, and I still hold the Cohere models very dear to my heart, because Command R was one of my favorite models before that. And Command A Reasoning came out at that moment, also with a focus on agents. Like we said, the year of the agent: you can see it when you look at all the model releases, thinking, reasoning, tool calling. Yeah, all the necessary prerequisites to make agents work even better. GLM-4.5V as well. And we had Hunyuan GameCraft, which did game video generation with physics, where you could run around in the world and change it. Matrix Game 2.0 by Skywork, a real-time world model. So world models also started taking off. And what else did we have?
In AI art and diffusion: Nano Banana. That was the rumored model doing 3D-aware scene editing. Qwen Image Edit was also there, so we could edit images even better, but with an open weights model, so now it was possible to run it locally. FLUX.1 Krea dev as well. And AGENTS.md: there was an effort to make a standard, since we had CLAUDE.md for Claude Code and, yeah, a Cursor equivalent there
Alex Volkov
Alex Volkov 1:09:48
There was, like... all of them had their specific ones.
1:09:50
Yeah,
Wolfram Ravenwolf
Wolfram Ravenwolf 1:09:50
So to have one standard, so you could use the same format
1:09:53
for every agent, whichever you are using. Yeah, maybe you want to say something about Catnip, Weights & Biases' containerized multi-agent coding workspace? Do you want to say something about that?
Alex Volkov
Alex Volkov 1:10:04
Catnip was dope.
1:10:04
It was a way to run a bunch of Claude Codes, and it's still being delivered and worked on, and it's super cool. Folks should definitely give it a try; it looks very cool too. And I heard some news over the holiday party, so there are gonna be some updates to Catnip that are very interesting, which we're gonna cover very soon as well. Thank you for covering August. I don't know how to say this, because it only accelerated from here, but: the fucking GPT-5 launch, finally, and after a long while, GPT-OSS for open source from OpenAI. We've been waiting for this for a long time; it had been rumored. We had folks on the show who participated in OpenAI's conversations with open source developers, asking, hey, what should we do? Should we give you a reasoning model? Should we give you this? Should we give you that? So they landed on GPT-OSS 120B and 20B. I would just remind folks that August was also a little bit after we launched the inference service for W&B and started participating in OpenRouter, so we host all these great models on the CoreWeave infrastructure for you. It was just a crazy month for sure, and we just keep accelerating from there. Going to September. Yam, you wanna take September?
Yam Peleg
Yam Peleg 1:11:14
Okay, basically we got GPT-5 Codex.
1:11:22
Okay. We got GPT-5 Codex, which is OpenAI's specific fine-tuned model for code, basically transforming, at least for me, the whole feeling of Codex, the Codex CLI tool, making it much more powerful. And at the time we didn't have Opus 4.5 yet; it was definitely, at least in my opinion, the best option to use, and many people thought that as well. We saw it all over social media, all over X, people talking about how Codex is a really good solution at this point. We also got the new Meta AI glasses, which are to be used with AI in the future, but at the moment are an incredible option as a POV camera, and in general as a headset; you can use them for conversations just like any other headset you have. And the NVIDIA and OpenAI deal. If I recall correctly, the way it worked was: NVIDIA is investing a hundred billion dollars into OpenAI, with the premise of OpenAI spending this hundred billion dollars back on NVIDIA hardware. So basically the money moves in a loop, an infinite
Alex Volkov
Alex Volkov 1:12:47
glitch.
1:12:47
Infinite money glitch. Yeah. Infinite
Yam Peleg
Yam Peleg 1:12:49
money glitch.
1:12:49
The thing is, why it's an infinite money glitch: the stock price moved quite a lot because of this move, so it basically generated quite a lot of cash for the parties. This, by the way, started a whole wave of OpenAI financial deals; we're not gonna cover all of them at the moment, because there were just so many following this one. We also got Suno v5, the state-of-the-art song generation model from Suno. V4 was already amazing; v5 is even more amazing. Yeah.
Alex Volkov
Alex Volkov 1:13:26
So I'll cover super quick the vision and video stuff.
1:13:29
September was also crazy for vision and video. We had a bunch of releases. By then Seedream 4 was, like, state-of-the-art image generation, specifically at 4K resolution; it was quite incredible. We had Lucy from Decart AI, five-second video generation in 6.5 seconds, which is quite insane, so almost real-time video generation. And Wan from Alibaba came with a bunch of releases, Wan 2.5 and Wan 2.2 Animate, plus Kling 2.5 Turbo, and Ray3 from Luma. It was an insane month for video models; everybody was trying to beat Veo 3, and none of them quite got there, but some of these are open source, and open source folks definitely prefer the open source video models. Lip sync and motion transfer were added to the models as well. For me specifically, Suno v5 was the point where I decided I'm no longer going to be able to tell which music was generated with AI and which was made by humans. This is it, we've crossed the Rubicon on this, and as far as I'm concerned, all music from here on could be generated by AI, because I just have no ability to tell at all. Yam, you wanna finish up here, or you want me to take it? Yeah, sure.
Yam Peleg
Yam Peleg 1:14:36
We got DeepSeek V3.1 Terminus,
Alex Volkov
Alex Volkov 1:14:40
this was
Nisten Tahiraj
Nisten Tahiraj 1:14:41
where it just got too much and we were barely keeping up every week.
1:14:44
Yeah, oh yeah. I think I missed one week, and I was just lost coming back to the show again.
Alex Volkov
Alex Volkov 1:14:52
All right, so lemme just land on here.
1:14:54
Reve is a tool that I still use, a four-in-one image creation and editing platform, reve.ai, shoutout to them. Dr. Fei-Fei Li came out with World Labs Marble: you turn images into walkable Gaussian splats in a 3D world. That was super cool as well. And then Google finally added Gemini to Chrome, although it wasn't agentic, it was just, hey, talk to Gemini, and it only understood the context of the one tab. And then ChatGPT added full MCP support back in September, which was really great. I think those are the highlights of September, and then we can cover the quarter summary and the major themes. So, the GPT-5 era. Open source hit trillion-parameter scale, with Kimi K2, Qwen 3 Coder, and the half-a-trillion-parameter DeepSeek, and we launched W&B Inference to host these monsters, because they're not possible to run locally, a hundred percent. World models became a big thing during, which quarter are we, Q3, I'm getting lost here. During Q3, Google Genie 3, Hunyuan GameCraft, and Matrix Game 2.0 all came up and showed that world models are where it's at, I believe. We're not at the predictions for next year, and I don't think we're gonna get there yet because it's been a long show, but I think those models are very impressive. And then video in open source moved into the uncanny valley, although everybody tried to keep up with Veo. The investments in September were just absolutely insane: OpenAI's $300 billion Oracle deal, Anthropic raised at an almost $200 billion valuation, and then Meta MSL continued to poach researchers with $100 to $300 million payout packages a year. We weren't sure if we were talking about the same number of zeros; I remember we were discussing how many zeros a hundred million dollars is. And then ChatGPT Agent, not quite in the browser, but unified, so you could ask the agent from ChatGPT and no longer go to Operator.
And MCP support everywhere. So those are the biggest releases we talked about. That's great, and I think it's time to go to October. All right, cool. Nisten, let's do October. This is Q4, and Q4 was absolutely the craziest quarter of all of them, I believe. So let's go.
Nisten Tahiraj
Nisten Tahiraj 1:17:07
all.
1:17:07
All right. So Q4, we got Sora 2. That whole thing got democratized, and the memes are still continuing. You finally got to see a cat shooting a bazooka, with a wrench, and a raccoon on his forehead. This just became mainstream now.
Alex Volkov
Alex Volkov 1:17:28
Can we hold on one second.
1:17:30
Sorry to interrupt. Can we acknowledge for a second how great of a Sam Altman Nano Banana just added to the infographic? Whatever was happening behind him, don't look, but Sam Altman specifically, this is like a great Sam Altman, and I didn't ask for him. I didn't even provide a reference image; Nano Banana just added Sam Altman, because the big thing about Sora was that Sam Altman's cameo made it explode. You could use Sam Altman in cameos. Go ahead, Nisten. Sorry to interrupt; I think it was worth calling out.
Nisten Tahiraj
Nisten Tahiraj 1:17:58
It is.
1:17:59
This model is, it is not, I still wonder what they do there, but yeah, let's not digress, because we got a whole bunch of stuff from OpenAI. We got AgentKit, they launched ChatGPT apps. I don't know if people use those, but they have 800 million users, so probably quite a lot do. Yeah, Sora got an API, and there was a much cheaper speech-to-speech one, Realtime Mini. And yeah, the numbers just kept exploding: another $1.4 trillion in compute deals. We might be scaling a zero up or down here, but things really heated up. We also got a lot on the medical side. There was the 27B Gemma model that they used in conjunction with another, different architecture to simulate cancer cells and find different things about them. This was the first time they used a very large transformer model for this, but also the first time they validated it in vitro, in an actual lab. Again, the AI scientist stuff has been going on, but now that agents got very good, we're gonna see a lot more of this too. And I think this is probably the start of proactive AI as a concept. I know not a lot of people are using it, not as much in open source, but these constant reminders and timers, and it just prompting you. So I think we could say this is the start of the era where AI starts prompting you, with notifications, instead of you prompting it. So yeah. Okay, now let's cover a whole bunch of the open source releases, because we did have quite a few. Quite a few, actually. There was also Adobe MAX; they integrated Firefly Video too. But yeah, open source LLMs: we got DeepSeek V3.2-Exp. This one scored very well, and I remember testing this one; it also worked with Claude Code as a replacement. We got another fringe model, we got TRM, is that 7 billion parameters?
Alex Volkov
Alex Volkov 1:20:20
7 million. 7M.
Nisten Tahiraj
Nisten Tahiraj 1:20:23
Oh.
1:20:23
This was the really tiny research one. We should have looked at this one a bit more. Yeah, I think we just mentioned it very quickly; it's probably worth another look. Later on we got Ling-1T, 1 trillion parameters. I don't think anyone used this, although all the benchmarks look great.
Alex Volkov
Alex Volkov 1:20:41
Yeah.
Nisten Tahiraj
Nisten Tahiraj 1:20:42
but we did get one that a lot of people use now.
1:20:46
They still do, a lot of businesses still do. We got GLM 4.6, and this was like the real Sonnet at home. It wasn't expensive, and it was Apache licensed, I think. People still suspect that this is what Cursor's, what was it, Composer is based on. We also do know now for sure that OpenCode serves this model by default. So if you install OpenCode, what they call the big pickle, the actual model listed there, is GLM 4.6, but they host it themselves. This is a pretty big one for agentic AI, for vibe coding, for open source stuff, because at the time this came out there were a whole bunch of issues with Claude. I think it was right before Sonnet 4.5 came out. Yep. So for about a month or so, this was competitive to the point where even if you had a Claude Max plan or whatever else, a lot of people were switching to this because it was better. There was a period of time where open source actually edged a bit ahead with this one. And if you went with Cerebras, you would also get much faster speed. So this was a very interesting moment, actually. And, oh yeah, ServiceNow released another 1.5, a 15B model for enterprise open source. I don't know if anyone used these ones, but yeah, we covered this. Oh yeah, and then Sonnet 4.5 dropped, super fast; everyone's still using it. Haiku 4.5: we were very excited in the beginning, now we aren't. But yeah, this was pretty insane; I really liked it. And, oh yeah, the DGX Spark released, but to a lot of disappointment, because of lacking memory bandwidth. A lot of those promises did not get realized for what it was useful for. And yeah, a whole bunch of stuff on vision and video; I'll just quickly go through this. So we got World Labs RTFM. This one I did try.
No, I tried the Alibaba one. But models for 3D printing got very good now, where people are actually using them to 3D print stuff. So point cloud stuff got really good, as in production: people are just making figurines or whatever, some businesses, with it. LTX-2, this one had pretty crazy stuff, but was this one open source? I don't fully remember.
Alex Volkov
Alex Volkov 1:23:23
I think they promised to open source it.
1:23:25
They promised to open source it, and it wasn't at first, but yeah, they did follow up with the open
Nisten Tahiraj
Nisten Tahiraj 1:23:28
Open source.
1:23:29
Yeah, let's finish it. So Claude Skills came out. A lot of people missed this, but now it's picking up a lot, and I think this is a much bigger deal than most people realize, because of what it enables. It can be replicated by other tools as well, but what most people are still missing on this one is that it enables hierarchical context building and hierarchical reasoning. And this is the one thing that a lot of people have not taken advantage of. I think it's going to become a much bigger deal later on, next year. So I'm just putting this out there: this might just be a really big deal and people are just starting to realize it. Also, Amp implemented it, and I think a whole bunch of other tools are implementing it as well.
Alex Volkov
Alex Volkov 1:24:19
Nice.
Nisten Tahiraj
Nisten Tahiraj 1:24:20
So yeah, that's picking up. We
1:24:22
got Cursor 2.0 and Composer. I spoke about that. Just
Alex Volkov
Alex Volkov 1:24:25
super quick.
1:24:25
One thing that I wanna call out: skills is like MCP-level, if not bigger, as far as many people are concerned. And we will try to get you a deep dive into skills early next year, in January, so we'll ramp up on skills; when they came out we played with them a little bit, but I think it's now getting to a point where it's a bigger deal, and many other labs are understanding. ChatGPT is implementing skills internally without even telling us. So we will definitely go and do a deep dive on Claude Skills, which is a bad name, because if they're not used within Claude, they're not Claude skills. But okay.
Nisten Tahiraj
Nisten Tahiraj 1:24:57
yeah.
1:24:57
Yeah. I think it's the start; I'm just calling it now. I just made up the term hierarchical reasoning, but that's what it allows, and we've seen that early on now, so this might become pretty big. Even though at first it was just a prompt, just add a prompt; it's just a bunch of MD files.
Alex Volkov
Alex Volkov 1:25:13
Yeah.
Nisten Tahiraj
Nisten Tahiraj 1:25:14
yeah.
1:25:14
And we also got Cognition's SWE model. So we started seeing at the end of this that a lot of labs started training their own models. The open source ones weren't quite at the frontier, at least until Opus came, but they were shipping them and making them better. And the very interesting thing we've seen is that people just prefer them for a good part of the job. So yeah, extremely busy month, and things just kept accelerating. We just got more and more. Yeah.
Alex Volkov
Alex Volkov 1:25:46
Yep.
1:25:46
Great, man. Nisten, thank you so much. Let's see, who haven't we had for a while to cover November? And I'll cover December. Yam, you wanna take November?
Yam Peleg
Yam Peleg 1:25:56
Yeah, sure.
1:25:57
Okay, November. It's crazy, man, it's so close. November was, just right now?
Alex Volkov
Alex Volkov 1:26:02
Yes.
Yam Peleg
Yam Peleg 1:26:03
Okay.
1:26:04
Gemini 3 Pro: Google reclaims the number one spot on ARC-AGI 2 with the Gemini 3 Pro Deep Think mode, and standard mode roughly doubled the previous state of the art, which is crazy. I'm not gonna talk too much about Gemini 3 Pro because everyone seems to know about it: 1 million context window, and for a short while it was undeniably the best model you could use for code. Google also gave us the Antigravity IDE, which is its own fork that has AI agent capabilities embedded into it, with browser integration into Google Chrome that allows you to take screenshots for debugging and do all sorts of things right out of the box with Google Chrome, which is not easy at all for all the other agents; that's a very strong selling point. We also got Nano Banana Pro, the first image model that is not a toy, allowing for 4K image resolution and pretty much perfect text rendering, for whatever text you want in the image.
Alex Volkov (2)
Alex Volkov (2) 1:27:19
And it's really hard to break the text rendering of Nano
1:27:23
Banana Pro. This is the sequence of events that we covered on the show, in the span of one week in November. Grok 4.1 was released, and for 24 hours it was the best model in the world on LM Arena. The day after, Google released Gemini 3 Pro, and it took over all the charts everywhere with absolutely massive jumps. The day after that, OpenAI released GPT-5.1 Codex Max, which was apparently their super-focused coding reasoning model that beat SWE-bench Verified and a few other benchmarks and became number one in coding, though Gemini 3 Pro was still the best overall. Gemini, like you said, released with Google Antigravity, and they also announced Deep Think, although they didn't launch it yet. And then, a day after GPT-5.1 Codex Max, Google came back with Nano Banana Pro. So all of these big labs were fighting the same week. This is probably the most insanely dense week of AI releases, where day after day we got state-of-the-art releases of the major models. We also saw the release of the Neo humanoid robot that supposedly is gonna launch next year for $20,000. And this was not all, because Meta released their ASR stuff, and also SAM 3 and SAM 3D that week. We thought the week was over, we covered it on ThursdAI, we thought the week was over, and then Anthropic decided to finish off this insane week with Claude Opus 4.5, which we all say is just an absolute beast in coding, amazing at creative writing, and everything. So the single most dense week of AI releases happened in November. This week made me feel that, hey, maybe naming our podcast after a specific day was not the best idea; maybe we need to take our time. But hey, we also did a bunch of live shows that week, like the Gemini 3 Pro launch. Crazy. I wanna
Kwindla H Kramer
Kwindla H Kramer 1:29:10
throw out one more.
1:29:11
Yeah, because it got lost in the noise a little bit, but if you're interested in the future of realtime video avatars: Tavus launched the next generation of their realtime video avatars, which they're calling Pals, and they're amazing, super responsive, with very good video quality. They've got a bunch of custom-built models, including a really good turn detection model they trained in house. If you go to santa.tavus.ai, you can talk to Santa, and it's, ah yeah, that's true. We've been building AI Santas since 2023; this is the first one that's like, okay, this is amazing. And in 2026 we're gonna see realtime video avatars get all the way there, just like we saw realtime voice agents get all the way there.
Alex Volkov
Alex Volkov 1:29:51
No
Nisten Tahiraj
Nisten Tahiraj 1:29:51
So I think that is also a big key moment.
Kwindla H Kramer
Kwindla H Kramer 1:29:54
Yeah.
1:29:54
Really was
Nisten Tahiraj
Nisten Tahiraj 1:29:56
Also, just really quickly: a lot of people started having their
1:29:59
Opus moments. A lot of people who were very skilled programmers that barely used AI just started using agents, and even some of the best devs I know that weren't that much into AI found this was the first model they could reliably offload work to. So a lot of people got the message, which we've been saying the last few months. It really hit home for most.
Alex Volkov
Alex Volkov 1:30:25
Yep.
1:30:25
Yeah. We wanna continue to finish up on open source. Yeah,
Yam Peleg
Yam Peleg 1:30:28
I just wanna highlight that it was like a week
1:30:31
and a half, two weeks, where we got GPT-5.1, Grok 4.1 and 4.1 Fast, Claude Opus 4.5, GPT-5.1 Codex, and Gemini 3 Pro. That's like a week and a half, back to back; every day we got something. Yeah. Everyone is fighting, fighting, fighting on coding models. Absolutely. Yeah. And open source didn't sleep, absolutely not. This month we got Kimi K2 Thinking, MiniMax M2, INTELLECT-3, OLMo 3, and Kimi Linear, which is an efficient linear-attention model. I could go into all the different benchmarks and which one is better than which, but I don't think we have a lot of time for this. Video and vision: we got ML V2, realtime interactive AI video; Sora character cameos, which pretty much turn pets and objects into anime characters; Hailuo 2.3, the cinema-grade video generation model. Hai
Alex Volkov
Alex Volkov 1:31:31
Lu.
1:31:32
That's how I heard it; that's how the MiniMax folks taught me to pronounce it. Hailuo.
Yam Peleg
Yam Peleg 1:31:38
Hailuo.
Alex Volkov
Alex Volkov 1:31:39
Yeah.
1:31:39
It means like 2.3. Yeah. 2.3.
Yam Peleg
Yam Peleg 1:31:42
Okay.
1:31:43
ElevenLabs released Scribe v2 Realtime. Let's give, let
Alex Volkov
Alex Volkov 1:31:48
give Kwindla the voice and audio stuff.
1:31:50
I think there's a bunch of that. Sure, Kwindla's here. My notes said Cartesia came out in October, and you mentioned them in September, but yeah, we mentioned this here as well. The Omnilingual ASR: did you have a chance to look at this and play around with the Meta stuff?
Kwindla H Kramer
Kwindla H Kramer 1:32:04
So I've played with previous versions.
1:32:06
I did not play with the newest version, because there's now so much interesting competition in the ASR space. Right above that bullet point you've got ElevenLabs Scribe, which is ElevenLabs' new ASR model. Cartesia has one. We talked about Deepgram Flux. Stax has a great model. Speechmatics has a great model, and Speechmatics is really pushing on diarization; they released a diarization-oriented feature in their model, I think actually in this month as well. And that opens up multi-speaker voice agent use cases, which is something we're only just starting to scratch the surface of. And then the MiniMax speech model's great; it's very good in non-English languages, and people really like it. A bunch of Pipecat people use it outside of English-language use cases; it's a model from the Chinese company MiniMax. So on both the voice and the transcription side there's all of a sudden a huge amount of competition. And then NVIDIA, with the open source models, coming in with NVIDIA Parakeet.
Alex Volkov
Alex Volkov 1:33:01
Yep, absolutely.
1:33:02
Just bangers after bangers. But yeah, let's keep going, Yam. Finish up on the tools of November 2025.
Yam Peleg
Yam Peleg 1:33:09
Sure.
1:33:09
Yes. Windsurf released Codemaps, basically allowing you to generate flow charts of your entire code base for navigation, giving you a high-level view of the code. That aligns more with the way we write code today, which is AI-assisted: you don't really care about the actual code itself, but about the general abstract shape of the code. First IDE to go in this direction. Terminal-Bench 2.0 released, which is a benchmark for terminal use by models, for agentic coding evaluation. I'll add,
Alex Volkov
Alex Volkov 1:33:48
Harbor, which is the harness that they released together
1:33:51
with Terminal-Bench, is a great thing, because it unifies how agents are run on this benchmark. Other foundational labs started using Harbor for Terminal-Bench, and it unified how they report; everybody was doing it differently before, and now it's the same tools, et cetera. Harbor was a really big deal together with Terminal-Bench, for sure. Alright folks, this is almost it. We're almost there; we are at December, and this is just 18 days. Okay? All of what I'm gonna cover happened in these 18 days. The notes were prepared before this week, so they don't have the bunch of releases from this week that we covered in the first hour of the show. But December of 2025: GPT-5.2. This is the main launch. After claims that Google took OpenAI's throne, OpenAI came back with a vengeance with GPT-5.2, with 90% on ARC-AGI 1 on the Pro extra-high configuration. I ran three prompts through GPT-5.2 Pro extra-high, whatever, and I paid like $5. It's ridiculously expensive, but it's really hot. So GPT-5.2 and 5.2 Pro were released. This is OpenAI's answer to Gemini 3, from December, just a week ago: 54% on ARC-AGI 2, a hundred percent on AIME competition math. The highlight of that release for me was MRCR long context: GPT-5.2 holds up past 128K on the long-context eval, where GPT-5.1 falls off after 128K. Lower hallucination rates. Three tiers, Instant, Thinking, and Pro, with Pro being extremely expensive if you use it via the API. The strongest publicly available general-purpose reasoning model for sure, GPT-5.2, so far, although I do prefer Opus on many tasks over GPT-5.2. There's something in the smell; I can't explain it, it's really hard. DeepSeek finally finalized version 3.2, because previously it was experimental, et cetera. Now there's DeepSeek 3.2 and 3.2 Speciale, gold-medal reasoning from the whale. We keep waiting for R2.
It doesn't seem like they're giving us R2; maybe over the Christmas break. There are rumors about this, with NVIDIA, nevermind, there's rumors, we're not gonna do rumors. But it's very cheap: 28 cents per million tokens on OpenRouter. MCP was donated to the Linux Foundation. It's now no longer Anthropic's MCP; the Linux Foundation is gonna own and maintain it, alongside AGENTS.md, and I think Goose from Block and all these companies are now participating in the spec. Google is supporting this, OpenAI is supporting this. So this is now a new agentic AI foundation, AAIF if you will. It's a great thing for all of us, great for interoperability, and really huge. I'm really hoping that skills are gonna make their way over there as well, so everybody can implement skills too. Mistral 3 in December came back to the open source route and gave us an Apache 2 license: Mistral 3 Large, 675 billion parameters, 41 billion active, and Ministral 3. Magistral was the reasoning model, but Mistral 3 has reasoning enabled, and vision enabled as well, though the vision is not the best there. We saw LLM training in space during December: the first nanoGPT, I think LDJ said it was nanoGPT, was trained on an H100 rack in a satellite with a US flag in the corner. It was great; we talked about it. It also ran Gemma. I think one of the cooler releases, one of the cooler updates to how we see copyright, is that Disney and OpenAI signed a deal, where Disney invests $1 billion into OpenAI, and in return OpenAI can use characters and IP from Disney inside Sora, and Disney will highlight some of these characters and generations on Disney Plus, which is quite crazy. The licensed synthetic media era has started. Mistral Dera was also there. And I think, in addition to what we're getting from this week, December has been a very awesome month in open source.
DeepSeek came back with DeepSeekMath V2. Nous folks, we haven't mentioned many of Nous Research's releases: they came out with Nomos 1, a math-reasoning, Putnam-test-winning harness and model, and Ian OCR. On the big companies and LLMs: we covered GPT-5.2; Amazon released Nova 2, and Nova 2 was significantly better on enterprise; OpenRouter released a State of AI report, we talked about this, and the glass slipper phenomenon, where, I think, Wolfram, we mentioned this, once the model fits, folks are staying with the model. Mistral released a bunch of stuff, and, oh, we had Gemini TTS also released. Kwindla, I dunno if you have a few words about Gemini's updated TTS?
Kwindla H Kramer
Kwindla H Kramer 1:38:22
Yeah, big Gemini releases, including a big update to
1:38:25
the realtime model and the TTS model. Obviously Gemini 3 Flash; all those are really useful for the world I live in. GPT-5.2 is worse on all my benchmarks, like my personal benchmarks with reasoning off. Interesting to see the leapfrogging. Nova 2 Sonic, the speech model from Amazon: actually a really good model, competitive on our benchmarks with GPT Realtime and Gemini Flash, which is a big leap for AWS. And then Kyutai, the open source French lab who I love so much: they announced a commercial spinoff and $70 million in funding to build voice agent stuff. The commercial spinoff is called Gradium, and they launched TTS and STT models on December 2nd, which I've only had a little bit of time to play with. But the work from Kyutai is so good that I'm excited about what they're gonna do in 2026.
Alex Volkov
Alex Volkov 1:39:16
I did not connect Gradium and Kyutai at all, so thank you for that.
1:39:19
This same team,
Kwindla H Kramer
Kwindla H Kramer 1:39:20
same founding team; the authors on those Kyutai papers
1:39:23
are the founders of Gradium.
Alex Volkov
Alex Volkov 1:39:25
Nice.
1:39:26
We also had a huge month in vision and video. There's a new model that topped Veo 3 on some charts: Runway 4.5, number one on the video leaderboards. That video leaderboard doesn't have audio in it, so this is not really an apples-to-apples comparison. Veo 3.1 is still probably the best overall for lip syncing and audio, but for actual visuals Runway 4.5 seems to be a little bit better. We had P-Image and Z-Image in image generation. There was a bunch of stuff, folks; this list is vast. This is all from December: Isaac and ZI releasing a bunch of stuff. Voice and audio we covered, and tools: Cursor added visual editing, which is insane for many people doing front end. You can actually point in the browser and say, hey, Cursor, fix this button, this specific thing. Google released Stitch, MCP apps, we talked about those, folks. Whew, that's a busy December. Quarterly summary: let's do it, all of us together. Big cognitive boost for the whole quarter. Gemini 3 took number one in November, but now Opus is leading the charge as well, and GPT-5.1; just an insane stretch of models competing with each other, all of them beating AIME at a hundred percent at this point, all of them giving you great coding abilities as a baseline. The best release of this quarter is, I dunno, for me it's Opus. I was really happy with Gemini 3, and then I drew this meme when Opus came out where I no longer play with Gemini 3; Opus is my best friend. And then I found out on the show that Opus 4.5 is free on Antigravity, because Google wants those tokens, and so since then it's just Opus for me. Agents became products. Oh, we didn't cover the Atlas browser as a release at any point, but yeah, this was this quarter as well: OpenAI launched Atlas, which is an agentic browser, which was one of the biggest things for me. AgentKit and ChatGPT apps, too.
By the way, there's also another breaking news release we didn't cover, but ChatGPT apps are now open for submission, so you can build an app and submit it, and they have an app store coming up, a full-blown app store so you can search and install stuff, outside of just the partners. Cursor, Windsurf, Amp, all the agentic coding tools racing this quarter, all of them becoming products. Cursor launched 2.0. MCP is now a universal protocol across everything; everybody uses MCP, everybody supports it, and now everybody understands MCP is not this golden thing, because it blows up the context window. So code mode exists, and now skills exist to replace some of the MCP. So it's very interesting how that works. Google's huge comeback throughout this year, I think, is a very big deal, and I think we should mention it. You guys wanna talk super quick about Google's massive win this year? Wolfram, I see you're on mute. Go ahead.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:42:06
Yeah, I'm really a fan of using it now on my phone
1:42:09
because I got a Pixel phone last December, actually. And the Google Assistant we had in the beginning, it was like Alexa or Siri: those AI assistants were just keyword matching in a way, doing little things. But then there was a point, like I mentioned, maybe that was when Logan joined, maybe he was called on board to do it or he initiated it himself, where Google opened up much more and decided to make AI much more available and useful, not just for some special users but for everybody. And now people have access. I like Opus as well; it is my second favorite model, right after Gemini 3 Pro. But I have to use it via OpenRouter, via a subscription, while a lot of people have access to Google models on, yeah, like we said, Google Search, Google Home, and stuff. So it's much more available, and many more people are using it, so the intelligence level has been increased. And yeah, they are on all fronts: the video stuff, the models. They have TTS; they also released some new stuff in that regard recently. So Google has really gone all in on AI, and they have the full stack: the hardware, the inference, all of it. So yeah, you notice it now, and don't bet against Google, I guess.
Alex Volkov
Alex Volkov 1:43:34
they have the TPUs, they have the product surface, they have,
1:43:39
I don't know, my Gmail started writing replies for me, where previously I needed the Perplexity email agent for this. Now they just show up in my Gmail; I just click send on AI replies. At some point the emails are gonna keep sending themselves, and people will have full conversations without your knowledge. Google is absolutely leading and winning this year.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:43:58
It shows how important hardware is, because Google has the
1:44:00
Android phones, and yeah, you can put AI on them and put it everywhere. That is actually something Apple could have done, if they had bothered, because you have the phone and then you have an AI device with you. That is the Jony Ive stuff we talked about: what kind of device might they develop? Nothing has come out of it so far, but Apple has the hardware and Google has the hardware, so both could do a lot with AI, but
Alex Volkov
Alex Volkov 1:44:31
So we're still doing a recap of the quarter itself.
1:44:34
Google is absolutely doing just wonderful stuff. LDJ, you have a comment on DeepMind?
LDJ
LDJ 1:44:39
So on the comment of DeepMind, I think it's interesting. A lot of people
1:44:42
are worried about all the bureaucracy that maybe built up in DeepMind and Google over the years. But a thing people point to sometimes, which may be a reason why they've been able to make such a comeback, is people like Sergey Brin coming back to Google, making a resurgence and going into founder mode. There are a lot of stories of people reporting things like really stupid rules that researchers were being held back by inside the organization, debates over where to spend compute, or even just getting approval for things, that he was able to just knock down and make more efficient. And I think they are going more into this startup era, luckily, where it's becoming more efficient and talent-dense, like Anthropic and OpenAI. And I think they really have a chance again. Yeah.
Alex Volkov
Alex Volkov 1:45:26
Yep.
1:45:27
Absolutely, Google is crushing it. Kwindla, you have a short comment for us on this quarter as well?
Kwindla H Kramer
Kwindla H Kramer 1:45:32
Yeah, this quarter my personal benchmarks got
1:45:35
saturated, which is super exciting, and that was never true before. The benchmarks I care about: I don't want reasoning turned on, I want really fast time to first token, I care about 30-turn and longer conversations, and I need tool-calling models. Even in the first half of this year, models were bad at all that stuff, but now we have three models, Claude Sonnet, Gemini 3 Flash, and GPT-5.1, that saturate my personal benchmarks, which is awesome. Arguably the best model for what I care about was still GPT-4o halfway through the year, maybe Gemini 2.5 Flash if you could prompt it to do the tool calling. But now all I want is faster time to first token from these frontier lab models. So
Alex Volkov
Alex Volkov 1:46:14
Out of GPT-5.1, Gemini 3 Flash, and Claude Sonnet
1:46:17
4.5, which has the fastest time to first token that you really see?
Kwindla H Kramer
Kwindla H Kramer 1:46:22
So I think it's gonna be Gemini 3 Flash, but it's so new
1:46:25
they're still deploying TPUs for it.
Alex Volkov
Alex Volkov 1:46:28
Yeah, absolutely.
1:46:29
Alrighty, folks. Hey, this has been the ThursdAI yearly recap. This is everything that happened this year. It was an absolute bonkers year of releases; just that one week in fucking November almost broke me, as a person whose full-time job is to cover and follow AI releases and make it easy for you as well. I will highlight a few things as we wrap up the year. This is our last regularly scheduled program of the year; the yearly recap will get released on the podcast probably after a week. It has been an incredible year for me on ThursdAI as well, and I want to say a huge thanks to all of the co-hosts and friends of the pod, experts like Kwindla who come and tell us things from their perspective, and every one of you who tunes in and listens and participates and sends us links, and tells me, when I meet them face to face, that hey, ThursdAI is the way they keep up to date. They listen on a walk, while washing dishes, whatever; they don't have time to keep up with the news, and it's getting harder and harder to keep up with the news, and this is why we exist. I'm very happy to have also stepped up the production value for ThursdAI this year. The episodes are higher quality, the video is higher quality, many folks here got mics, which was great, and generally I think this has been a great ThursdAI year as well. So with this, I just wanna wish everyone a beautiful, incredible, very happy holiday season. I am pretty sure that the Chinese labs will not quiet down, because they don't celebrate Christmas, so expect a bunch of other releases by the end of this year; maybe DeepSeek is gonna try to break the stock market again. Everything that we've talked about is publicly available. We open source all of it: all the episodes, all of the infographics, all the prompts that I made the infographics with. So my prompt maker for the infographics is also open source on the GitHub repo.
I'm gonna add this in the show notes. If you wanna achieve state-of-the-art infographic performance with Opus and Nano Banana Pro, feel free to do so and let me know. Everything that we talked about here will be available on ThursdAI, so please don't forget to subscribe. As a Christmas present, please send one friend who is not up to date on AI a link to our show. And if you wanna give us a little Christmas present, a holiday present: five stars wherever you listen to podcasts would be amazing. This is the best way to support the podcast, besides telling your friends about it. This is it, folks. This is the end of the year for us.
Nisten Tahiraj
Nisten Tahiraj 1:48:45
And we finished the year off still.
1:48:48
4.9 star rated on Apple Podcasts.
Alex Volkov
Alex Volkov 1:48:51
let's go.
1:48:52
Alright, folks, thank you so much, all of you. Happy holidays. enjoy the rest of the year and we'll see you guys in 2026. Let's go. Happy holidays everyone.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:49:03
Happy holidays.
1:49:03
Have a good start in the new year, everyone.
Alex Volkov
Alex Volkov 1:49:05
this was amazing.
1:49:05
Thank you all for joining. Cheers. Bye-bye everyone. Bye-bye.