Episode Summary
The ThursdAI crew reunites for their legendary annual tradition: a quarter-by-quarter, month-by-month review of every major AI release of 2025. Alex is joined by all five co-hosts plus Kwindla Hultman Kramer (Daily.co) to relive a year that started with DeepSeek crashing NVIDIA's stock and ended with Gemini 3 reclaiming the #1 benchmark spot. Across 110 minutes, they cover 50+ releases: the rise of reasoning models, vibe coding going mainstream, Chinese labs dominating open source, Claude Code launching the CLI agent era, and the jaw-dropping moment someone trained an LLM in space. This is the definitive record of AI's most acceleration-packed year ever.
In This Episode
- Intro & Team's Biggest Releases of the Year
- Q1: January - DeepSeek Shakes the World
- Q1: February - Reasoning Mania & The Birth of Vibe Coding
- Q1: March - MCP Becomes the Universal Protocol
- Q2: April-June - VEO3, Claude Opus 4, and the Thinking Machines
- Q3: July - Chinese Labs Dominate, AI Browsers Emerge
- Q3: August - Flux 3, Agent Standards, and Three Years of AI
- Q3: September - GPT-5 Codex, Vision Breakthroughs, DeepSeek V3.1
- Q4: October - Sora 2, GLM 4.6, Claude Skills
- Q4: November - Gemini 3 Reclaims #1, GPT 5.1, Grok 4.1, and Banger Week
- Q4: December - Google's Incredible Month & Year Wrap-up
Hosts & Guests
By The Numbers
Breaking During The Show
Intro & Team's Biggest Releases of the Year
Alex welcomes the full ThursdAI crew for the 52nd and final episode of 2025. Before diving into the quarter-by-quarter recap, each co-host and guest shares their personal pick for the single most impactful AI release of the year. Claude Code, Opus 4.5, and the paradigm shift in image generation all get nominations.
- Yam: Claude Code / CLI agents as the year's defining release
- Wolfram: paradigm shift in image generation
- Ryan: Opus 4.5, after 700+ days of daily coding with LLMs
- Kwindla: Claude Code proving 'the harness matters as much as the model'
Q1: January - DeepSeek Shakes the World
The earthquake that shattered assumptions about who leads AI. DeepSeek R1 dropped January 23rd, crashed NVIDIA stock 17% (a $560B loss, the largest single-day market-value loss for any company in history), matched OpenAI's o1 at roughly 50x lower pricing, and made even grandmothers aware of Chinese AI. OpenAI Operator launched browser-based agents, Project Stargate committed $500B to AI infrastructure, and Kokoro TTS went viral.
- DeepSeek R1: crashed NVIDIA 17%, matched o1, allegedly cost $5.5M to train
- OpenAI Operator: first agentic ChatGPT with browser control
- Project Stargate: $500B for infrastructure, the Manhattan Project for AI
- NVIDIA Project Digits: $3,000 desktop running 200B-parameter models
- Kokoro TTS: 82M-parameter model hit #1 on TTS Arena; Apache 2 licensed, runs in the browser
- MiniMax-01: 4M context window from Hailuo
Q1: February - Reasoning Mania & The Birth of Vibe Coding
The month that redefined how we work with AI. OpenAI Deep Research scored 26.6% on Humanity's Last Exam (vs 10% for o1/R1). Andrej Karpathy coined 'vibe coding' in early February, and it immediately reshaped the entire developer ecosystem. Claude Code launched as an internal Anthropic tool and began the CLI agent revolution. OpenAI's naming chaos continued with two separate model lines.
- OpenAI Deep Research: 26.6% HLE score, an agentic research breakthrough
- Andrej Karpathy coins 'vibe coding': the term is less than a year old and already everywhere
- Claude Code launches: built internally at Anthropic, it becomes a defining release
- OpenAI naming chaos: two parallel model lines cause mass confusion
Q1: March - MCP Becomes the Universal Protocol
March saw MCP (Model Context Protocol) win the integration wars and become the de facto standard for connecting AI agents to tools. OpenAI released two voice models derived from GPT Realtime, Qwen launched speech-to-speech capabilities, and Gemini 2.5 briefly claimed the top benchmark spot. Cursor sales exploded on the back of Claude 3.7 and vibe coding mania.
- MCP wins as universal standard for agent-tool integration
- OpenAI's new voice models: GPT Realtime speech-to-speech derivatives
- Qwen launches speech-to-speech model with internal emotion handling
- Gemini 2.5 takes #1 benchmark briefly
- Cursor sales explode: Claude 3.7 + vibe coding = perfect storm
- OpenAI no longer the undisputed leader: an inflection point
Q2: April-June - VEO3, Claude Opus 4, and the Thinking Machines
Q2 was the quarter voice agents went mainstream beyond the AI bubble. April brought ChatGPT memory and agent-to-agent protocols. May delivered GPT-4o native image generation and Ghibli-mania. June saw Claude Opus 4 drop (Ryan's pick as best model ever), VEO3 with native audio stun everyone, and Thinking Machines Lab (Mira Murati + an avalanche of top researchers) launch. Daily.co's smart turn detection shipped during this period.
- ChatGPT memory and GPT-4o native image generation
- VEO3: native audio generation that crossed the uncanny valley for video
- Claude Opus 4: Ryan's pick as best model ever after 700 days of daily AI coding
- Thinking Machines Lab: Mira Murati + top OpenAI researchers form new lab
- Kwindla (Daily.co) ships smart turn detection for voice agents
- Claude Max 24/7 agent: briefly available, then nerfed; it spoiled everyone
- Google I/O 2025: the quarter voice agents escaped the AI bubble
Q3: July - Chinese Labs Dominate, AI Browsers Emerge
July was peak Chinese lab dominance: Kimi K2 got serious recognition, Qwen 3 Coder posted insane scores, GLM 4.5 ran on Cerebras fast enough to win hackathons, and Tencent Hunyuan entered the scene. The first serious AI-native browsers started shipping. NVIDIA's 'fridge company making AI' joke from years prior delivered actual frontier research.
- Kimi K2: Chinese model that earned mainstream recognition
- Qwen 3 Coder: insane benchmark scores for the coding crown
- GLM 4.5: ran on Cerebras fast enough to win competitive hackathons
- Tencent Hunyuan and Huawei enter the open weights race
- First serious AI-native browsers start shipping
- NVIDIA's AI research division delivers frontier-level results
Q3: August - Flux 3, Agent Standards, and Three Years of AI
August marked three years since Stable Diffusion went public. Wolfram reflects on the distance traveled. Flux 3 dropped and immediately became the image generation gold standard. Agent-to-agent communication standards started consolidating. KAIP brought multi-agent Claude Code orchestration. The agent infrastructure layer was quietly solidifying.
- Flux 3: new gold standard for image generation
- Three years post-Stable Diffusion: Wolfram reflects on the distance traveled
- KAIP: orchestrate multiple Claude Code agents in parallel
- Agent-to-agent standards: A2A and related protocols coalescing
- August: surprisingly dense with infrastructure-layer releases
Q3: September - GPT-5 Codex, Vision Breakthroughs, DeepSeek V3.1
September was 'infinite money glitch' month. GPT-5 Codex dropped as OpenAI's specialized coding model and caused the stock to move significantly. DeepSeek V3.1 Terminus resurfaced just as the team was barely keeping up weekly. Vision and video saw major breakthroughs. RevA emerged as a four-in-one image creation and editing platform that Alex still uses daily.
- GPT-5 Codex: OpenAI's specialized coding model, an 'infinite money glitch' on stock price
- DeepSeek V3.1 Terminus: dropped just as everyone was overwhelmed
- RevA: 4-in-1 image creation/editing platform
- September vision and video: major model releases
- The pace became almost too much: Nisten missed a week and fell behind
Q4: October - Sora 2, GLM 4.6, Claude Skills
October opened Q4 with Sora 2 democratizing video generation and spawning an endless wave of memes. GLM 4.6 quietly became what many businesses still use today. Claude Skills launched, largely missed at release but now picking up fast, with Nisten calling it 'MCP-level if not bigger.' Cursor 2 and Composer shipped. Cognition's SWE-bench agents began showing that labs were training models specifically on agentic coding benchmarks.
- Sora 2: video generation democratized, memes still circulating
- GLM 4.6: quietly became a go-to for many businesses
- Claude Skills: missed at launch, now gaining steam; Nisten says 'MCP-level or bigger'
- Cursor 2 + Composer: IDE agents level up
- Cognition SWE-bench: labs begin training specifically for agentic coding
Q4: November - Gemini 3 Reclaims #1, GPT 5.1, Grok 4.1, and Banger Week
November delivered one of the most stacked weeks of the year: GPT 5.1, Grok 4.1, and Claude Opus 4.5 all dropped within a week and a half. Gemini 3 Pro Deep Think mode reclaimed #1 on ARC-AGI 2. ElevenLabs Scribe V2 Real-Time shipped. MiniMax Hailuo (LLM 2.3) dropped. Windsurf released Code Maps. Daily.co's personal benchmarks hit saturation. The acceleration was undeniable.
- Gemini 3 Pro: reclaims #1 on ARC-AGI 2 with Deep Think mode
- GPT 5.1 + Grok 4.1 + Claude Opus 4.5: banger week, with top releases packed into a week and a half
- ElevenLabs Scribe V2 Real-Time: real-time speech-to-text milestone
- MiniMax Hailuo LLM 2.3: another strong Chinese open release
- Windsurf Code Maps: generate flowcharts of entire codebases
- Kwindla: 'my personal benchmarks got saturated, and that was never true before'
Q4: December - Google's Incredible Month & Year Wrap-up
December was Google's month: Gemini 3 Flash, big realtime model updates, a Gemini TTS model, and a cascade of releases that cemented Google's comeback narrative. Kwindla connects Gradium and KyutAI (same founding team). The crew reflects on what 2025 meant, from LLMs in space to AGI benchmarks saturating, and sends everyone off for the holidays with a 4.9-star Apple Podcasts rating intact.
- Google December: Gemini 3 Flash, realtime model updates, TTS model
- KyutAI and Gradium connection: same founding team (revealed by Kwindla)
- LLMs trained in space: the year's wildest headline
- ThursdAI ends 2025 with 4.9-star Apple Podcasts rating
- 52 episodes, 12 months, relentless acceleration documented
The Big Picture - 2025: The Year AI Agents Became Real
Looking back at 51 episodes and 12 months of relentless AI progress, several mega-themes emerged:
- Reasoning Models Changed Everything: From DeepSeek R1 in January to GPT-5.2 in December, reasoning became the defining capability. Models now think for hours, call tools mid-thought, and post perfect scores on math olympiads.
- 2025 Was Actually the Year of Agents: We said it in January, and it came true. Claude Code launched the CLI revolution, MCP became the universal protocol, and by December we had ChatGPT Apps, Atlas browser, and AgentKit.
- Chinese Labs Dominated Open Source: Despite chip restrictions, Chinese labs (DeepSeek, Qwen, MiniMax, Kimi, ByteDance) released the best open weights models all year. Qwen 3, Kimi K2, and DeepSeek V3.2 were defining releases.
- We Crossed the Uncanny Valley: With VEO3's native audio, Suno V5's indistinguishable music, and Sora 2's social platform, 2025 was the year AI-generated media became indistinguishable from human-created content.
- The Investment Scale Became Absurd: $500B Stargate, $1.4T compute obligations, $183B valuations, $100-300M researcher packages, LLMs training in space. The numbers stopped making sense.
- Google Made a Comeback: After years of "catching up," Google delivered Gemini 3, Antigravity, Nano Banana Pro, VEO3, and took the #1 spot (briefly). Don't bet against Google.