ThursdAI · February 12, 2026

📆 Open source just pulled up to Opus 4.6 — at 1/20th the price

Plus: Gemini 3 Deep Think hits 84% on ARC-AGI, OpenAI's new 1000 t/s coding model, and the video model that shattered reality.

By Alex Volkov

88 min


Episode Summary

Two open-source labs sent representatives to the show in the same episode — Lou from Z.AI debuted GLM-5 (744B params, open-weights coding crown) and Olive Song from MiniMax revealed M-2.5 (80.2% SWE-Bench Verified with only 10B active params at 1/20th the cost of Opus). Then Google dropped Gemini 3 Deep Think with an 84% ARC-AGI 2 score — the biggest single-week jump ever on that benchmark — and OpenAI answered with GPT 5.3 Codex Spark on Cerebras for real-time coding speeds. Oh, and ByteDance's Seedance 2 shattered video generation reality with 15-second multi-shot clips that feel like stepping into the future.

In This Episode

📰 Intro & Highlights of the Week
📰 TLDR - This Week's AI News Rundown
🔓 Interview: Lou from ZAI on GLM-5
🔓 Panel Discussion: GLM-5 Reactions
🔥 BREAKING: Minimax M-2.5 Drops Live
🔓 Interview: Olive Song from Minimax on M-2.5
🔓 Panel Discussion: Minimax & Open Source Momentum
💰 This Week's Buzz - W&B Inference
🏢 XAI Restructuring & SpaceX Acquisition
📰 Matt Shumer's Viral AI Article & The Acceleration
🔥 BREAKING: Gemini 3 Deep Think - 84% on ARC-AGI-2
🔥 BREAKING: GPT 5.3 Codex Spark on Cerebras
🎥 Seedance 2 - ByteDance's Mind-Bending Video Model
🤖 Agent Psychosis & The Sleep Problem
🎥 Bytedance SeeDance 2.0 - shattering reality
📰 Wrap-Up & Goodbye

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Lou

Z.AI — Head of DevRel

@louszbd

Olive Song

MiniMax AI — Senior Researcher

@olive_jy_song

Ryan Carson

AI educator & founder

@ryancarson

Nisten Tahiraj

AI operator & builder

@nisten

Yam Peleg

AI builder & founder

@yampeleg

LDJ

Nous Research

@ldjconfirmed

Wolfram Ravenwolf

Weekly co-host, AI model evaluator

@WolframRvnwlf

By The Numbers

ARC-AGI 2

84%

Gemini 3 Deep Think — biggest single jump on this benchmark ever, up from Opus 4.6's 68%

SWE-Bench Verified

80.2%

MiniMax M-2.5 with only 10B active parameters, approaching Opus 4.6 levels

GLM-5 Parameters

744B

Z.AI's open-weights model with 40B active params, trained on Huawei chips

Cost per task

15¢

MiniMax M-2.5 vs Opus 4.6 at ~$2.50, with a 57% win rate at a fraction of the price
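As a rough check on the "1/20th the price" framing, the two per-task figures above can be compared directly. This is a back-of-the-envelope sketch that assumes the 15¢ and ~$2.50 numbers were measured on comparable tasks; the variable names are illustrative, not from the show.

```python
# Per-task costs quoted above (USD); assumed to be measured on comparable tasks.
minimax_cost = 0.15
opus_cost = 2.50

ratio = opus_cost / minimax_cost      # how many times more Opus 4.6 costs per task
fraction = minimax_cost / opus_cost   # M-2.5 cost as a share of the Opus 4.6 cost

print(f"Opus 4.6 costs about {ratio:.1f}x as much per task")
print(f"M-2.5 costs about {fraction:.0%} of Opus 4.6 per task")
```

On these two numbers alone the gap works out to roughly 1/17th, so "1/20th" is best read as a round figure, or as one based on different pricing (for example per-token rates) than the per-task costs shown here.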

Training tokens

28.5T

GLM-5 trained on 28.5 trillion tokens, scaled up massively from previous version

Codex Spark speed

1000+ tps

GPT 5.3 Codex Spark on Cerebras — real-time coding inference

🔥 Breaking During The Show

MiniMax M-2.5 — 80.2% SWE-Bench Verified

Dropped 30 minutes before the show. 10B active parameters competing with Opus 4.6 at a fraction of the cost. Olive Song joined live to discuss.

Gemini 3 Deep Think — 84% ARC-AGI 2

Dropped during the show. Biggest single-week jump in ARC-AGI history, from Opus 4.6's 68% to 84%. Also 48.4% on Humanity's Last Exam without tools.

GPT 5.3 Codex Spark on Cerebras

Ryan spotted it on X during the show. OpenAI's first model on Cerebras hardware, designed for real-time coding at extreme speeds.

📰 Intro & Highlights of the Week

Alex opens with the biggest open-source week in memory — GLM-5 and MiniMax 2.5 both dropped with representatives joining live. The panel shares their highlights: Wolfram picks GLM-5, Alex picks Seedance 2, and Yam is funding Anthropic's snack budget.

Both Z.AI and MiniMax sent reps to the show for live interviews
Open source competing directly with Opus 4.6 on benchmarks
Seedance 2 from ByteDance breaking everyone's brains

Wolfram Ravenwolf

"What a week. So much cool stuff from China."

📰 TLDR - This Week's AI News Rundown

Alex runs through all the week's releases: GLM-5 and MiniMax 2.5 competing with Opus, XAI restructuring after SpaceX acquisition, Anthropic's sabotage risk report, OpenAI's deep research upgrade, and ByteDance's Seedance 2 shattering video generation.

GLM-5: 744B params, open-weights coding crown
MiniMax 2.5: 80.2% SWE-Bench with 10B active
Seedance 2: 15-second multi-shot video with sound

🔓 Interview: Lou from ZAI on GLM-5

Lou from Z.AI joins at 1 AM Shanghai time to discuss GLM-5's architecture, the new SLIM reinforcement learning framework, and adoption of DeepSeek's sparse attention mechanism. She summarizes the model in four words: bigger, faster, better, and cheaper.

SLIM: new asynchronous RL framework for post-training
DeepSeek sparse attention for reduced deployment cost
GLM-5 trained on Huawei chips, not NVIDIA

Lou

"If I had to sum it up in four words, I would say bigger, faster, better, and cheaper."

🔓 Panel Discussion: GLM-5 Reactions

The panel reacts to GLM-5 — Nisten notes it uses DeepSeek architecture, Ryan highlights the dream of running open-source models locally for Open Claw, and Yam emphasizes it's a model that can run general computer use at close to free.

Trained on Huawei chips, restricted GPU serving capacity
50% on Humanity's Last Exam, beating Opus 4.5 and Gemini 3 Pro
34% hallucination rate, the lowest on the AAA benchmark

Yam Peleg

"It's a model that can run general computer use at close to being free. Like that. That's crazy."

Ryan Carson

"I love seeing the competition because what we want is a really good open source model that can rival an Opus or a Codex for people to run Open Claw locally."

🔥 BREAKING: Minimax M-2.5 Drops Live

Breaking news: MiniMax released M-2.5 just 30 minutes before airtime. Alex brings on Olive Song from MiniMax to announce the model live.

80.2% SWE-Bench Verified
10B active parameters, 200B total
Dropped 30 minutes before the show
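Both open-weights models this week quote a total parameter count alongside a much smaller active count. A minimal sketch of what that ratio looks like, using only the figures in these notes (744B total / 40B active for GLM-5, 200B total / 10B active for M-2.5); the helper name is hypothetical, and the result says nothing about memory needs, since all weights still have to be loaded.

```python
# (total_params_B, active_params_B) as quoted in these notes, in billions.
models = {
    "GLM-5": (744, 40),
    "MiniMax M-2.5": (200, 10),
}

def active_share(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters used for each generated token."""
    return active_b / total_b

for name, (total_b, active_b) in models.items():
    print(f"{name}: {active_share(total_b, active_b):.1%} of weights active per token")
```

Both land near 5%, which is a large part of why per-token compute, and therefore serving cost, can stay low relative to a dense model of the same total size.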

🔓 Interview: Olive Song from Minimax on M-2.5

Olive Song discusses their Forge RL framework, how they trained efficiency into the model (fewer tool calls, fewer thinking tokens), and reveals the model is actually still training: they cut a checkpoint to release because developers were asking.

Forge: decoupled RL framework training diverse tasks without interference
Model optimized for end-to-end task time, not just benchmark scores
Still training — cut a checkpoint for early release

Olive Song

"A funny story about this release is that as we are talking right now, the model is actually still training and then the accuracy is still scaling."

🔓 Panel Discussion: Minimax & Open Source Momentum

The panel discusses the jaw-dropping pace of open-source progress. Nisten notes benchmarking concerns but acknowledges the model's real utility for multi-agent orchestration. LDJ highlights the cost-per-intelligence advantage.

MiniMax 2.5 beats Gemini 3 Pro on SWE-Bench
Can run on a Mac Studio M3 Ultra at 80+ tps
Open source now one week behind frontier on benchmarks

Nisten Tahiraj

"You can buy something for $8,000 like an M3 Ultra, and I think it does like very good speeds, like over 80 tokens per second."

💰 This Week's Buzz - W&B Inference

Alex announces day-zero GLM-5 support on W&B Inference service powered by CoreWeave, with MiniMax 2.5 and Kimi K2.5 coming soon. Free credits available for testing.

GLM-5 live on W&B Inference day zero
Free credits for testing via @wandb on X

🏢 XAI Restructuring & SpaceX Acquisition

Multiple XAI co-founders departed after SpaceX acquired XAI. The company restructured into four buckets, including LLM/Voice, Coding, and Macro Hard (data centers). Grok 4.2 is nowhere to be found, and they're talking about putting GPUs in space.

300,000 GPU Memphis training cluster — largest in the world
Jimmy Ba (co-author of Adam) left, said recursive self-improvement coming this year
Restructured into 4 divisions including Macro Hard

Alex Volkov

"I use Grok for research, specifically X research. Grok itself has API access to X better, faster than you."

📰 Matt Shumer's Viral AI Article & The Acceleration

The panel discusses Matt Shumer's viral article (74M views) about the speed of AI progress and the gap between AI-native people and everyone else. Ryan shares a real-world case study of end-to-end AI engineering.

74 million views on Matt Shumer's article
Feb 5 models made everything before feel like a different era
Harness Engineering case study on Codex in production

Ryan Carson

"People are beginning to actually, from end to end having zero humans involved in the writing or reading or reviewing or shipping of code. It's starting to happen."

🔥 BREAKING: Gemini 3 Deep Think - 84% on ARC-AGI-2

Breaking news mid-show: Google drops Gemini 3 Deep Think with 84% on ARC-AGI 2 (up from Opus 4.6's 68% just one week prior) and 48.4% on Humanity's Last Exam without tools. The biggest single jump in ARC-AGI history.

84% ARC-AGI 2 — up from 68% (Opus 4.6) one week ago
48.4% on Humanity's Last Exam without tools
Biggest single-week jump in benchmark history

Yam Peleg

"Google drops Gemini three deep thinking, significant upgrade to deep thinking. Basically state-of-the-art on ARC AGI 2, to the best of my knowledge."

Alex Volkov

"The jump in ARC-AGI. What the fuck just happened?"

🔥 BREAKING: GPT 5.3 Codex Spark on Cerebras

More breaking news: OpenAI releases GPT 5.3 Codex Spark, a smaller version of Codex designed for real-time coding, in partnership with Cerebras for insane inference speeds. Available to ChatGPT Pro users.

First OpenAI model on Cerebras hardware
Designed for real-time coding at 1000+ tokens/sec
Available in Codex app, CLI, and IDE extension
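To get a feel for what "real-time" means here, this small sketch converts tokens per second into wait time for a single response. It uses the 1000 tok/s figure from the episode headline and, purely for scale, the ~80 tok/s Nisten quoted for a local M3 Ultra (a different model entirely). The 2,000-token response size and the function name are assumptions for illustration.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Decode time only; ignores network latency and prompt processing."""
    return num_tokens / tokens_per_second

response_tokens = 2_000  # hypothetical size of a medium code edit

for label, tps in [("Codex Spark on Cerebras (~1000 tok/s)", 1000),
                   ("Local M3 Ultra (~80 tok/s)", 80)]:
    print(f"{label}: {seconds_to_generate(response_tokens, tps):.1f} s")
```

Under these assumptions the same edit takes about 2 seconds at 1000 tok/s versus 25 seconds at 80 tok/s, which is the difference between an interactive loop and a short wait.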

Ryan Carson

"GPT 5.3 Codex Spark, what? Last two minutes ago. So I'll read a little bit from that."

🎥 Seedance 2 - ByteDance's Mind-Bending Video Model

Alex demos ByteDance's Seedance 2, a video generation model that accepts 9 images + 3 videos + 3 audio clips as reference. The multi-shot consistency, native audio, and physics are at a level that makes the original Sora feel like a different era.

15-second high-quality multi-shot with native stereo audio
9 images + 3 videos + 3 audio clips as input references
45-second internal test mode available

Alex Volkov

"These videos are generated with Seedance 2. It feels like the jump from when we were before Sora and then we saw Sora for the first time."

🤖 Agent Psychosis & The Sleep Problem

The panel gets real about the mental health impact of running AI agents 24/7. Multiple panelists report sleep disruption, FOMO about underutilizing their agents, and the paradox that tools meant to reduce work are creating more anxiety.

Ryan wakes up at 2 AM regularly worried about agents
Wolfram worries about shutting down agents for security
The primitives for managing agent teams don't exist yet

Ryan Carson

"No one's running agents 24/7 and actually doing productive work. They may be running small teams of agents to build real apps, but we're just not there yet."

Wolfram Ravenwolf

"Every moment an agent is not running, you think you are losing time. You know you are wasting time because it could be doing something for you."

🎥 Bytedance SeeDance 2.0 - shattering reality

Continued deeper dive into Seedance 2 demos — showing multi-shot character consistency, anime style generation, and native audio with environmental sounds. Available on BytePlus platform.

Character consistency across multi-shot sequences
Anime and realistic style modes
Available on BytePlus platform

📰 Wrap-Up & Goodbye

Alex recaps an insane show: two open-source lab interviews, two breaking news drops (Gemini 3 Deep Think and GPT 5.3 Codex Spark), and Seedance 2 demos. Over 2000 listeners tuned in.

2000+ live listeners
4 breaking events in one episode
Coming up on 3 years of ThursdAI

TL;DR of all topics covered:

Hosts and Guests
- Alex Volkov - AI Evangelist, Weights & Biases (@altryne)
- Co Hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed @ryancarson
- Lou from Z.AI (@louszbd)
- Olive Song - Lead RL at Minimax @olive_jy_song
Open Source LLMs
- Z.ai launches GLM-5: 744B parameter MoE model achieving #1 open-source ranking for agentic coding with 77.8% SWE-bench Verified (X, HF, Wandb)
- MiniMax M2.5 drops official benchmarks showing SOTA coding performance at 20x cheaper than competitors (X)
Big CO LLMs + APIs
- XAI cofounders quit/let go after X restructuring (X, TechCrunch)
- Anthropic releases Claude Opus 4.6 sabotage risk report, preemptively meeting ASL-4 safety standards for autonomous AI R&D (X, Blog)
- OpenAI upgrades Deep Research to GPT-5.2 with app integrations, site-specific searches, and real-time collaboration (X, Blog)
- Gemini 3 Deep Think SOTA on Arc AGI 2, HLE (X)
- OpenAI releases GPT 5.3 Codex Spark, backed by Cerebras with over 1000 tok/sec (X)
This Week's Buzz
- W&B Inference launch of Kimi K2.5 and GLM 5 🔥 (X, Inference)
- Get $50 of credits to our inference service HERE (X)
Vision & Video
- ByteDance Seedance 2.0 launches with unified multimodal audio-video generation supporting 9 images, 3 videos, 3 audio clips simultaneously (X, Blog, Announcement)
AI Art & Diffusion & 3D
- Alibaba launches Qwen-Image-2.0: A 7B parameter image generation model with native 2K resolution and superior text rendering (X, Announcement)
Tools & Links
- Entire raises $60M seed to build open-source developer platform for AI agent workflows with first OSS release ‘Checkpoints’ (X, GitHub, Blog)
- Chrome 146 introduces WebMCP: A native browser API enabling AI agents to directly interact with web services (X)
- RyanCarson AntFarm - Agent Coordination (X)
- Steve Yegge’s “The AI Vampire” (X)
- Matt Shumer’s “something big is happening” (X)

Alex Volkov 0:30

Welcome, everyone.

0:31

Welcome to ThursdAI for February 12th. My name is Alex Volkov. I'm AI Evangelist with Weights & Biases from CoreWeave. ThursdAI is brought to you by Weights & Biases, and I am really excited about today's show. Uh, we have folks tuning in on YouTube, and we have our own Wolfram Ravenwolf over here, and Ryan Carson. What's up guys? How you guys doing?

Ryan Carson 0:54

Good to see everybody.

Alex Volkov 0:56

Good to see you guys.