ThursdAI · Thursday, June 25, 2026 · Episode recap

GLM 5.2 has its DeepSeek moment

Open source just had its second DeepSeek moment.

A chill summer week where open source did not chill. GLM 5.2 went live on CoreWeave, climbed to #3 on WolfBench, and started beating closed frontier models at agentic coding and web design. Plus Sakana Fugu, Claude in Slack, OpenAI’s first custom chip, and Sean Grove on the 10,000-agent-hour company.

744B GLM-5.2 params
#3 model ever on WolfBench
61% WolfBench solid base
$1.39/$4.40 per M tokens
1M context window
10,000 agent hrs/person/day
thursdai.news/yt • x.com/altryne
01

💥 GLM 5.2 has its DeepSeek moment

Headline · Z.ai · Open source

The whole show orbited one model. GLM 5.2 was released last week, but this week it became the model everyone is actually running, benchmarking and comparing to closed frontier models. The surprise isn’t just coding: it’s web design and UI taste. Peter says Arena data puts 5.2 well above 5.1 and shockingly strong on web dev. Alex showed a ThursdAI page GLM built and called it a genuine first for open source. Wolfram’s caveat: it is still weak in German, so it’s a workhorse, not necessarily your main conversational model.

  • Arena data puts GLM 5.2 above 5.1, with surprising strength on web/front-end work.
  • Alex shows a custom GLM-built ThursdAI page and calls GLM 5.2 the first open model genuinely good at design.
  • Unsloth shipped GGUF quants so you can run a 1M-context GLM locally.
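
For a rough sense of what "locally" means at this scale, here is a back-of-the-envelope sketch of weight size at a few bit widths. It uses only the 744B figure from the recap; the bit widths are illustrative assumptions, not a list of the quants Unsloth actually published, and real GGUF files mix precisions per tensor, so actual sizes will differ. The KV cache, which grows with context length, is ignored.

```python
PARAMS = 744e9  # total parameter count quoted in the recap

def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

# Bit widths below are illustrative assumptions, not published quant levels.
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{weight_gigabytes(PARAMS, bits):,.0f} GB")
```

Since the model is described as an MoE, only part of the weights is active per token, but the full set typically still has to sit in RAM, VRAM or fast storage.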
“This is the first model in open source that is really good at web design and front end design.” Alex Volkov
“It was a GLM week, all right? Everybody is realizing this is a DeepSeek moment.” Nisten Tahiraj
744B parameters (MoE)
1M context window
MIT license, open weights
02

🐝 This Week’s Buzz: GLM on CoreWeave + WolfBench

This Week’s Buzz · CoreWeave

GLM 5.2 went live on CoreWeave Serverless Inference — just bring your W&B key (or hit it via OpenRouter). The team biased the deployment toward speed over the full million-token context. Then Wolfram ran WolfBench: GLM 5.2 is the third best model he’s ever benchmarked, exceeding Opus 4.7, with the strongest “solid base” (consistently-solved tasks) in the run — at a fraction of the cost of GPT 5.5 or Opus.

  • Served at $1.39/M input and $4.40/M output — cheaper than Opus with caching.
  • WolfBench solid base of 61% on max thinking — GPT-5.5-level reliability.
  • Under $200 to run the benchmark vs ~$500 for GPT 5.5 and ~$400 for Opus.
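
To turn the quoted per-token prices into a budget, a minimal sketch follows. The two prices come from the recap; the token counts in the example are a made-up workload, not the WolfBench run, and any caching discount is ignored.

```python
INPUT_USD_PER_M = 1.39   # quoted price per 1M input tokens
OUTPUT_USD_PER_M = 4.40  # quoted price per 1M output tokens

def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a run at the quoted per-million-token prices (no caching discount)."""
    return (input_tokens / 1e6) * INPUT_USD_PER_M + (output_tokens / 1e6) * OUTPUT_USD_PER_M

# Hypothetical workload: 10M input tokens and 2M output tokens.
print(f"${run_cost_usd(10_000_000, 2_000_000):.2f}")  # $22.70
```

Agentic runs tend to be input-heavy because context is re-sent on each step, so the input price usually dominates the bill.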
“It is the third best model that I have benchmarked here, and it even exceeds Opus 4.7.” Wolfram Ravenwolf
“This is the DeepSeek moment realized.” Alex Volkov
$1.39 / 1M input
$4.40 / 1M output
#3 model on WolfBench
61% solid base (max thinking)
03

🐡 Sakana Fugu and the orchestration layer

Orchestration · Sakana AI

Sakana AI launched Fugu: one API endpoint that hides “seven raccoons in a trench coat” — a trained router that dispatches your task to publicly accessible models in Thinker / Worker / Verifier roles, then fuses the result. Nisten clocked the pool as Opus, Codex and Gemini. The panel frames it as the next paradigm after thinking models and MoE: coordinated model teams. (Caveat from chat: people burned through their $20 tier on a single agentic prompt.)

  • Routes to public frontier models — no private Fable/Mythos in the pool.
  • Backed by two ICLR papers: Trinity and the Conductor.
  • Echoes OpenRouter and Arena’s prompt classifiers — routing as a product.
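
The Thinker / Worker / Verifier pattern described above can be sketched as a toy pipeline. This is not Sakana's implementation: the recap describes a trained router, while the sketch uses a keyword check, and every name here (`Team`, `route`, `run`, `stub`, the team labels) is hypothetical. Models are stubbed as plain functions so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Model = Callable[[str], str]  # stand-in for a call to a hosted model

@dataclass
class Team:
    thinker: Model
    worker: Model
    verifier: Model

def route(task: str, teams: Dict[str, Team]) -> Team:
    """Toy keyword router; the recap describes a trained router instead."""
    return teams["code"] if "code" in task.lower() else teams["general"]

def run(task: str, teams: Dict[str, Team]) -> str:
    team = route(task, teams)
    plan = team.thinker(task)        # Thinker: draft a plan
    draft = team.worker(plan)        # Worker: carry the plan out
    verdict = team.verifier(draft)   # Verifier: check the draft
    return f"{draft} [{verdict}]"    # fuse draft and verdict into one response

def stub(name: str) -> Model:
    """Build a fake model that just wraps its input with its own name."""
    return lambda text: f"{name}({text})"

teams = {
    "code": Team(stub("think_a"), stub("work_b"), stub("verify_c")),
    "general": Team(stub("think_c"), stub("work_a"), stub("verify_b")),
}
print(run("Write code for a parser", teams))
```

Each request fans out into at least three model calls, which is consistent with the chat caveat about a single agentic prompt exhausting a $20 tier.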
“I think this is one of the new things, to raise the intelligence even higher by combining different models to get one result.” Wolfram Ravenwolf
95.5 GPQA Diamond
93.2 LiveCodeBench
73.7 SWE-Bench Pro
04

🤖 Sean Grove on Linzumi and agent fleets

Guest · Linzumi

Sean Grove — ex-OpenAI (Model Spec, deliberative alignment), now on his third company — joined to explain Linzumi: a shared chat-and-orchestration environment where humans and fleets of coding agents work in the same threads, continuously compiling the company’s intent into a living specification. His thesis: stop reading every line of generated code; read the failures against the properties you care about, like property-based testing instead of hand-checking unit tests. The target is 10,000 agent hours per person per day.

  • Linzumi captures ambient chats, calls and coding jobs into a compiled spec.
  • “Live six months in the future” — build tools for where agents will be, not where they are.
  • Sean: without agentic company-building, he’d have retired rather than start a fourth.
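
Sean's comparison to property-based testing can be made concrete with a small sketch: instead of reading the generated code, state the properties you care about and only look at inputs that violate them. The function under test and the properties are invented for illustration and have nothing to do with Linzumi's internals. Only the standard library is used; a dedicated library such as Hypothesis would normally do the input generation and shrinking.

```python
import random

def dedupe_sorted(items):
    """Pretend this was agent-written: return the unique items in ascending order."""
    return sorted(set(items))

def check_properties(fn, trials=500, seed=0):
    """Run fn on random inputs; return the first input that breaks a property, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        out = fn(data)
        # Strictly increasing output means it is both sorted and duplicate-free.
        ordered = all(a < b for a, b in zip(out, out[1:]))
        same_members = set(out) == set(data)
        if not (ordered and same_members):
            return data  # a failure worth a human's attention
    return None

print(check_properties(dedupe_sorted))  # None: no counterexample found
```

A reviewer here reads at most one counterexample per failing property rather than every line of the implementation, which is the kind of evidence that scales as the number of agents grows.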
“If you are involved in reading the output or making every micro decision, you will never be able to scale to that stage.” Sean Grove
“You need a ladder of evidence that allows you to build up trust in the system and know when to trust it and when not to.” Sean Grove
10,000 agent hrs / person / day
1.2M+ views, AI.Engineer talk
05

Claude joins Slack, OpenAI builds silicon

Big Co · Anthropic · OpenAI

The big-company stack kept moving. Claude Tag turns Slack into a persistent surface for an ambient AI teammate — shared channel context, proactive follow-up, coding, analysis, incident support and enterprise governance. Nisten’s take: bigger than it first looks, because you keep the context, personality and safety scaffolding instead of “firing” Claude each session. Meanwhile OpenAI Jalapeno — its first custom inference ASIC with Broadcom — signals a full-stack future, with a claimed nine-month design-to-tape-out (engineers in Nisten’s chat suspect work began years earlier).

  • Claude Tag: an ambient Slack teammate with shared context and follow-up.
  • Jalapeno targets ~50% lower inference cost; Nvidia keeps the training market.
  • Every Broadcom dollar is a dollar not spent on Nvidia — like Google’s TPUs and Meta’s silicon.
“OpenAI builds its own chip. Jalapeno is a custom inference ASIC designed with Broadcom.” Alex Volkov
9 mo design → tape-out (claimed)
~50% inference cost cut goal
65% Anthropic team code via Claude
06

Quick hits: open weights, OCR & tiny models

Rapid fire

A loaded week beyond GLM. Krea 2 open-weights (12B image model, raw + turbo) brings back image diversity the “competent but collapsed” frontier models lost. Baidu Unlimited-OCR (3B, constant KV cache, 40+ pages in one pass, ~93.9% on OmniDoc Bench) and Mistral OCR 4 push document AI. Liquid LFM 2.5 230M is billed as the world’s smallest agentic LLM — smaller than a node_modules folder, fast enough for a Raspberry Pi or a toaster. Plus OpenAI Daybreak (security tooling), the Seedance 2.5 teaser (4K, 30s, IP licensing, early July), and the Aside agentic browser.

  • Krea 2 — open-weights 12B image model with out-of-distribution artistic range.
  • Baidu Unlimited-OCR & Mistral OCR 4 — cheap, fast document intelligence.
  • Liquid LFM 2.5 230M — tiny on-device agentic model for edge & CPU.
Releases this week

The products and launches that actually shipped this week. GLM 5.2 dropped last week — we covered it again this week, but it is not a release here.

Show notes

A quieter summer week with one loud conclusion: GLM 5.2 made open source feel frontier-adjacent in practical agentic work, web design, and cost-performance, which the panel called open source’s second DeepSeek moment. GLM 5.2 was the real center of gravity, spanning CoreWeave serving, WolfBench results, web design quality, and Unsloth quants. The panel also covered Sakana Fugu, OpenAI Jalapeno, Claude Tag, Daybreak, Krea 2, the OCR releases, Liquid AI’s tiny models, Aside, and Seedance 2.5, and Sean Grove joined to explain Linzumi and company-scale agent orchestration.
