Episode Summary

This episode tracks a fast-moving week in AI: Anthropic makes 1M-token context the default for Opus 4.6, OpenAI ships GPT 5.4 Mini/Nano plus Codex subagents, and Nvidia's GTC narrative around OpenClaw/NemoClaw signals agents going mainstream. The panel also covers fresh open-source releases from Mistral, Unsloth, and H Company, then shifts into tooling and real-world evals like WolfBench. Breaking updates include Cursor Composer 2 and fresh voice model momentum from Fish Audio and Grok TTS. It's a dense, operator-focused show with practical implications for anyone building agent workflows right now.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Wolfram Ravenwolf
Weekly co-host, AI model evaluator
@WolframRvnwlf
LDJ
Weekly co-host of ThursdAI
@ldjconfirmed
Yam Peleg
Weekly co-host of ThursdAI
@Yampeleg
Nisten Tahiraj
Weekly co-host of ThursdAI
@nisten

By The Numbers

Opus 4.6 context default
1M
Anthropic made 1M context the default in Claude Code.
MiniMax M2.7 SWE-Bench Pro
56%
Performance number discussed during TL;DR.
Mistral Small 4 total params
119B
Reported as a sparse MoE with ~6B active params.
H Company Holotron 12B
8,900 tok/s
Claimed generation speed highlighted in the open-source segment.

🔥 Breaking During The Show

Cursor Composer 2 launch
Covered as a key breaking update in the coding tools segment.
Nvidia NemoClaw framing at GTC
Jensen highlighted enterprise agent infrastructure momentum around OpenClaw-style workflows.

⚡ Intro & What We're Using

Alex kicks off the show from sunny Colorado, a bit under the weather but energized by a packed week led by GTC. Wolfram and LDJ tease their highlights — Wolfram is excited about Anthropic making 1M context default for Opus 4.6 at the same price, while LDJ flags the surprise reveal that the stealth model topping OpenRouter turned out to be from Xiaomi. The panel debates whether longer context truly helps or if models still get 'stupider' past half the window, with Nisten noting Claude Code still auto-compacts at ~170K tokens unless you manually raise the limit.

  • Anthropic makes Opus 4.6 1M context the default in Claude Code at same price
  • Nisten warns auto-compaction still triggers at ~170K — you have to manually raise the limit
  • Yam notes context engineering still matters even at 1M tokens
  • Wolfram's rule of thumb: expect good performance up to about half the context window

📰 TL;DR

Alex runs through the week's headlines: Opus 4.6 going 1M default, OpenAI dropping GPT 5.4 Mini and Nano optimized for coding and computer use with Codex subagents, MiniMax M2.7 as a self-evolving model hitting 56% on SWE-Bench Pro, and Mistral returning with Small 4 at 119B params. LDJ adds the Xiaomi MiMo reveal (a 1T-parameter omni-modal model) and Google's vibe coding overhaul in AI Studio, while the tools rundown previews Jensen's 15-minute OpenClaw moment at GTC, Codex subagents, and Manus launching its 'My Computer' desktop app.

  • GPT 5.4 Mini & Nano pair with Codex subagents for cheap parallel agent workflows
  • MiniMax M2.7: first self-evolving model that helped build itself
  • Xiaomi MiMo revealed as the stealth #1 model on OpenRouter — 1T params
  • Jensen spent 15 minutes of his GTC keynote on OpenClaw

🔓 Open Source: Mistral Small 4

Mistral returns to open source with Small 4 — a 119B MoE with only 6B active per token, unifying their previous Pixtral (vision), Devstral (coding), and Magistral (reasoning) lines into one Apache 2.0 model. LDJ notes it can fit on a single H100 when compressed (a quick memory check follows the list below). Wolfram's WolfBench results are sobering: Mistral Small 4 scored just 17% average on the OpenClaw agent benchmark, making it the worst performer on his list, though he cautions it's only 2-3 runs so far. Compared to Nemotron at similar size (~20%), they're roughly on par.

  • 119B total params, 6B active per token — Apache 2.0 license
  • Unifies Pixtral, Devstral, and Magistral capabilities into one model
  • WolfBench: scored ~17% on OpenClaw agent tasks — worst in the benchmark so far
  • Roughly on par with Nemotron at similar size (~20%) within margin of error
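
To sanity-check LDJ's single-H100 claim, here's a back-of-envelope memory calculation (a sketch: it counts weight memory only and ignores KV cache and activation overhead):

```python
# Back-of-envelope weight-memory math for Mistral Small 4 (119B total params).
# Note: weight storage scales with TOTAL params; the 6B-active MoE design
# saves compute per token, not memory.
TOTAL_PARAMS = 119e9
H100_VRAM_GB = 80

for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gb = TOTAL_PARAMS * bits / 8 / 1e9
    verdict = "fits" if gb < H100_VRAM_GB else "does not fit"
    print(f"{name}: {gb:.0f} GB of weights -> {verdict} on one 80 GB H100")
```

At 4-bit (~60 GB) the weights come in under 80 GB, which is consistent with the "when compressed" caveat.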

🔓 Open Source: Unsloth Studio

Unsloth launches Studio, a web UI for local LLM training and inference claiming 2x speed and 70% less VRAM. Nisten argues this is bigger than people think — it could become the 'LM Studio of fine-tuning', making training accessible to anyone the way LM Studio democratized inference. LDJ clarifies that Unsloth's performance benefits existed before; the new thing is the no-code interface for beginners. Training on Colab Pro ($20/month) is confirmed working. (A sketch of the script-level workflow the UI wraps follows the list below.)

  • Web UI for local training + inference with 2x speed and 70% less VRAM
  • Nisten: 'This could be the LM Studio moment for fine-tuning'
  • Works on Google Colab Pro — trains models overnight for ~$20/month
  • Supports 500+ models including text, vision, audio, and embeddings
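
For a sense of what the no-code UI automates, here's a minimal sketch of a typical Unsloth fine-tuning script; the base model, dataset, and hyperparameters are illustrative placeholders, not Studio's actual defaults:

```python
# Minimal QLoRA fine-tune with Unsloth (placeholder model/data/settings).
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit loading is where most of the VRAM savings come from
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Expects a JSONL file with a "text" field per example (placeholder data).
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        max_steps=100,
        output_dir="outputs",
    ),
)
trainer.train()
```

Studio's pitch is that a beginner never has to see this file, the same way LM Studio hides llama.cpp flags.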

🔓 Open Source: H Company Holotron 12B

Brief mention of H Company's Holotron 12B — a hybrid SSM model for computer-use agents claiming 8,900 tokens/sec and jumping WebVoyager from 35.1% to 80.5%. The panel notes the trend of hybrid SSM architectures for longer context but doesn't deep-dive this one.

  • Hybrid SSM architecture targeting computer-use agent tasks
  • Claims 8,900 tokens/sec generation speed
  • WebVoyager benchmark jumps from 35.1% to 80.5%

🏢 OpenAI Acquires Astral (UV)

OpenAI acquires Astral (makers of uv, the Python package manager that changed the game) and the team joins Codex specifically. Yam draws a parallel: Anthropic bought Bun for TypeScript infrastructure powering Claude Code, OpenAI buys uv for Python infrastructure powering Codex — but notes they serve different roles (Bun is the engine for the agent itself, uv is tooling for the code the agent writes). Nisten is uneasy about two companies owning the core developer toolchains. Alex counters that it's all open source and forkable, like VS Code spawning Cursor.

  • Astral team joins Codex specifically — OpenAI's third acquisition this month
  • Parallel: Anthropic+Bun for TypeScript, OpenAI+Astral for Python
  • Nisten concerned about two companies controlling core dev toolchains
  • Yam: first time open source developers are getting acquired for these amounts — positive signal

🔓 Hugging Face State of Open Source

Hugging Face's Spring 2026 report shows China surpassing the US in LLM count for the first time, with 41% of all downloads. Alibaba's Qwen hit 1 billion+ total downloads across models — 1 million per day — overtaking Llama as the #1 downloaded model family. The platform now hosts 11 million users and 2 million+ models.

  • China surpasses US in number of LLMs — 41% of all HF downloads
  • Alibaba Qwen: 1 billion+ downloads, overtaking Llama as #1
  • 11M users, 2M+ models on Hugging Face

⚡ This Week's Buzz: W&B iOS App + WolfBench

Alex announces the most-requested W&B feature ever: a native iOS app for monitoring training runs, with push notifications for alerts. Then Wolfram presents a striking WolfBench finding: raising Opus 4.6 to max thinking level actually made it significantly worse on agentic benchmarks (71% → 59% on TerminalBench), because it overthinks and runs out of time to act. GPT 5.4 behaves as expected — more thinking = better results. The panel also covers how GPT 5.4 went from a 7% to a 52% floor in OpenClaw after a WebSocket bug fix, validated by WolfBench's Weave integration traces. (A thinking-budget sweep sketch follows the list below.)

  • W&B iOS app launched — native alerts for training run crashes
  • Opus 4.6 at max thinking drops from 71% to 59% — overthinking kills agent performance
  • GPT 5.4 benefits from more thinking as expected — extra high jumps to 71%/85%
  • GPT 5.4 in OpenClaw: 7% → 52% floor after WebSocket bug fix in v3.11
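
A minimal sketch of how one might sweep extended-thinking budgets on a fixed task to reproduce this kind of comparison, using the Anthropic Messages API; the model id is a placeholder and the task is illustrative, not WolfBench's actual harness:

```python
# Sweep extended-thinking budgets on one task to see whether more
# thinking helps or hurts. Model id and task are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
TASK = "Give shell commands to find the largest file under /var/log."

for budget in (2_000, 8_000, 32_000):
    resp = client.messages.create(
        model="claude-opus-4-6",  # hypothetical id for the model discussed
        max_tokens=budget + 2_000,  # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": TASK}],
    )
    # Thinking blocks come first; the final text block is the answer.
    answer = next(b.text for b in resp.content if b.type == "text")
    print(f"budget={budget}: {answer[:80]!r}")
```

The overthinking failure mode shows up when wall-clock or token limits bound the run: a bigger budget eats the room the agent needs to actually act.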

🔥 Breaking News: Cursor Composer 2

Cursor drops Composer 2, their first proprietary model that genuinely competes with frontier labs. LDJ breaks down the numbers: 61% on TerminalBench (beating Opus 4.6), $0.50/M input tokens — cheaper than GPT 5.4 Mini and 10x cheaper than Opus — at 300+ tokens/sec. The fast variant costs 3x more but maintains the same intelligence. Yam suspects it's a fine-tune rather than a from-scratch pretrain. The panel notes the new trend of 'fast mode' pricing: paying a premium for faster inference of the same model, flipping the old paradigm where smaller meant cheaper. (A quick cost comparison follows the list below.)

  • 61% on TerminalBench — beats Opus 4.6, at $0.50/M input tokens
  • Cheaper than GPT 5.4 Mini, 10x cheaper than Opus 4.6
  • Fast variant: 3x the price for same intelligence at 300+ tok/sec
  • Yam: likely a fine-tune, not from-scratch — they'd be shouting about it otherwise
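
A quick cost comparison under the prices quoted on the show; the Opus figure is inferred from the "10x cheaper" claim rather than quoted directly, and output-token pricing is ignored:

```python
# Per-session input cost at the quoted/inferred prices (USD per 1M tokens).
PRICE_PER_M_INPUT = {
    "composer-2": 0.50,       # quoted on the show
    "composer-2-fast": 1.50,  # 3x premium for speed, same model
    "gpt-5.4-mini": 0.75,     # quoted later in the episode
    "opus-4.6": 5.00,         # inferred from "10x cheaper than Opus"
}

SESSION_INPUT_TOKENS = 2_000_000  # a long agent session, for illustration

for model, price in PRICE_PER_M_INPUT.items():
    cost = SESSION_INPUT_TOKENS / 1e6 * price
    print(f"{model}: ${cost:.2f}")
```

At these rates a 2M-token session costs $1.00 on Composer 2 versus $10.00 on Opus, which is why a 'fast mode' premium still undercuts the flagship.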

🛠️ GPT 5.4 Mini & Nano

OpenAI ships GPT 5.4 Mini ($0.75/M input) and Nano — smaller variants optimized for coding and computer use. The headline number: Mini hits 72% on OSWorld (Verified), nearly matching the full 5.4's 75% and the human baseline of 72%. LDJ notes it beats Sonnet 4.5 on most of the five benchmarks shown, approaching Sonnet 4.6. Wolfram explains the key use case: these aren't standalone agents but subagents — the main 5.4 orchestrates while spawning cheap Mini workers for parallel tasks like visual testing. (A sketch of the fan-out pattern follows the list below.)

  • Mini: 72% on OSWorld (Verified) — matches human baseline, near full 5.4's 75%
  • Beats Sonnet 4.5 on a majority of benchmarks, nearly at 4.6 level
  • Designed for Codex subagent pattern: cheap parallel workers under a 5.4 orchestrator
  • 2x faster than previous GPT-5 Mini
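
A minimal sketch of the fan-out pattern Wolfram describes, with an orchestrator dispatching cheap parallel workers; the model id and task list are placeholders for the versions discussed on the show, and a real Codex setup would configure subagents through its own tooling rather than raw API calls:

```python
# Orchestrator/subagent fan-out (model id and tasks are placeholders).
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def subagent(task: str) -> str:
    """One cheap, focused worker per task."""
    resp = await client.chat.completions.create(
        model="gpt-5.4-mini",  # hypothetical id for the Mini tier
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    # In the real pattern, the full-size 5.4 orchestrator would generate
    # this task list itself before fanning out.
    tasks = [
        "Check that the login page renders without console errors.",
        "Verify the /health endpoint returns HTTP 200.",
        "Run tests/test_auth.py and summarize any failures.",
    ]
    results = await asyncio.gather(*(subagent(t) for t in tasks))
    for task, result in zip(tasks, results):
        print(f"- {task}\n  {result[:100]}")

asyncio.run(main())
```

The economics only work because the workers are priced well below the orchestrator, which is the whole point of the Mini/Nano tier.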

🏢 MiniMax M2.7

MiniMax drops M2.7, the 'self-evolving' model that ran 100+ autonomous RL optimization loops and wrote its own agent scaffolding — one engineer built it over four days with zero lines of human-written code. It hits 56% on SWE-Bench Pro, within 1 point of Opus 4.6's 57.3%. Yam tried it and calls it the best open-weights model (pending actual release). WolfBench shows it roughly matching Sonnet 4.6 on OpenClaw tasks. Not yet open source, but rumors suggest it will be.

  • 100+ autonomous RL loops — one engineer, four days, zero human code
  • 56% SWE-Bench Pro — within 1 point of Opus 4.6 (57.3%)
  • Yam: best open-weights model he's tried (if/when released)
  • WolfBench: roughly matches Sonnet 4.6 on OpenClaw agent tasks

🏢 Anthropic Opus 4.6 — 1M Context Default

Quick recap of the Opus 4.6 1M context announcement: what was previously experimental and expensive is now the default in Claude Code at the same price. Alex highlights MRCR benchmark performance — 93% at 256K, 76% at 1M. For agent users, this means less compaction (the 'mood killer') and longer uninterrupted sessions.

  • 1M context now default in Claude Code — same price as before
  • MRCR benchmark: 93% at 256K, 76% at 1M
  • Reduces compaction frequency for longer agent sessions

🤖 Jensen & Nvidia GTC: Open Claw / Nemo Claw

The showstopper: Jensen Huang spent 15 minutes of his GTC keynote on OpenClaw, calling it the most important open source release since Linux and declaring 'every company needs an OpenClaw strategy.' NVIDIA released NemoClaw — a hardened enterprise reference implementation with a privacy router and policy engine. Wolfram calls it his highlight of the week for what it signals. Yam pushes back: OpenClaw caught fire precisely because it's unsecured ('dangerously skip permissions is the core'). Alex draws the parallel to early internet: no HTTPS, no firewalls, Windows XP getting hacked in 13 seconds — we'll evolve.

  • Jensen: 'Every company needs an OpenClaw strategy' — compared it to mobile and cloud strategies
  • NemoClaw: enterprise-hardened OpenClaw with privacy router and policy engine
  • Yam's counterpoint: OpenClaw succeeded because it's dangerously permissive
  • Alex's analogy: we're at the 'no HTTPS' era of agents — security will come
  • Ryan Carson reported using 1 billion tokens in a single day with subagents

🏢 Nvidia GR LPX & Groq Chip

LDJ covers NVIDIA's GTC hardware announcements: the Groq 3 chip (skipping gen 2 entirely) integrated into NVIDIA's Rubin NVL72 servers via the new GR LPX system. The numbers: 3x improvement in tokens-per-watt efficiency at baseline, up to 30x at higher throughput, and 1000+ tok/sec on a 2T-param model with 400K context — performance levels the current Blackwell generation can't reach at any price.

  • Groq 3 chip announced — gen 2 was never publicly seen
  • Rubin + GR LPX: 3x tokens-per-watt at baseline, 30x at higher throughput
  • 1000+ tok/sec on 2T-param frontier model with 400K context
  • Blackwell can't match this performance at any efficiency point

🔊 Fish Audio & Grok Voice

Alex demos Fish Audio S2, an open-source TTS model with inline emotion control via free-text bracket tags (gasp, laughter, long pause — though sneeze doesn't work). The killer demo: Alex built an OpenClaw skill using Fish Audio that lets his 5-year-old, Sean, talk to 'Rocky' from Project Hail Mary, with the voice clone nailing the character. Grok also launched a TTS API with 5 voices and WebSocket streaming, cheaper than ElevenLabs. Wolfram confirms Fish Audio is fully open source. (A sketch of the tag-based request shape follows the list below.)

  • Fish Audio S2: open-source TTS with free-text emotion tags in brackets
  • Live demo: Sean (age 5) talking to a Rocky voice agent built with OpenClaw + Fish Audio
  • Grok TTS API launched — 5 voices, WebSocket streaming, cheaper than ElevenLabs
  • Fish Audio is fully open source — 'ElevenLabs V3 for free' per Wolfram
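
A sketch of what a tag-annotated TTS request could look like; the endpoint, payload fields, voice id, and exact tag syntax are assumptions for illustration, not the documented Fish Audio S2 API:

```python
# Hypothetical request shape for emotion-tagged TTS (all fields assumed).
import requests

# Free-text emotion tags inline with the script, per the demo's description.
text = "(gasp) You solved it! (laughter) (long pause) Okay, question-friend, listen."

resp = requests.post(
    "https://api.fish.audio/v1/tts",                   # assumed endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # assumed auth scheme
    json={"text": text, "voice": "rocky-clone"},       # assumed fields
    timeout=60,
)
resp.raise_for_status()
with open("rocky.mp3", "wb") as f:
    f.write(resp.content)  # assumes the API returns audio bytes directly
```

The interesting design point is that the tags are free text rather than a fixed enum, so the model interprets novel directions and fails gracefully on ones it doesn't know, like sneeze.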

⚡ Wrap-Up

Alex speed-runs the topics they didn't fully cover: Codex subagents, Manus 'My Computer' desktop app (now Meta-owned), DLSS 5 generative AI filters from GTC, and the Xiaomi MiMo omni models revealed as the OpenRouter stealth champion. He signs off a bit under the weather but notes the show's new social media clip strategy on YouTube.

  • Codex subagents, Manus My Computer, DLSS 5, Xiaomi MiMo left as mentions
  • Three years of ThursdAI — newsletter, podcast, and YouTube clips strategy

TL;DR of all topics covered:

  • Hosts and Guests

  • Big CO LLMs + APIs

    • Anthropic makes Opus 4.6 with 1M context the default in Claude Code - at the same price (X)

    • OpenAI drops GPT-5.4 mini and nano, optimized for coding, computer use, and subagents at a fraction of flagship cost (X, Announcement, Announcement)

    • Xiaomi MiMo: 1T-parameter omni-modal and language-only models (X)

    • Google AI Studio gets a full-stack vibe coding overhaul with Antigravity agent, Firebase integration, and multiplayer support (X, Blog, Announcement)

    • MiniMax M2.7: the first self-evolving model that helped build itself, hitting 56.22% on SWE-Bench Pro (X, X, Announcement)

    • Cursor launches Composer 2, their first proprietary frontier coding model beating Opus 4.6 at a fraction of the cost (X, Blog)

  • Open Source LLMs

    • Mamba-3 drops with three SSM-centric innovations: trapezoidal discretization, complex-valued states, and MIMO formulation for inference-first linear models (X, Arxiv, GitHub)

    • H Company releases Holotron-12B, an open-source hybrid SSM model for computer-use agents that hits 8.9k tokens/sec and jumps WebVoyager from 35.1% to 80.5% (X, X, HF, Blog)

    • Hugging Face’s Spring 2026 State of Open Source report reveals 11M users, 2M models, and China dominating 41% of downloads as open source becomes a geopolitical chess board (X, Blog, X, X)

    • Unsloth launches open-source Studio web UI for local LLM training and inference with 2x speed and 70% less VRAM (X, Announcement, GitHub)

    • Astral (Ruff, uv, ty) joins OpenAI's Codex team (announcement, blog, Charlie Marsh)

    • Mistral Small 4: 119B MoE with 128 experts, only 6B active per token, unifying reasoning, multimodal, and coding under Apache 2.0 (X, Blog, HF)

  • Tools & Agentic Engineering

    • NVIDIA GTC: Jensen Huang declares “Every company needs an OpenClaw strategy,” announces NemoClaw enterprise platform (X, TechCrunch, NemoClaw)

    • OpenAI ships subagents for Codex, enabling parallel specialized agents with custom TOML configs (X, Announcement, GitHub)

    • Manus (now Meta) launches ‘My Computer’ desktop app, bringing its AI agent from the cloud onto your local machine for macOS and Windows (X, Blog)

  • This Week's Buzz

    • Weights & Biases launches iOS mobile app for monitoring AI training runs with crash alerts and live metrics (X, Announcement)

    • GPT 5.4 went from worst to best on WolfBench after an OpenClaw config fix exposed a max_new_tokens bottleneck (X, X, X)

  • Voice & Audio

    • xAI launches Grok Text-to-Speech API with 5 voices, expressive controls, and WebSocket streaming (X, Announcement)

  • AI Art & Diffusion & 3D

    • NVIDIA DLSS 5 is making waves with a new generative AI filter (Blog)