Episode Summary

This week on ThursdAI, the crew dives into a whirlwind of AI breakthroughs. GPT-5.1 finally lands with a warmer, more personable voice, Grok 4 Fast stuns with a 2 million token context window, and Baidu’s Ernie 4.5 VL shakes up visual reasoning with just 3B active parameters. Meta drops Omnilingual ASR, supporting a jaw-dropping 1,600+ languages, while ElevenLabs launches Scribe v2 Realtime for blazing-fast, multilingual transcription. Plus, Dima from W&B demos LEET, a terminal UI that sparks joy for ML practitioners everywhere. It’s a jam-packed episode full of live demos, open source surprises, and breaking news you won’t want to miss.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Paul Asjes
Developer Experience Engineer · ElevenLabs
@paul_asjes
Dima Duev
Staff Software Engineer · Weights & Biases
Yam Peleg
AI builder & founder
@Yampeleg
LDJ
Nous Research
@ldjconfirmed

By The Numbers

  • Context window: 2M tokens (Grok 4 Fast’s new maximum context length)
  • Active parameters: 3B (Baidu Ernie 4.5 VL, a vision/reasoning model that punches above its weight)
  • Languages supported: 1,600+ (Meta Omnilingual ASR’s speech recognition coverage)
  • Latency: 150ms (ElevenLabs Scribe v2 Realtime transcription)
  • Terminal Bench v2 top score: 50% (Warp agent’s result on the new, harder agentic benchmark)
  • Languages (Scribe): 90+ (ElevenLabs Scribe v2 Realtime language support)

🔥 Breaking During The Show

H Company Releases Holo2
Dropped live during the show: H Company open-sources Holo2, a new multimodal agent family fine-tuned from Qwen 3 VL, with SOTA results on computer use and web navigation benchmarks. Apache 2.0.

📰 Introduction and Show Overview

Alex sets the stage for a jam-packed episode, previewing major open source releases, big lab news, and two live interviews. The team teases highlights like Terminal Bench v2, Baidu Ernie 4.5 VL, Grok 4 Fast’s massive context window, and demos from ElevenLabs and Weights & Biases.

  • Preview of GPT-5.1, Grok 4 Fast, Baidu Ernie 4.5 VL
  • Live demos from ElevenLabs and W&B LEET

🔓 Open Source AI Highlights

The crew kicks off the open source segment, covering community-driven models and benchmarks in a week where Chinese labs and open-weight models continue to push the frontier.

  • Terminal Bench v2 sets new bar for agentic evals
  • Baidu, Qwen, and Meta all drop open source releases

🛠️ Terminal Bench Deep Dive

A deep exploration of Terminal Bench v2, the new gold standard for evaluating coding agents in realistic, terminal-based tasks. The team discusses the benchmark’s difficulty, community contributions, and why a 50% top score is more meaningful than chasing fractions on saturated benchmarks.

  • Terminal Bench v2: 89 hard tasks, 1000 Discord contributors
  • Warp agent hits 50%, Codex CLI close behind
  • A top score of 50% makes for meaningful comparison (cf. MMLU, already saturated in the 90s)
Yam Peleg
"It’s much more meaningful to optimize the benchmark where everyone is 50% on, than to chase four basis points out of a score that’s already 90-something."
Wolfram Ravenwolf
"If you change one variable, it cannot be compared anymore. You need all these details."

🎨 Baidu’s Ernie 4.5 VL and Visual Reasoning

Baidu’s Ernie 4.5 VL drops as a visual reasoning model with just 3B active parameters, claiming to rival much larger models like GPT-5 High on vision tasks. The team tests it live, scrutinizes the benchmarks, and discusses the GSPO training method.

  • Ernie 4.5 VL: 3B active params, Apache 2.0, open weights
  • Innovative image zooming, spatial grounding, and reasoning
  • GSPO training (from the Qwen team) credited with strong small-model performance
Alex Volkov
"This is a 3 billion parameter model that competes with GPT-5 High on visual stuff, folks."
Yam Peleg
"How do you get 3 billion parameters to be this good on vision? Vision specifically is, I don’t know, it’s weird."

🔥 Breaking News: H Company Releases Holo2

Live on air, the team reacts to the surprise open source release of Holo2 by H Company, a new multimodal agent family fine-tuned from Qwen 3 VL that posts SOTA results on computer use and web navigation tasks. It ships under Apache 2.0 in 4B, 8B, and 30B sizes.

  • Holo2: open source, Apache 2.0
  • Strong OSWorld-G scores, built on Qwen 3 VL
  • 4B, 8B, and 30B model variants
Alex Volkov
Alex Volkov
"We just got a brand new release from H Company: the next generation multimodal family built for grounding, navigation, reasoning across web, desktop, and mobile."

🔊 ElevenLabs’ Scribe v2 Realtime Launch

Paul Asjes from ElevenLabs joins to demo Scribe v2 Realtime, a lightning-fast, multilingual speech-to-text model with 150ms latency and support for 90+ languages. The team watches live transcription with seamless language switching, and Paul explains how Scribe outpaces Whisper on speed and accuracy.

  • Scribe v2 Realtime: 150ms latency, 90+ languages
  • Live demo: seamless language auto-switching mid-stream
  • Context-aware transcription handles code, initialisms, and technical terms
Paul Asjes
"It’s super fast—150 milliseconds. You talk into a mic and it’ll just transcribe exactly what you’re saying in over 90 different languages."
Alex Volkov
"This is like nearly real time. So this is able to get streamed into an LLM as I’m speaking."

⚡ This Week’s Buzz: W&B LEET Demo

Dima Duev from W&B’s SDK team demos LEET, a terminal-native dashboard for tracking ML runs that works even fully offline. The UI brings real-time metrics, beautiful ASCII art, and interactive exploration to the terminal, sparking joy for ML engineers everywhere.

  • LEET: Lightweight Experiment Exploration Tool
  • Works offline—perfect for air-gapped HPC clusters
  • Interactive metrics, system stats, zoomable charts in terminal
Dima Duev
"The main purpose of this tool is to spark joy right here in your terminal."

🔊 Meta’s Omnilingual ASR Release

Meta (Facebook) releases Omnilingual ASR, a speech recognition model supporting over 1,600 languages, including 500 never before served by any ASR system. The team breaks down the technical leap, the open source release under Apache 2.0, and the massive curated dataset behind it.

  • Omnilingual ASR: 1,600+ languages, 500+ new to ASR
  • Character error rate below 10% for 78% of supported languages
  • Apache 2.0, 500k+ rows of transcribed audio
Alex Volkov
"They support 1600 plus languages, 500 of them never before served in any type of ASR."
Yam Peleg
Yam Peleg
"The 1B model is phenomenal, state of the art on its own. Nearly a drop-in replacement, built on Hugging Face. You could just use it."

🏢 Big Companies and APIs

The panel covers the week’s major releases from big labs: GPT-5.1’s new warmer voice, Grok 4 Fast’s 2 million token context window, and Gemini Live voice updates. The crew discusses what these signal for the frontier model race.

  • GPT-5.1: warmer voice, personality upgrades
  • Grok 4 Fast: 2M token context window
  • Gemini Live: updated voice capabilities
Alex Volkov
Alex Volkov
"Grok 4 Fast apparently now has a 2 million token context window, which is crazy."
TL;DR and Show Notes
  • Hosts and Guests

  • Open Source LLMs

  • Big CO LLMs + APIs

    • Grok 4 Fast, Grok Imagine and Nano Banana v1/v2

    • OpenAI launches GPT-5.1

  • This Week’s Buzz

    • W&B LEET, an open-source terminal UI (TUI) to monitor runs

  • Voice & Audio

  • AI Art & Diffusion & 3D

    • Qwen Image Edit + Multi‑Angle LoRA for camera control

    • NVIDIA releases ChronoEdit-14B Upscaler LoRA