ThursdAI · November 13, 2025

GPT‑5.1’s New Brain, Grok’s 2M Context, Omnilingual ASR, and a Terminal UI That Sparks Joy

From Weights & Biases: GPT-5.1 reactions, open source recaps, two interviews (Dima from W&B on LEET and Paul from ElevenLabs on Scribe), and a few breaking news items from a packed week in AI!

By Alex Volkov

70 min

YouTube Spotify Apple Podcasts Substack

What happened in AI the week of November 13, 2025?

This week on ThursdAI, the crew dives into a whirlwind of AI news: GPT-5.1 lands with a warmer, more personable voice, Grok 4 Fast stuns with a 2 million token context window, and Baidu’s Ernie 4.5 VL shakes up visual reasoning with just 3B active parameters. Meta drops Omnilingual ASR, supporting a jaw-dropping 1600+ languages, while ElevenLabs launches Scribe v2 Realtime for blazing-fast, multilingual transcription. Plus, Dima from W&B demos LEET, a terminal UI that sparks joy for ML practitioners everywhere. It’s a jam-packed episode full of live demos, open source surprises, and breaking news you won’t want to miss.

In This Episode

📰 Introduction and Show Overview
🔓 Open Source AI Highlights
🛠️ Terminal Bench Deep Dive
🎨 Baidu’s Ernie 4.5 VL and Visual Reasoning
🔥 Breaking News: H Company Releases Holo2
🔊 ElevenLabs’ Scribe v2 Realtime Launch
⚡ This Week’s Buzz: W&B LEET Demo
🔊 Meta’s Omnilingual ASR Release
🏢 Big Companies and APIs

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Paul Asjes

Developer Experience Engineer · ElevenLabs

@paul_asjes

Dima Duev

Staff Software Engineer · Weights & Biases

Yam Peleg

AI builder & founder

@Yampeleg

Wolfram Ravenwolf

AI model evaluator

@WolframRvnwlf

LDJ

Nous Research

@ldjconfirmed

By The Numbers

Context Window

2M

Grok 4 Fast’s new max context length

Active Parameters

3B

Baidu Ernie 4.5 VL vision/reasoning model punches above its weight

Languages Supported

1600+

Meta Omnilingual ASR’s speech recognition coverage

Latency

150ms

ElevenLabs Scribe v2 Realtime transcription latency

Terminal Bench v2 Top Score

50%

Warp agent’s performance on the new, harder agentic benchmark

Languages (Scribe)

90+

ElevenLabs Scribe v2 Realtime language support

🔥 Breaking During The Show

H Company Releases Holo2

Dropped live during the show: H Company open sources Holo2, a new multimodal agent family fine-tuned on Qwen3-VL, with SOTA results on computer use and web navigation benchmarks. Apache 2.0.

📰 Introduction and Show Overview

Alex sets the stage for a jam-packed episode, previewing major open source releases, big lab news, and two live interviews. The team teases highlights like Terminal Bench v2, Baidu Ernie 4.5 VL, Grok 4 Fast’s massive context window, and demos from ElevenLabs and Weights & Biases.

Preview of GPT-5.1, Grok 4 Fast, Baidu Ernie 4.5 VL
Live demos from ElevenLabs and W&B LEET

🔓 Open Source AI Highlights

The crew kicks off the open source segment, covering community-driven models and benchmarks in a week where Chinese labs and open weights models continue to push the frontier.

Terminal Bench v2 sets new bar for agentic evals
Baidu, Qwen, and Meta all drop open source releases

🛠️ Terminal Bench Deep Dive

A deep exploration of Terminal Bench v2, the new gold standard for evaluating coding agents in realistic, terminal-based tasks. The team discusses the benchmark’s difficulty, community contributions, and why a 50% top score is more meaningful than chasing fractions on saturated benchmarks.

Terminal Bench v2: 89 hard tasks, 1000 Discord contributors
Warp agent hits 50%, Codex CLI close behind
A top score near 50% leaves room for meaningful comparison (unlike saturated benchmarks such as MMLU, where top scores are already "90-something")

Yam Peleg

"It’s much more meaningful to optimize the benchmark where everyone is 50% on, than to chase four basis points out of a score that’s already 90-something."

Wolfram Ravenwolf

"If you change one variable, it cannot be compared anymore. You need all these details."

🎨 Baidu’s Ernie 4.5 VL and Visual Reasoning

Baidu’s Ernie 4.5 VL drops as a visual reasoning model with 28B total and just 3B active parameters, claiming to rival much larger models like GPT-5 High on vision tasks. The team tests it live, scrutinizes the benchmarks, and discusses the GSPO training method.

Ernie 4.5 VL: 3B active params, Apache 2.0, open weights
Innovative image zooming, spatial grounding, and reasoning
GSPO training from Qwen team enables strong small-model performance

Alex Volkov

"This is a 3 billion parameter model that competes with GPT-5 High on visual stuff, folks."

Yam Peleg

"How do you get 3 billion parameters to be this good on vision? Vision specifically is, I don’t know, it’s weird."

🔥 Breaking News: H Company Releases Holo2

Live on air, the team reacts to the surprise open source release of Holo2 by H Company: a new multimodal agent family fine-tuned on Qwen3-VL, boasting SOTA results on computer use and web navigation tasks. Released under Apache 2.0 in 4B, 8B, and 30B sizes.

Holo2: open source, Apache 2.0
Strong OSWorld-G scores, built on Qwen3-VL
4B, 8B, and 30B model variants

Alex Volkov

"We just got a brand new release from H Company: the next generation multimodal family built for grounding, navigation, reasoning across web, desktop, and mobile."

🔊 ElevenLabs’ Scribe v2 Realtime Launch

Paul Asjes from ElevenLabs joins to demo Scribe v2 Realtime, a lightning-fast, multilingual speech-to-text model with 150ms latency and 90+ language support. The team sees live transcription and seamless language switching, and Paul explains how Scribe outpaces Whisper on speed and accuracy.

Scribe v2 Realtime: 150ms latency, 90+ languages
Live demo: seamless language auto-switching mid-stream
Context-aware transcription handles code, initialisms, and technical terms

Paul Asjes

"It’s super fast—150 milliseconds. You talk into a mic and it’ll just transcribe exactly what you’re saying in over 90 different languages."

Alex Volkov

"This is like nearly real time. So this is able to get streamed into an LLM as I’m speaking."

⚡ This Week’s Buzz: W&B LEET Demo

Dima Duev from W&B’s SDK team demos LEET—a terminal-native dashboard for tracking ML runs even fully offline. The UI brings real-time metrics, beautiful ASCII art, and interactive exploration to the terminal, sparking joy for ML engineers everywhere.

LEET: Lightweight Experiment Exploration Tool
Works offline—perfect for air-gapped HPC clusters
Interactive metrics, system stats, zoomable charts in terminal

Dima Duev

"The main purpose of this tool is to spark joy right here in your terminal."

🔊 Meta’s Omnilingual ASR Release

Meta releases Omnilingual ASR, a speech recognition model supporting over 1600 languages, including 500 never before served by any ASR system. The team breaks down the technical leap, open source release under Apache 2.0, and the massive curated dataset behind it.

Omnilingual ASR: 1600+ languages, 500+ new to ASR
Character error rate <10% for 78 languages
Apache 2.0, 500k+ rows of transcribed audio
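
For readers unfamiliar with the metric in the bullets above, character error rate (CER) is conventionally the character-level edit distance between a reference transcript and the model's output, divided by the reference length. The sketch below shows that standard definition only; the episode does not describe Meta's exact evaluation pipeline (text normalization, casing, punctuation handling), and the function names here are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r -> h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit_distance(ref, hyp) / len(ref).
    Assumes both strings were normalized the same way beforehand."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of eleven reference characters.
    print(character_error_rate("hello world", "hallo world"))
```

Under this definition, "CER below 10%" means fewer than one character-level error per ten reference characters. Because insertions count as errors, CER can exceed 100% on very poor output.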

Alex Volkov

"They support 1600 plus languages, 500 of them never before served in any type of ASR."

Yam Peleg

"The 1B model is this phenomenal state of the art on its own, nearly a drop-in replacement, built on Hugging Face. You could just use it."

🏢 Big Companies and APIs

The panel covers the week’s major releases from big labs: GPT-5.1’s new warmer voice, Grok 4 Fast’s 2 million token context window, and Gemini Live voice updates. The crew discusses what these signal for the frontier model race.

GPT-5.1: warmer voice, personality upgrades
Grok 4 Fast: 2M token context window
Gemini Live: updated voice capabilities

Alex Volkov

"Grok 4 Fast apparently now has 2 million context window tokens, which is crazy."

TL;DR and Show Notes

Hosts and Guests
- Alex Volkov - AI Evangelist, Weights & Biases (@altryne)
- Co-Hosts - @WolframRvnwlf, @yampeleg, @ldjconfirmed
- Guest: Dima Duev - SDK team, Weights & Biases
- Guest: Paul Asjes - ElevenLabs (@paul_asjes)
Open Source LLMs
- Terminal-Bench 2.0 and Harbor launch (X, Blog, Docs, Announcement)
- Baidu releases ERNIE-4.5-VL-28B-A3B-Thinking (X, HF, GitHub, Blog, Platform)
- Project AELLA (OSSAS): 100K LLM-generated paper summaries (X, HF)
- WeiboAI’s VibeThinker-1.5B (X, HF, Arxiv, Announcement)
- Code Arena — live, agentic coding evaluations (X, Blog, Announcement)
Big CO LLMs + APIs
- Grok 4 Fast, Grok Imagine and Nano Banana v1/v2 (X, X, X, X)
- OpenAI launches GPT-5.1 (X, X)
This Week’s Buzz
- W&B LEET — an open-source Terminal UI (TUI) to monitor runs (X, Blog)
Voice & Audio
- ElevenLabs launches Scribe v2 Realtime (X, Blog, Docs)
- Meta releases Omnilingual ASR for 1,600+ languages (X, Blog, Paper, HF Dataset, HF Demo, GitHub)
- Gemini Live conversational upgrade (X)
AI Art & Diffusion & 3D
- Qwen Image Edit + Multi‑Angle LoRA for camera control (X, HF, Fal)
- NVIDIA releases ChronoEdit-14B Upscaler LoRA (X, HF, Docs)

Speaker 0:00

folks?