Episode Summary

Thanksgiving always lands on a Thursday, and ThursdAI's third annual Thanksgiving special delivered a feast of AI releases to be genuinely thankful for. Anthropic finally brought Opus back with Opus 4.5, and it is reclaiming the coding crown with 80.9% on SWE-bench Verified at a third of the old Opus price. Open source had its own feast: Prime Intellect's INTELLECT-3 (106B MoE), DeepSeek Math V2, Microsoft's Fara-7B, and BFL's FLUX.2 all dropped in one week. Plus, Ido Salomon and Liad Yosef returned to discuss MCP-UI becoming the official 'MCP Apps' standard adopted by both Anthropic and OpenAI, the foundation of what Alex calls 'the agentic web.'

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Ido Salomon
Monday.com (GitMCP) - AI Lead / Co-creator
@idosal1
Yam Peleg
Weekly co-host of ThursdAI
@Yampeleg
Wolfram Ravenwolf
Weekly co-host, AI model evaluator
@WolframRvnwlf
Nisten Tahiraj
Weekly co-host of ThursdAI
@nisten

By The Numbers

SWE-bench Verified
80.9%
Claude Opus 4.5: reclaims the #1 coding LLM spot at 1/3 the cost of old Opus
Input tokens (Opus 4.5)
$5/M
Down from old Opus pricing: a large value upgrade for agentic workflows
INTELLECT-3 params
106B
Prime Intellect's MoE model (12B active), 90% on AIME 2024/2025
HunyuanOCR params
1B
Tencent's tiny OCR model beats 72B models with 860 on OCRBench
WebVoyager (Fara-7B)
73.5%
Microsoft's 7B on-device computer use agent beats OpenAI's computer-use preview
DeepSeek Math V2 params
685B
Open-weights, Apache 2.0, IMO gold-level math reasoning

🔥 Breaking During The Show

Claude Opus 4.5: Coding Crown Reclaimed
Anthropic dropped Opus 4.5 this week: 80.9% SWE-bench Verified, new Effort parameter, Tool Search, and Programmatic Tool Calling, all at 1/3 the old Opus price.
MCP-UI Standardized as MCP Apps by Anthropic + OpenAI
The MCP-UI open standard is now officially 'MCP Apps,' jointly adopted by Anthropic and OpenAI, so agents can now render interactive HTML UIs inside chat.

🔓 Open Source LLMs

A Thanksgiving feast of open-source drops: Prime Intellect's INTELLECT-3 (106B MoE) shows a small lab can train frontier-scale models, DeepSeek surfaces a 685B math model with IMO gold performance, and Microsoft's Fara-7B brings on-device computer use to 7B parameters. Z-Image Turbo from Tongyi makes image generation sub-second, and FLUX.2 from BFL enables multi-reference image editing at 32B scale.

  • INTELLECT-3: 106B MoE, 90% AIME 2024/2025, fully open-sourced training stack
  • DeepSeek Math V2: 685B Apache-2.0, IMO gold-level, first open-weights math champion
  • Fara-7B: Microsoft's 7B on-device computer use agent, 73.5% WebVoyager
  • Z-Image Turbo: sub-second image generation from Tongyi/Alibaba
  • FLUX.2: 32B multi-reference image editing from Black Forest Labs
Yam Peleg
"It's an incredibly powerful model, open source, large, expensive, open source, heavily, powerful."
Wolfram Ravenwolf
"This amazing actually with the variables you can use, because I've been doing a lot of image editing and you prompt it."
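
To put INTELLECT-3's "106B MoE" label in perspective, here is a rough arithmetic sketch of how much of the model is used per token. The 106B total and 12B active figures are the ones quoted in By The Numbers; the function name is made up for illustration, and treating per-token compute as proportional to active parameters is a simplification.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used for each token."""
    if total_params_b <= 0 or not 0 < active_params_b <= total_params_b:
        raise ValueError("expected 0 < active <= total")
    return active_params_b / total_params_b


# Figures quoted in the episode: 106B total parameters, 12B active per token.
share = active_fraction(106, 12)
print(f"{share:.1%} of weights active per token")  # prints "11.3% of weights active per token"
```

In other words, the model stores 106B parameters but routes each token through roughly a ninth of them.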

⚡ This Week's Buzz: W&B Serverless LoRA

Alex previews the brand-new Serverless LoRA Inference launch from Weights & Biases on CoreWeave: upload a LoRA adapter to W&B Artifacts, serve it instantly on top of any base model with no cold starts and no dedicated GPU. Alex demos a 'Mocking SpongeBob' LoRA he trained in 25 minutes.

  • W&B + CoreWeave: upload LoRA adapters, serve instantly via API
  • No cold starts, no dedicated GPU instances needed
  • Demo: SaRcAsTiC SpongeBob LoRA on Qwen 2.5 base
Alex Volkov
"Hey folks, welcome to this week's Buzz, the news from this week from Weights & Biases!"
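
One reason serving many adapters on top of a shared base model can be cheap is that a LoRA adapter is tiny compared with the weights it modifies. The sketch below only counts parameters for a single weight matrix. The layer size and rank are hypothetical values picked for illustration, and nothing here describes how W&B or CoreWeave actually implement serving.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen base weight matrix the adapter sits on."""
    return d_in * d_out


# Hypothetical layer: a 4096 x 4096 projection with a rank-16 adapter.
adapter = lora_param_count(4096, 4096, 16)
full = full_param_count(4096, 4096)
ratio = adapter / full
print(f"adapter: {adapter:,} params, base: {full:,} params, ratio: {ratio:.2%}")
```

With these assumed sizes the adapter is under 1% of the base matrix, which is why it is plausible to keep one base model loaded and swap small adapters per request.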

🤖 Interview: MCP Apps & the Agentic Web

Ido Salomon (and Liad Yosef off-camera) return to the show to discuss MCP-UI's transformation into 'MCP Apps', now an official standard jointly adopted by Anthropic and OpenAI. The pair explain how agents can now render full interactive HTML UIs directly inside chat, ending the era of tool outputs being just plain text.

  • MCP-UI → MCP Apps: jointly standardized by Anthropic and OpenAI
  • Agents can now render full interactive HTML UIs in-chat
  • Avoids 'iOS vs Android' fragmentation: one open standard
  • mcpui.dev already has demos running with Qwen and Claude
Ido Salomon
"MCP Apps, which is the standard that was just released in the weekend, it's actually unification of MCP-UI and what OpenAI was calling Operator Plugins."
Alex Volkov
"LM chatbots stop being just a chat window and start becoming an operating system for the web."
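
To make "tool outputs that are more than plain text" concrete, here is a minimal sketch of a tool result that carries an HTML snippet next to a text item. The shape (an embedded resource with a `ui://` URI and a `text/html` MIME type) is an assumption based on MCP-UI conventions and may differ from the final MCP Apps spec. The helper name, URI, and HTML are invented for illustration.

```python
import json


def make_html_ui_resource(uri: str, html: str) -> dict:
    """Build a tool-result content item that embeds an HTML snippet.

    Assumed shape: an embedded resource with a ui:// URI and text/html MIME type.
    """
    if not uri.startswith("ui://"):
        raise ValueError("UI resources are assumed to use the ui:// scheme")
    return {
        "type": "resource",
        "resource": {"uri": uri, "mimeType": "text/html", "text": html},
    }


# Hypothetical tool result: one plain-text item plus one renderable UI item.
tool_result = {
    "content": [
        {"type": "text", "text": "Here is your order form."},
        make_html_ui_resource("ui://demo/order-form", "<button>Confirm order</button>"),
    ]
}
print(json.dumps(tool_result, indent=2))
```

A host that understands the UI item could render the button in-chat, while a host that does not can still fall back to the text item.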

🏒 Big CO LLMs β€” Claude Opus 4.5

Anthropic's Opus 4.5 is finally here and it's reclaiming the coding throne: 80.9% SWE-bench Verified, a new 'Effort' parameter for compute control, Tool Search to cut agent overhead, and Programmatic Tool Calling for code-loop data management, all at one-third the old Opus price. Yam and Wolfram both stress-tested it; Yam was blown away by the depth of detail it holds for complex stacks.

  • Opus 4.5: 80.9% SWE-bench Verified, tops GPT-5.1 (77.9%) and Gemini 3 Pro (76.2%)
  • New 'Effort' parameter: control thinking depth like o1 reasoning tokens
  • Tool Search: massively cuts token overhead for agents with many tools
  • Programmatic Tool Calling: Opus writes and executes code loops
  • $5/M input, $25/M output: 3x cheaper than old Opus
Yam Peleg
"Opus knows a lot of tiny details about the stack that you didn't even know you wanted. It feels like it can go forever."
Wolfram Ravenwolf
"I chatted with it for a couple of hours actually, it was a monster. Absolutely impressive for reasoning tasks."
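
To see what the pricing change means for an agentic workload, the sketch below prices a single run at the quoted $5/M input and $25/M output. The old Opus prices are not stated in these notes, so they are derived here as 3x the new ones from the "3x cheaper" claim. The token counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Total USD cost for one request or agent run."""
        total = input_tokens * self.input_per_m + output_tokens * self.output_per_m
        return total / 1_000_000


OPUS_4_5 = Pricing(input_per_m=5.0, output_per_m=25.0)
# Assumption: old Opus priced at 3x the new rates, per the "3x cheaper" claim.
OLD_OPUS = Pricing(input_per_m=15.0, output_per_m=75.0)

# Hypothetical agent run: 200k input tokens, 20k output tokens.
new_cost = OPUS_4_5.cost(200_000, 20_000)
old_cost = OLD_OPUS.cost(200_000, 20_000)
print(f"Opus 4.5: ${new_cost:.2f}, old Opus (assumed): ${old_cost:.2f}")
```

Under these assumptions the same run drops from $4.50 to $1.50.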

🎥 Vision & Video: HunyuanOCR + LTX Retake

Tencent's HunyuanOCR (1B) scores 860 on OCRBench, beating 72B models, a stunning example of task-specialized small models. HunyuanVideo 1.5 brings lightweight open video generation. LTX Studio's Retake enables Photoshop-style editing of specific objects within video frames, and a mysterious 'Whisper Thunder' tops the video arena leaderboard.

  • HunyuanOCR 1B: 860 OCRBench, beats Qwen3-VL-72B
  • HunyuanVideo 1.5: lightweight open-source video generation
  • LTX Retake: video inpainting/object editing, Photoshop for video
  • Whisper Thunder: mystery model at #1 on video arena
Wolfram Ravenwolf
"What we are seeing is the image editing moment for video. You can take this, Photoshop for video, and change it the way you want."

TL;DR and Show Notes

  • Hosts and Guests

  • Big CO LLMs + APIs

    • Anthropic launches Claude Opus 4.5 - world's top model for coding, agents, and tool use (X, Announcement, Blog)

    • OpenAI Integrates ChatGPT Voice Mode Directly into Chats (X)

  • Open Source LLMs

    • Prime Intellect - INTELLECT-3 106B MoE (X, HF, Blog, Try It)

    • Tencent - HunyuanOCR 1B SOTA OCR model (X, HF, Github, Blog)

    • Microsoft - Fara-7B on-device computer-use agent (X, Blog, HF, Github)

    • DeepSeek - Math-V2 IMO-gold math LLM (HF)

  • Interview: MCP Apps

  • Vision & Video

    • Tencent - HunyuanVideo 1.5 lightweight DiT open video model (X, GitHub, HF)

    • LTX Studio - Retake AI video editing tool (X, Try It)

    • Whisper Thunder - mystery #1 ranked video model on arena

  • AI Art & Diffusion

    • Black Forest Labs - FLUX.2 32B multi-reference image model (X, HF, Blog)

    • Tongyi - Z-Image Turbo sub-second 6B image gen (GitHub, HF)

  • This Week's Buzz

    • W&B launches Serverless LoRA Inference on CoreWeave (X, Blog, Notebook)