Episode Summary

ThursdAI opens the second half of 2025 with a packed show: Chinese open-source releases keep landing, Meta is spending aggressively to assemble its superintelligence lab, and Microsoft suddenly has a headline medical-reasoning result. Alex, Yam, Wolfram, and LDJ bring on Michael Luo to unpack how DeepSWE reached 59% on SWE-Bench Verified with RL on top of Qwen, then Ivan Burazin to explain why agent-native sandboxes like Daytona are becoming core infrastructure. The episode moves quickly from ERNIE 4.5 and Hunyuan to Cursor, Cloudflare, Mirage, and new TTS models, making it feel like a clean snapshot of AI's accelerating arms race.

Hosts & Guests

Alex Volkov
Alex Volkov
Host · W&B / CoreWeave
@altryne
Michael Luo
Michael Luo
PhD Student · UC Berkeley / Agentica
@michaelzluo
Ivan Burazin
Ivan Burazin
CEO & Co-Founder · Daytona
@ivanburazin
Yam Peleg
Yam Peleg
AI builder & founder
@Yampeleg
Wolfram Ravenwolf
Wolfram Ravenwolf
Weekly co-host · AI evaluator
@WolframRvnwlf
LDJ
LDJ
Weekly co-host · Nous Research
@ldjconfirmed

By The Numbers

ERNIE 4.5 models
10
Baidu open-sourced a full family ranging from 424B down to 0.3B parameters.
Hunyuan active params
13B
Tencent's 80B MoE activates only 13B parameters at inference while keeping a 256K context window.
SWE-Bench Verified
59%
DeepSWE-Preview is framed as a top open RL coding-agent result built on Qwen.
MAI-DxO accuracy
85.5%
Microsoft says its medical orchestration system beats doctors on challenging NEJM-style diagnostic cases.
Rumored Meta packages
$300M
The panel treats Zuck's recruiting offers as proof that the talent war has entered a new phase.

👋 Introduction and Welcome

Alex kicks off the July 3 show as the first episode of the second half of 2025, resets the ThursdAI premise, and brings Yam and Wolfram onto the panel before the news sprint begins.

  • First ThursdAI episode of H2 2025
  • Alex frames the show as a catch-up on everything important in AI that week
Alex Volkov
Alex Volkov
"What's going on everyone? Welcome, welcome to ThursdAI for July 3rd."

🧨 Breaking News: ThursdAI Acquired by Meta Superintelligence

Alex opens with a tongue-in-cheek fake acquisition announcement about Meta Superintelligence to set up the real conversation about Meta's hiring spree later in the episode.

  • A playful cold open before the real Meta segment
  • Transitions the show into the week's talent-war theme
Alex Volkov
Alex Volkov
"ThursdAI was acquired by Meta Superintelligence team. All of us get 100 million signing bonuses."

🎙️ Guest Introductions and Topics Overview

The panel previews two guest conversations: Michael Luo joins to unpack the top open RL coding-agent result of the week, and Ivan Burazin joins later to talk about Daytona and agent infrastructure.

  • Michael Luo is introduced for the DeepSWE RL segment
  • Ivan Burazin is previewed for the Daytona infrastructure interview

📰 TLDR: Quick Rundown of Today's Topics

Alex speed-runs the agenda: guest interviews, Chinese open-source releases, Meta's recruiting binge, Cursor product and hiring news, Cloudflare bot blocking, Microsoft medical AI, Mirage, and fresh TTS models.

  • The episode roadmap blends open models, big-lab politics, tooling, and infrastructure
  • The guest interviews are positioned as anchors inside a broader news-heavy show
Alex Volkov
Alex Volkov
"This is the TLDR. This is the corner on ThursdAI where we talk, we'll give you a brief TLDR of everything that we're going to cover."

🔓 Deep Dive: Baidu's Ernie 4.5 Release

Alex frames ERNIE 4.5 as another sign that Chinese labs are setting the pace in open source, highlighting Baidu's broad model family, multimodal performance, and sharp reversal from its previous anti-open posture.

  • ERNIE 4.5 ships as a 10-model family with multimodal capabilities
  • The panel treats Baidu's open release as a meaningful strategic shift
Alex Volkov
Alex Volkov
"It's clear that the Chinese folks are winning the open source."

🧠 Tencent's WizardLM and Hunyuan Model

The conversation turns to Tencent's Hunyuan-A13B-Instruct and the WizardLM team behind it, with emphasis on its small active-parameter footprint, strong reasoning benchmarks, and the practical reality of its license limits.

  • WizardLM lineage gives Hunyuan instant credibility with the panel
  • Activating only 13B parameters makes the model feel unusually practical for its class
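The "80B total, 13B active" framing comes from mixture-of-experts routing: a router scores every expert for each token, but only the top few experts actually run. The following is a generic, toy top-k router in plain Python, assuming softmax weights over the selected experts (a Mixtral-style choice). It is not Tencent's Hunyuan implementation; the expert functions, logits, and helper names (`top_k_route`, `moe_forward`) are hypothetical and only illustrate the idea.

```python
import math
from typing import Callable, Sequence

Expert = Callable[[float], float]


def top_k_route(router_logits: Sequence[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(
    x: float, experts: Sequence[Expert], router_logits: Sequence[float], k: int
) -> tuple[float, list[int]]:
    """Toy MoE layer: only the k routed experts are evaluated for this input."""
    routes = top_k_route(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


if __name__ == "__main__":
    # Hypothetical setup: 8 tiny "experts", each of which just scales its input.
    experts = [lambda x, s=s: s * x for s in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.0]
    out, used = moe_forward(1.0, experts, logits, k=2)
    print(f"experts run: {used} of {len(experts)}, output: {out:.3f}")
```

In a real MoE layer each expert is a feed-forward block with its own weights. Skipping the unselected experts is what keeps per-token compute closer to the active parameter count than to the total.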

⚙️ Huawei's Pangu Pro: A Game Changer

Huawei's Pangu Pro becomes the geopolitical open-model story of the week: a large MoE trained on Ascend NPUs rather than Nvidia or AMD hardware, signaling how far Chinese compute stacks have advanced under sanctions.

  • 72B MoE trained on Ascend NPUs instead of Western GPUs
  • 1,528 tokens/sec and 13T pretraining tokens stand out as the headline specs
Alex Volkov
Alex Volkov
"Huawei launches Pangu Pro, which is 72 billion parameter MOE."

🤖 Introducing DeepSWE: An Open Source RL Coding Agent

Michael Luo joins the show as Alex introduces DeepSWE-Preview, the open RL coding-agent project from Agentica and collaborators that just hit a strong public SWE-Bench Verified score.

  • Michael Luo joins live to explain the new open RL coding-agent result
  • The segment centers on why a 59% open result matters in a benchmark dominated by closed systems and scaffolds

🧪 DeepSWE's Development and Achievements

Alex asks Michael why the team used Qwen, how the system compares with earlier open coding efforts, and what it took to push an open-weight stack near the top tier of SWE-Bench Verified.

  • Qwen is treated as the foundation model that made the run possible
  • The result is benchmarked against prior open and scaffold-heavy approaches

🎯 The Role of Reinforcement Learning in AI

The show pauses for a practical RL explainer, connecting DeepSeek-era enthusiasm to why reinforcement learning is suddenly central again for reasoning-heavy coding agents.

  • Alex reframes RL for the broader ThursdAI audience
  • The panel positions RL as a major second-wave optimization for coding agents after instruct tuning

🛠️ Technical Details and Training Challenges

Michael gets into objectives, verification, and the messier details of evaluating an RL coding agent on SWE-Bench Verified, including how hard it is to define a reward that genuinely tracks useful engineering work.

  • Reward design and verification are presented as the hard part, not just compute
  • The discussion focuses on benchmark-grounded evaluation rather than vague "agent vibes"
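A common way to tie an RL coding reward to working code is an all-or-nothing check against the repository's tests. This is a minimal sketch assuming SWE-Bench-style test lists (SWE-Bench records FAIL_TO_PASS tests the patch must fix and PASS_TO_PASS tests that must keep passing). The `CheckResult` type and `swe_reward` function are hypothetical names, and this illustrates the general idea rather than DeepSWE's published reward code.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one test after the agent's patch is applied (hypothetical type)."""
    name: str
    passed: bool


def swe_reward(fail_to_pass: list[CheckResult], pass_to_pass: list[CheckResult]) -> float:
    """Binary outcome reward: 1.0 only if the bug is fixed and nothing else broke."""
    # An empty target list would make all() vacuously true, so require at least one test.
    fixed_bug = bool(fail_to_pass) and all(t.passed for t in fail_to_pass)
    no_regressions = all(t.passed for t in pass_to_pass)
    return 1.0 if fixed_bug and no_regressions else 0.0


if __name__ == "__main__":
    target = [CheckResult("test_issue_is_fixed", True)]
    existing = [CheckResult("test_old_feature", True), CheckResult("test_edge_case", False)]
    print(swe_reward(target, existing))  # 0.0: the fix broke an existing test
```

The all-or-nothing design avoids giving partial credit to patches that only look plausible. The trade-off is a sparse learning signal, since many early rollouts earn zero.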

👐 The Importance of Open Source in AI

The interview closes on why publishing training details and open models still matters: it gives the community something reproducible, inspectable, and improvable instead of another sealed benchmark number.

  • Alex emphasizes the value of releasing methodology, not just scores
  • Michael frames open source as an accelerant for the next wave of agent research

📣 This Week's Buzz: Weights & Biases Updates

Alex uses the traditional W&B interlude to plug Weavehacks, remind listeners what he does at Weights & Biases, and reset the show before the big-company segment.

  • Weavehacks gets called out as the community event of the week
  • The buzz segment acts as the bridge from open source into big-lab drama
Alex Volkov
Alex Volkov
"In this week's buzz, a corner on ThursdAI where I talk about everything related to Weights & Biases that happened this week."

🏢 Meta's Bold Moves in AI

The panel spends serious time on Meta Superintelligence Labs, treating the recruiting spree and rumored compensation packages as evidence that the AI talent market has broken into full wartime economics.

  • Meta's dream-team assembly becomes the headline big-company story
  • The discussion focuses on whether money alone can buy research momentum

🍎 Apple's Collaboration with Anthropic

Alex briefly hits the rumor that Apple may be leaning on Anthropic rather than only internal model efforts, reading it as another sign that even huge incumbents are rethinking how they ship consumer AI.

  • Apple is framed as increasingly open to outside model help
  • The segment is short but reinforces the pressure on slow-moving incumbents
Alex Volkov
Alex Volkov
"Apple is supposedly working with Anthropic on some stuff and have given up on internal building our own models."

💼 Cursor's Aggressive Hiring Spree

The talent-war theme continues with Cursor poaching key people behind Claude Code, reinforcing how quickly top AI product companies are consolidating engineering talent.

  • Cursor is described as hiring aggressively from the strongest adjacent teams
  • The move is treated as more than HR news: it is product-strategy news

🧑‍💻 Cursor's Latest Developments

Beyond hiring, Cursor's web, mobile, and Slack agent rollout is presented as a glimpse of where coding tools are going: always-on, distributed, and not tied to a single editor window.

  • Cursor agents expand onto web, mobile, and Slack
  • The panel reads this as a sign that code agents are becoming ambient workflow software
Alex Volkov
Alex Volkov
"Cursor also launches this week, also had a bunch of news launches. rolled out its AI coding agents on web and mobile."

🧠 New AI Models and OpenRouter

Alex briefly tours the free 1M-context Cypher Alpha listing on OpenRouter and uses it to highlight how model releases now increasingly arrive as product experiments, leaks, or market probes rather than tidy launches.

  • Cypher Alpha is notable mostly because of its free 1M context window
  • The segment captures the increasingly chaotic model-release environment
Alex Volkov
Alex Volkov
"A new secret 1 million context model is tested on open router. It's called Cypher Alpha."

🌐 Cloudflare's AI Bot Blocking Initiative

Cloudflare's one-click AI bot blocking feature becomes the short policy-and-platform detour of the episode, with Alex framing it as a direct response to the economics of perpetual web scraping.

  • The story is about internet incentives as much as security controls
  • The panel flags the tension between open research norms and commercial scraping
Alex Volkov
Alex Volkov
"Cloudflare, the company that has put a firewall in front of most of the internet, has announced a new initiative of theirs to say they are providing a one click AI bot blocking."

🩺 Microsoft's Medical AI Breakthrough

Microsoft's MAI-DxO result is framed as one of the more meaningful application-level announcements of the week, because it suggests orchestration systems may outperform single models in high-stakes expert workflows.

  • MAI-DxO is discussed as a systems win, not just a model win
  • The story stands out because it connects model progress to real-world healthcare economics

🏗️ Interview with Ivan from Daytona

Ivan Burazin joins to explain Daytona's agent-native sandbox runtime, why 2025 is turning into the year of agent infrastructure, and how fast the company is growing as more builders realize agents need computers, not just APIs.

  • Daytona is framed as critical "stateful serverless" infrastructure for agents
  • The interview highlights fast growth, regional deployment, and startup credits as signs of demand
Ivan Burazin
Ivan Burazin
"Ivan is great."

🎮 Mirage: The Future of AI-Generated Games

Mirage delivers the most visibly fun demo of the episode: Alex reacts in real time to an AI-native UGC game engine powered by world-model-style generation and treats it as a preview of where interactive media is headed.

  • Mirage is pitched as an AI-native game engine rather than a one-off demo
  • The discussion connects world models to end-user creative tooling
Alex Volkov
Alex Volkov
"Mirage is a, world's first, that's what they say on the website, world's first AI native UGC game engine."

🔊 Closing Remarks and TTS Releases

The episode lands on two voice-model releases, Kyutai TTS and Qwen-TTS, as Alex closes out on time and leaves listeners with one more reminder that multimodal product quality is rising everywhere at once.

  • Kyutai and Qwen TTS are the final quick-hit launches of the show
  • The close reinforces how broad the week's progress was across models, tools, infra, and media
Alex Volkov
Alex Volkov
"At the end of the show, folks, the last thing that we don't have time to get to, unfortunately, we have two TTS releases."

TL;DR and Show Notes

Here's the quick rundown of everything we covered this week, packed with links to dive deeper:

  • Show Notes & Guests

  • Open Source LLMs

    • Baidu's ERNIE 4.5 Series - 10 models, 424B to 0.3B, multimodal, beats o1 on DocVQA (X, HF, Paper)

    • Tencent's Hunyuan-A13B-Instruct - 80B total, 13B active, 256K context, WizardLM legacy (X, HF, Try It)

    • Huawei's Pangu Pro MoE - 72B, trained on Ascend NPUs, 1,528 tokens/sec (X, HF)

    • DeepSWE-Preview - RL agent, 59% SWE-Bench-Verified on Qwen3-32B (Notion, HF Model)

  • This Week's Buzz

    • Weights & Biases Weavehacks Hackathon - SF, July 12-13, agent protocols focus (Sign Up)

  • Big CO LLMs + APIs

    • Meta Superintelligence Labs (MSL) - Zuck hires a dream team, with comp packages of up to $300M for OpenAI talent (list)

    • Cursor - Hires Claude Code creators, web/mobile agents with Slack (Cursor, HF)

    • Microsoft MAI-DxO - 85.5% accuracy on NEJM cases vs. 20% for doctors (X, Blog)

    • Cloudflare - One-click AI bot blocking, tackles scraping economics (X)

    • Cypher Alpha - Mystery 1M context model, possibly Amazon Titan (Link)

    • Gemini 2.5 Pro - Returned to Google's free tier

  • Vision & Video

    • Mirage - AI-native UGC game engine, real-time photorealistic demos (Playable Demo)

    • Workflow - Restyle videos with Flux Kontext and Luma Modify (X)

  • Voice & Audio

    • Kyutai TTS - Low-latency, high similarity in EN/FR (X, HF)

    • Qwen-TTS - Bilingual Chinese/English, human-level naturalness (X, HF)

  • Infrastructure

    • Daytona - Agent-native sandboxes, $1M run rate in 2 months (GitHub, Startups)

  • Tools

    • Chai Discovery's Chai-2 - Zero-shot antibody design (Chai Discovery)

Thanks for reading all the way through ThursdAI, folks! Share this with friends to spread the AI love, and I'll catch you next week for more!