ThursdAI · July 3, 2025

📆 ThursdAI - Jul 3 - ERNIE 4.5, Hunyuan A13B, MAI-DxO outperforms doctors, RL beats SWE bench, Zuck MSL hiring spree & more AI news

From Weights & Biases: with half of 2025 behind us, this week brings 2 interviews, Zuck's expensive hiring spree, and Chinese open-source models galore. Another banger week in AI!

By Alex Volkov

96 min


Episode Summary

ThursdAI opens the second half of 2025 with a packed show: Chinese open-source releases keep landing, Meta is spending aggressively to assemble its superintelligence lab, and Microsoft suddenly has a headline medical-reasoning result. Alex, Yam, Wolfram, and LDJ bring on Michael Luo to unpack how DeepSWE reached 59% on SWE-Bench Verified with RL on top of Qwen, then Ivan Burazin to explain why agent-native sandboxes like Daytona are becoming core infrastructure. The episode moves quickly from ERNIE 4.5 and Hunyuan to Cursor, Cloudflare, Mirage, and new TTS models, making it feel like a clean snapshot of AI's accelerating arms race.

In This Episode

👋 Introduction and Welcome
🧨 Breaking News: ThursdAI Acquired by Meta Superintelligence
🎙️ Guest Introductions and Topics Overview
📰 TLDR: Quick Rundown of Today's Topics
🔓 Deep Dive: Baidu's Ernie 4.5 Release
🧠 Tencent's WizardLM and Hunyuan Model
⚙️ Huawei's Pangu Pro: A Game Changer
🤖 Introducing DeepSWE: An Open Source RL Coding Agent
🧪 DeepSWE's Development and Achievements
🎯 The Role of Reinforcement Learning in AI
🛠️ Technical Details and Training Challenges
👐 The Importance of Open Source in AI
📣 This Week's Buzz: Weights & Biases Updates
🏢 Meta's Bold Moves in AI
🍎 Apple's Collaboration with Anthropic
💼 Cursor's Aggressive Hiring Spree
🧑‍💻 Cursor's Latest Developments
🧠 New AI Models and OpenRouter
🌐 Cloudflare's AI Bot Blocking Initiative
🩺 Microsoft's Medical AI Breakthrough
🏗️ Interview with Ivan from Daytona
🎮 Mirage: The Future of AI-Generated Games
🔊 Closing Remarks and TTS Releases

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Michael Luo

PhD Student · UC Berkeley / Agentica

@michaelzluo

Ivan Burazin

CEO & Co-Founder · Daytona

Yam Peleg

AI builder & founder

@yampeleg

Wolfram Ravenwolf

Weekly co-host · AI evaluator

@WolframRvnwlf

LDJ

Weekly co-host · Nous Research

@ldjconfirmed

By The Numbers

ERNIE 4.5 models

10

Baidu open-sourced a full family ranging from 424B down to 0.3B parameters.

Hunyuan active params

13B

Tencent's 80B MoE activates only 13B parameters at inference while keeping a 256K context window.

SWE-Bench Verified

59%

DeepSWE-Preview is framed as a top open RL coding-agent result built on Qwen.

MAI-DxO accuracy

85.5%

Microsoft says its medical orchestration system beats doctors on challenging NEJM-style diagnostic cases.

Rumored Meta packages

$300M

The panel treats Zuck's recruiting offers as proof that the talent war has entered a new phase.

👋 Introduction and Welcome

Alex kicks off the July 3 show as the first episode of the second half of 2025, resets the ThursdAI premise, and brings Yam and Wolfram onto the panel before the news sprint begins.

First ThursdAI episode of H2 2025
Alex frames the show as a catch-up on everything important in AI that week

Alex Volkov

"What's going on everyone? Welcome, welcome to ThursdAI for July 3rd."

🧨 Breaking News: ThursdAI Acquired by Meta Superintelligence

Alex opens with a tongue-in-cheek fake acquisition announcement about Meta Superintelligence to set up the real conversation about Meta's hiring spree later in the episode.

A playful cold open before the real Meta segment
Transitions the show into the week's talent-war theme

Alex Volkov

"ThursdAI was acquired by Meta Superintelligence team. All of us get 100 million signing bonuses."

🎙️ Guest Introductions and Topics Overview

The panel previews two guest conversations: Michael Luo joins to unpack the top open RL coding-agent result of the week, and Ivan Burazin joins later to talk about Daytona and agent infrastructure.

Michael Luo is introduced for the DeepSWE RL segment
Ivan Burazin is previewed for the Daytona infrastructure interview

📰 TLDR: Quick Rundown of Today's Topics

Alex speed-runs the agenda: guest interviews, Chinese open-source releases, Meta's recruiting binge, Cursor product and hiring news, Cloudflare bot blocking, Microsoft medical AI, Mirage, and fresh TTS models.

The episode roadmap blends open models, big-lab politics, tooling, and infrastructure
The guest interviews are positioned as anchors inside a broader news-heavy show

Alex Volkov

"This is the TLDR. This is the corner on ThursdAI where we talk, we'll give you a brief TLDR of everything that we're going to cover."

🔓 Deep Dive: Baidu's Ernie 4.5 Release

Alex frames ERNIE 4.5 as another sign that Chinese labs are setting the pace in open source, highlighting Baidu's broad model family, multimodal performance, and sharp reversal from its previous anti-open posture.

ERNIE 4.5 ships as a 10-model family with multimodal capabilities
The panel treats Baidu's open release as a meaningful strategic shift

Alex Volkov

"It's clear that the Chinese folks are winning the open source."

🧠 Tencent's WizardLM and Hunyuan Model

The conversation turns to Tencent's Hunyuan-A13B-Instruct and the WizardLM team behind it, with emphasis on its small active-parameter footprint, strong reasoning benchmarks, and the practical reality of its license limits.

WizardLM lineage gives Hunyuan instant credibility with the panel
13B active parameters make the model feel unusually practical for its class
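To make the "80B total, 13B active" idea concrete, here is a minimal, generic sketch of top-k mixture-of-experts routing, assuming a router that scores every expert for each token and keeps only the k best. The expert counts and sizes in the example are hypothetical toy values, not Hunyuan-A13B's actual configuration. By the episode's own numbers, roughly 13/80, or about 16%, of Hunyuan's weights are used per token.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs; the weights are renormalized
    over the selected experts only, so unselected experts do no work.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def active_params(shared, per_expert, k):
    """Parameters touched per token: always-on shared weights plus k experts."""
    return shared + k * per_expert


def total_params(shared, per_expert, num_experts):
    """Parameters stored: shared weights plus every expert."""
    return shared + num_experts * per_expert


if __name__ == "__main__":
    # Hypothetical toy config: 8 experts, 2 active per token.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))  # experts 1 and 3 win
    ratio = active_params(2, 1, 2) / total_params(2, 1, 8)
    print(f"active fraction: {ratio:.0%}")  # 4 of 10 units, i.e. 40%
```

The design point is that total parameters set memory cost, while active parameters set per-token compute, which is why a large MoE with a small active footprint can feel "practical" to run.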

⚙️ Huawei's Pangu Pro: A Game Changer

Huawei's Pangu Pro becomes the geopolitical open-model story of the week: a large MoE trained on Ascend NPUs rather than Nvidia or AMD hardware, signaling how far Chinese compute stacks have advanced under sanctions.

72B MoE trained on Ascend NPUs instead of Western GPUs
1,528 tokens/sec and 13T pretraining tokens stand out as the headline specs

Alex Volkov

"Huawei launches Pangu Pro, which is 72 billion parameter MOE."

🤖 Introducing DeepSWE: An Open Source RL Coding Agent

Michael Luo joins the show as Alex introduces DeepSWE-Preview, the open RL coding-agent project from Agentica and collaborators that just hit a strong public SWE-Bench Verified score.

Michael Luo joins live to explain the new open RL coding-agent result
The segment centers on why a 59% open result matters in a benchmark dominated by closed systems and scaffolds

🧪 DeepSWE's Development and Achievements

Alex asks Michael why the team used Qwen, how the system compares with earlier open coding efforts, and what it took to push an open-weight stack near the top tier of SWE-Bench Verified.

Qwen is treated as the foundation model that made the run possible
The result is benchmarked against prior open and scaffold-heavy approaches

🎯 The Role of Reinforcement Learning in AI

The show pauses for a practical RL explainer, connecting DeepSeek-era enthusiasm to why reinforcement learning is suddenly central again for reasoning-heavy coding agents.

Alex reframes RL for the broader ThursdAI audience
The panel positions RL as a major second-wave optimization for coding agents after instruct tuning

🛠️ Technical Details and Training Challenges

Michael gets into objectives, verification, and the messier details of evaluating an RL coding agent on SWE-Bench Verified, including how hard it is to define a reward that genuinely tracks useful engineering work.

Reward design and verification are presented as the hard part, not just compute
The discussion focuses on benchmark-grounded evaluation rather than vague "agent vibes"
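Reward design is easier to picture with a concrete shape. Below is a minimal sketch of a sparse, verification-based outcome reward, assuming each task ships with SWE-bench-style FAIL_TO_PASS and PASS_TO_PASS test lists and that a sandbox can run a single test by name. `TaskInstance` and the `run_test` hook are hypothetical names; this illustrates the general idea, not DeepSWE's published training code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskInstance:
    """Hypothetical container for one SWE-bench-style task."""
    fail_to_pass: List[str]  # tests the agent's patch must make pass
    pass_to_pass: List[str]  # tests that must keep passing (no regressions)


def outcome_reward(task: TaskInstance, run_test: Callable[[str], bool]) -> float:
    """Sparse binary reward: 1.0 only if every required test passes.

    run_test is an assumed hook that executes one test against the
    patched repository (e.g. inside a sandbox) and reports pass/fail.
    """
    required = task.fail_to_pass + task.pass_to_pass
    if not required:
        return 0.0  # nothing to verify, so give no credit
    return 1.0 if all(run_test(name) for name in required) else 0.0


if __name__ == "__main__":
    # Simulated results: the bug is fixed, but an old feature broke.
    results = {"test_bug_fixed": True, "test_old_feature": False}
    task = TaskInstance(["test_bug_fixed"], ["test_old_feature"])
    print(outcome_reward(task, lambda name: results.get(name, False)))  # 0.0
```

An all-or-nothing signal is hard to game, since fixing the target bug while breaking existing behavior earns nothing, but it is also sparse: most early rollouts score zero, which makes the training problem about more than raw compute.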

👐 The Importance of Open Source in AI

The interview closes on why publishing training details and open models still matters: it gives the community something reproducible, inspectable, and improvable instead of another sealed benchmark number.

Alex emphasizes the value of releasing methodology, not just scores
Michael frames open source as an accelerant for the next wave of agent research

📣 This Week's Buzz: Weights & Biases Updates

Alex uses the traditional W&B interlude to plug Weavehacks, remind listeners what he does at Weights & Biases, and reset the show before the big-company segment.

Weavehacks gets called out as the community event of the week
The buzz segment acts as the bridge from open source into big-lab drama

Alex Volkov

"In this week's buzz, a corner on ThursdAI where I talk about everything related to Weights & Biases that happened this week."

🏢 Meta's Bold Moves in AI

The panel spends serious time on Meta Superintelligence Labs, treating the recruiting spree and rumored compensation packages as evidence that the AI talent market has broken into full wartime economics.

Meta's dream-team assembly becomes the headline big-company story
The discussion focuses on whether money alone can buy research momentum

🍎 Apple's Collaboration with Anthropic

Alex briefly covers the rumor that Apple may be leaning on Anthropic rather than relying only on internal model efforts, reading it as another sign that even huge incumbents are rethinking how they ship consumer AI.

Apple is framed as increasingly open to outside model help
The segment is short but reinforces the pressure on slow-moving incumbents

Alex Volkov

"Apple is supposedly working with Anthropic on some stuff and have given up on internal building our own models."

💼 Cursor's Aggressive Hiring Spree

The talent-war theme continues with Cursor poaching key people behind Claude Code, reinforcing how quickly top AI product companies are consolidating engineering talent.

Cursor is described as hiring aggressively from the strongest adjacent teams
The move is treated as more than HR news: it is product-strategy news

🧑‍💻 Cursor's Latest Developments

Beyond hiring, Cursor's web, mobile, and Slack agent rollout is presented as a glimpse of where coding tools are going: always-on, distributed, and not tied to a single editor window.

Cursor agents expand onto web, mobile, and Slack
The panel reads this as a sign that code agents are becoming ambient workflow software

Alex Volkov

"Cursor also launches this week, also had a bunch of news launches. rolled out its AI coding agents on web and mobile."

🧠 New AI Models and OpenRouter

Alex briefly tours the free 1M-context Cypher Alpha listing on OpenRouter and uses it to highlight how model releases now increasingly arrive as product experiments, leaks, or market probes rather than tidy launches.

Cypher Alpha is notable mostly because of its free 1M context window
The segment captures the increasingly chaotic model-release environment

Alex Volkov

"A new secret 1 million context model is tested on open router. It's called Cypher Alpha."

🌐 Cloudflare's AI Bot Blocking Initiative

Cloudflare's one-click AI bot blocking feature becomes the short policy-and-platform detour of the episode, with Alex framing it as a direct response to the economics of perpetual web scraping.

The story is about internet incentives as much as security controls
The panel flags the tension between open research norms and commercial scraping

Alex Volkov

"Cloudflare, the company that has put a firewall in front of most of the internet, has announced a new initiative of theirs to say they are providing a one click AI bot blocking."

🩺 Microsoft's Medical AI Breakthrough

Microsoft's MAI-DxO result is framed as one of the more meaningful application-level announcements of the week, because it suggests orchestration systems may outperform single models in high-stakes expert workflows.

MAI-DxO is discussed as a systems win, not just a model win
The story stands out because it connects model progress to real-world healthcare economics

🏗️ Interview with Ivan from Daytona

Ivan Burazin joins to explain Daytona's agent-native sandbox runtime, why 2025 is turning into the year of agent infrastructure, and how fast the company is growing as more builders realize agents need computers, not just APIs.

Daytona is framed as critical "stateful serverless" infrastructure for agents
The interview highlights fast growth, regional deployment, and startup credits as signs of demand

Ivan Burazin

"Ivan is great."

🎮 Mirage: The Future of AI-Generated Games

Mirage delivers the most visibly fun demo of the episode: Alex reacts in real time to an AI-native UGC game engine powered by world-model-style generation and treats it as a preview of where interactive media is headed.

Mirage is pitched as an AI-native game engine rather than a one-off demo
The discussion connects world models to end-user creative tooling

Alex Volkov

"Mirage is a, world's first, that's what they say on the website, world's first AI native UGC game engine."

🔊 Closing Remarks and TTS Releases

The episode lands on two voice-model releases—Kyutai TTS and Qwen-TTS—as Alex closes out on time and leaves listeners with one more reminder that multimodal product quality is rising everywhere at once.

Kyutai and Qwen TTS are the final quick-hit launches of the show
The close reinforces how broad the week's progress was across models, tools, infra, and media

Alex Volkov

"At the end of the show, folks, the last thing that we don't have time to get to, unfortunately, we have two TTS releases."

TL;DR and Show Notes

Here’s the quick rundown of everything we covered this week, packed with links to dive deeper:

Show Notes & Guests
- Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
- Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed
- Guests - Ivan Burazin (Daytona), Michael Luo (Agentica)
Open Source LLMs
- Baidu’s ERNIE 4.5 Series - 10 models, 424B to 0.3B, multimodal, beats o1 on DocVQA (X, HF, Paper)
- Tencent’s Hunyuan-A13B-Instruct - 80B total, 13B active, 256K context, WizardLM legacy (X, HF, Try It)
- Huawei’s Pangu Pro MoE - 72B, trained on Ascend NPUs, 1,528 tokens/sec (X, HF)
- DeepSWE-Preview - RL agent, 59% SWE-Bench-Verified on Qwen3-32B (Notion, HF Model)
This Week’s Buzz
- Weights & Biases Weavehacks Hackathon - SF, July 12-13, agent protocols focus (Sign Up)
Big CO LLMs + APIs
- Meta Superintelligence Labs (MSL) - Zuck hires a dream team, with comp packages of up to $300M offered to OpenAI talent (list)
- Cursor - Hires Claude Code creators, web/mobile agents with Slack (Cursor, HF)
- Microsoft MAI-DxO - 85.5% accuracy on NEJM cases vs. 20% for doctors (X, Blog)
- Cloudflare - One-click AI bot blocking, tackles scraping economics (X)
- Cypher Alpha - Mystery 1M context model, possibly Amazon Titan (Link)
- Gemini 2.5 Pro - Returned to Google’s free tier
Vision & Video
- Mirage - AI-native UGC game engine, real-time photorealistic demos (Playable Demo)
- Workflow - Restyle videos with Flux Kontext and Luma Modify (X)
Voice & Audio
- Kyutai TTS - Low-latency, high similarity in EN/FR (X, HF)
- Qwen-TTS - Bilingual Chinese/English, human-level naturalness (X, HF)
Infrastructure
- Daytona - Agent-native sandboxes, $1M run rate in 2 months (GitHub, Startups)
Tools
- Chai Discovery’s Chai-2 - Zero-shot antibody design (Chai Discovery)

Thanks for reading all the way through ThursdAI, folks! Share this with friends to spread the AI love, and I’ll catch you next week for more!

Alex Volkov 0:30

What's going on everyone?

0:32

Welcome, welcome to ThursdAI for July 3rd. This is our first episode of the second half of 2025. Welcome. My name is Alex Volkov. I'm an AI evangelist with Weights & Biases. I'm so happy to be back with you all here on ThursdAI to talk to you about everything that happened in AI this week, everything that we learned, everything important, everything that folks talked about, and I'm very happy to also be joined by my friends and co-hosts. Today we have Yam Peleg and Wolfram Ravenwolf. Welcome folks, welcome. Hello, how are things? How are you guys doing?

Yam Peleg 1:08

What a week!

1:09

What a week! Alex is so back and AI is so back.

Alex Volkov 1:13

What topic of conversation is going to be the

1:15

most interesting for you today?

Yam Peleg 1:17

Look, AI became sports betting, it's NBA season.

Alex Volkov 1:22

Absolutely.

1:23

Speaking of which, I want to announce, folks, I have breaking news. ThursdAI was acquired by Meta Superintelligence team. All of us get 100 million signing bonuses. Let's go! Let's go! We did it!

Yam Peleg 1:35

I saw it coming.

Alex Volkov 1:37

Yeah, Soham is the host.