Episode Summary
ThursdAI opens the second half of 2025 with a packed show: Chinese open-source releases keep landing, Meta is spending aggressively to assemble its superintelligence lab, and Microsoft suddenly has a headline medical-reasoning result. Alex, Yam, Wolfram, and LDJ bring on Michael Luo to unpack how DeepSWE reached 59% on SWE-Bench Verified with RL on top of Qwen, then Ivan Burazin to explain why agent-native sandboxes like Daytona are becoming core infrastructure. The episode moves quickly from ERNIE 4.5 and Hunyuan to Cursor, Cloudflare, Mirage, and new TTS models, making it feel like a clean snapshot of AI's accelerating arms race.
In This Episode
- Introduction and Welcome
- Breaking News: ThursdAI Acquired by Meta Superintelligence
- Guest Introductions and Topics Overview
- TLDR: Quick Rundown of Today's Topics
- Deep Dive: Baidu's Ernie 4.5 Release
- Tencent's WizardLM and Hunyuan Model
- Huawei's Pangu Pro: A Game Changer
- Introducing DeepSWE: An Open Source RL Coding Agent
- DeepSWE's Development and Achievements
- The Role of Reinforcement Learning in AI
- Technical Details and Training Challenges
- The Importance of Open Source in AI
- This Week's Buzz: Weights & Biases Updates
- Meta's Bold Moves in AI
- Apple's Collaboration with Anthropic
- Cursor's Aggressive Hiring Spree
- Cursor's Latest Developments
- New AI Models and OpenRouter
- Cloudflare's AI Bot Blocking Initiative
- Microsoft's Medical AI Breakthrough
- Interview with Ivan from Daytona
- Mirage: The Future of AI-Generated Games
- Closing Remarks and TTS Releases
Introduction and Welcome
Alex kicks off the July 3 show as the first episode of the second half of 2025, resets the ThursdAI premise, and brings Yam and Wolfram onto the panel before the news sprint begins.
- First ThursdAI episode of H2 2025
- Alex frames the show as a catch-up on everything important in AI that week
Breaking News: ThursdAI Acquired by Meta Superintelligence
Alex opens with a tongue-in-cheek fake acquisition announcement about Meta Superintelligence to set up the real conversation about Meta's hiring spree later in the episode.
- A playful cold open before the real Meta segment
- Transitions the show into the week's talent-war theme
Guest Introductions and Topics Overview
The panel previews two guest conversations: Michael Luo joins to unpack the top open RL coding-agent result of the week, and Ivan Burazin joins later to talk about Daytona and agent infrastructure.
- Michael Luo is introduced for the DeepSWE RL segment
- Ivan Burazin is previewed for the Daytona infrastructure interview
TLDR: Quick Rundown of Today's Topics
Alex speed-runs the agenda: guest interviews, Chinese open-source releases, Meta's recruiting binge, Cursor product and hiring news, Cloudflare bot blocking, Microsoft medical AI, Mirage, and fresh TTS models.
- The episode roadmap blends open models, big-lab politics, tooling, and infrastructure
- The guest interviews are positioned as anchors inside a broader news-heavy show
Deep Dive: Baidu's Ernie 4.5 Release
Alex frames ERNIE 4.5 as another sign that Chinese labs are setting the pace in open source, highlighting Baidu's broad model family, multimodal performance, and sharp reversal from its previous anti-open posture.
- ERNIE 4.5 ships as a 10-model family with multimodal capabilities
- The panel treats Baidu's open release as a meaningful strategic shift
Tencent's WizardLM and Hunyuan Model
The conversation turns to Tencent's Hunyuan-A13B-Instruct and the WizardLM team behind it, with emphasis on its small active-parameter footprint, strong reasoning benchmarks, and the practical reality of its license limits.
- WizardLM lineage gives Hunyuan instant credibility with the panel
- A 13B active-parameter count makes the model feel unusually practical for its class
Huawei's Pangu Pro: A Game Changer
Huawei's Pangu Pro becomes the geopolitical open-model story of the week: a large MoE trained on Ascend NPUs rather than Nvidia or AMD hardware, signaling how far Chinese compute stacks have advanced under sanctions.
- 72B MoE trained on Ascend NPUs instead of Western GPUs
- 1,528 tokens/sec and 13T pretraining tokens stand out as the headline specs
Introducing DeepSWE: An Open Source RL Coding Agent
Michael Luo joins the show as Alex introduces DeepSWE-Preview, the open RL coding-agent project from Agentica and collaborators that just hit a strong public SWE-Bench Verified score.
- Michael Luo joins live to explain the new open RL coding-agent result
- The segment centers on why a 59% open result matters in a benchmark dominated by closed systems and scaffolds
DeepSWE's Development and Achievements
Alex asks Michael why the team used Qwen, how the system compares with earlier open coding efforts, and what it took to push an open-weight stack near the top tier of SWE-Bench Verified.
- Qwen is treated as the foundation model that made the run possible
- The result is benchmarked against prior open and scaffold-heavy approaches
The Role of Reinforcement Learning in AI
The show pauses for a practical RL explainer, connecting DeepSeek-era enthusiasm to why reinforcement learning is suddenly central again for reasoning-heavy coding agents.
- Alex reframes RL for the broader ThursdAI audience
- The panel positions RL as a major second-wave optimization for coding agents after instruct tuning
Technical Details and Training Challenges
Michael gets into objectives, verification, and the messier details of evaluating an RL coding agent on SWE-Bench Verified, including how hard it is to define a reward that genuinely tracks useful engineering work.
- Reward design and verification are presented as the hard part, not just compute
- The discussion focuses on benchmark-grounded evaluation rather than vague "agent vibes"
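To make the reward-design point concrete, here is a minimal sketch of one common choice: a sparse, test-based outcome reward. It is an illustration only, not DeepSWE's actual reward. `CheckResult` and `sparse_reward` are hypothetical names, and the all-or-nothing rule is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one test run after the agent's patch is applied (hypothetical type)."""
    name: str
    passed: bool


def sparse_reward(results: list[CheckResult]) -> float:
    """Return 1.0 only if every test passes after the patch, otherwise 0.0.

    `results` is assumed to cover both the tests that reproduce the issue
    and the existing tests that must keep passing.
    """
    if not results:
        return 0.0  # nothing was verified, so give no credit
    return 1.0 if all(r.passed for r in results) else 0.0


# A patch that fixes the issue without breaking anything.
fixed = [
    CheckResult("test_bug_is_fixed", True),
    CheckResult("test_existing_feature", True),
]
# A patch that fixes the issue but breaks an existing test.
regressed = [
    CheckResult("test_bug_is_fixed", True),
    CheckResult("test_existing_feature", False),
]

print(sparse_reward(fixed))      # 1.0
print(sparse_reward(regressed))  # 0.0
```

A binary signal like this is easy to verify, but it only measures what the tests measure, which is one way to read the point above about rewards that are hard to tie to useful engineering work.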
The Importance of Open Source in AI
The interview closes on why publishing training details and open models still matters: it gives the community something reproducible, inspectable, and improvable instead of another sealed benchmark number.
- Alex emphasizes the value of releasing methodology, not just scores
- Michael frames open source as an accelerant for the next wave of agent research
This Week's Buzz: Weights & Biases Updates
Alex uses the traditional W&B interlude to plug Weavehacks, remind listeners what he does at Weights & Biases, and reset the show before the big-company segment.
- Weavehacks gets called out as the community event of the week
- The buzz segment acts as the bridge from open source into big-lab drama
Meta's Bold Moves in AI
The panel spends serious time on Meta Superintelligence Labs, treating the recruiting spree and rumored compensation packages as evidence that the AI talent market has broken into full wartime economics.
- Meta's dream-team assembly becomes the headline big-company story
- The discussion focuses on whether money alone can buy research momentum
Apple's Collaboration with Anthropic
Alex briefly hits the rumor that Apple may be leaning on Anthropic rather than only internal model efforts, reading it as another sign that even huge incumbents are rethinking how they ship consumer AI.
- Apple is framed as increasingly open to outside model help
- The segment is short but reinforces the pressure on slow-moving incumbents
Cursor's Aggressive Hiring Spree
The talent-war theme continues with Cursor poaching key people behind Claude Code, reinforcing how quickly top AI product companies are consolidating engineering talent.
- Cursor is described as hiring aggressively from the strongest adjacent teams
- The move is treated as more than HR news: it is product-strategy news
Cursor's Latest Developments
Beyond hiring, Cursor's web, mobile, and Slack agent rollout is presented as a glimpse of where coding tools are going: always-on, distributed, and not tied to a single editor window.
- Cursor agents expand onto web, mobile, and Slack
- The panel reads this as a sign that code agents are becoming ambient workflow software
New AI Models and OpenRouter
Alex briefly tours the free 1M-context Cypher Alpha listing on OpenRouter and uses it to highlight how model releases now increasingly arrive as product experiments, leaks, or market probes rather than tidy launches.
- Cypher Alpha is notable mostly because of its free 1M context window
- The segment captures the increasingly chaotic model-release environment
Cloudflare's AI Bot Blocking Initiative
Cloudflare's one-click AI bot blocking feature becomes the short policy-and-platform detour of the episode, with Alex framing it as a direct response to the economics of perpetual web scraping.
- The story is about internet incentives as much as security controls
- The panel flags the tension between open research norms and commercial scraping
Microsoft's Medical AI Breakthrough
Microsoft's MAI-DxO result is framed as one of the more meaningful application-level announcements of the week, because it suggests orchestration systems may outperform single models in high-stakes expert workflows.
- MAI-DxO is discussed as a systems win, not just a model win
- The story stands out because it connects model progress to real-world healthcare economics
Interview with Ivan from Daytona
Ivan Burazin joins to explain Daytona's agent-native sandbox runtime, why 2025 is turning into the year of agent infrastructure, and how fast the company is growing as more builders realize agents need computers, not just APIs.
- Daytona is framed as critical "stateful serverless" infrastructure for agents
- The interview highlights fast growth, regional deployment, and startup credits as signs of demand
Mirage: The Future of AI-Generated Games
Mirage delivers the most visibly fun demo of the episode: Alex reacts in real time to an AI-native UGC game engine powered by world-model-style generation and treats it as a preview of where interactive media is headed.
- Mirage is pitched as an AI-native game engine rather than a one-off demo
- The discussion connects world models to end-user creative tooling
Closing Remarks and TTS Releases
The episode lands on two voice-model releases, Kyutai TTS and Qwen-TTS, as Alex closes out on time and leaves listeners with one more reminder that multimodal product quality is rising everywhere at once.
- Kyutai and Qwen TTS are the final quick-hit launches of the show
- The close reinforces how broad the week's progress was across models, tools, infra, and media
Here's the quick rundown of everything we covered this week, packed with links to dive deeper:
Show Notes & Guests
Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
Co-Hosts - @WolframRvnwlf, @yampeleg, @nisten, @ldjconfirmed
Guests - Ivan Burazin (Daytona), Michael Luo (Agentica)
Open Source LLMs
Baidu's ERNIE 4.5 Series - 10 models, 424B to 0.3B, multimodal, beats o1 on DocVQA (X, HF, Paper)
Tencent's Hunyuan-A13B-Instruct - 80B total, 13B active, 256K context, WizardLM legacy (X, HF, Try It)
Huawei's Pangu Pro MoE - 72B, trained on Ascend NPUs, 1,528 tokens/sec (X, HF)
DeepSWE-Preview - RL agent, 59% SWE-Bench-Verified on Qwen3-32B (Notion, HF Model)
This Week's Buzz
Weights & Biases Weavehacks Hackathon - SF, July 12-13, agent protocols focus (Sign Up)
Big CO LLMs + APIs
Meta Superintelligence Labs (MSL) - Zuck hires a dream team of OpenAI talent, with rumored comp packages of up to $300M (list)
Cursor - Hires Claude Code creators, launches web/mobile agents with Slack integration (Cursor, HF)
Microsoft MAI-DxO - 85.5% accuracy on NEJM cases vs. 20% for doctors (X, Blog)
Cloudflare - One-click AI bot blocking, tackles scraping economics (X)
Cypher Alpha - Mystery 1M context model, possibly Amazon Titan (Link)
Gemini Pro 2.5 - Returned to Google's free tier
Vision & Video
Mirage - AI-native UGC game engine, real-time photorealistic demos (Playable Demo)
Workflow - Restyle videos with Flux Kontext and Luma Modify (X)
Voice & Audio
Infrastructure
Tools
Chai Discovery's Chai-2 - Zero-shot antibody design (Chai Discovery)
Thanks for reading all the way through ThursdAI, folks! Share this with friends to spread the AI love, and I'll catch you next week for more!