Episode Summary
Woo! Welcome back to ThursdAI, show number 99! Can you believe it? We are _one_ show away from hitting the big 100, which is just wild to me.
In This Episode
- 🔓 OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised
- 🔓 Open Source Powerhouses: Nomic & OpenHands Deliver SOTA
- 📰 Nomic Embed Multimodal: SOTA Embeddings for Visual Docs
- 🤖 OpenHands LM 32B & Agent: Accessible SOTA Coding
- 🎨 Frontiers: Diffusion LMs & Superhuman Math
- 🎨 Dream 7B: A Diffusion Language Model Challenger?
- 📰 Gemini 2.5 Obliterates Olympiad Math (24.4% on USAMO!)
- 🤖 Amazon's Nova Act Agent & The Need for Access
- 📰 CoreWeave + NVIDIA = Insane Speeds
- 🤖 This Week's Buzz: Let's Make MCP Observable!
- 🎥 Vision & Video: Entering the Uncanny Valley
- 🔊 Voice Highlight: Hailuo Speech-02
- 🤖 Tool Update & Breaking News!
🔓 OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised
It feels like OpenAI was determined to dominate the headlines this week, hitting us from multiple angles. First, the potentially massive news: OpenAI is planning to release a new open source model in the "coming months"! Kevin Weil tweeted that they are seeking feedback to "get this right." Word on the street is that this could be a powerful reasoning model.
🔓 Open Source Powerhouses: Nomic & OpenHands Deliver SOTA
Beyond the OpenAI buzz, the open source community delivered some absolute gems, and we had guests from two key projects join us!
📰 Nomic Embed Multimodal: SOTA Embeddings for Visual Docs
Our friends at Nomic AI are back with a killer release! We had Zach Nussbaum on the show discussing Nomic Embed Multimodal. These are new 3B & 7B parameter embedding models (available on Hugging Face), built on Alibaba's excellent Qwen2.5-VL.
- They achieved SOTA on visual document retrieval by cleverly embedding interleaved text-image sequences – perfect for PDFs and complex webpages.
- Zach highlighted that they chose the Qwen base because high-performing open VLMs under 3B params are still scarce, making it a solid foundation.
🤖 OpenHands LM 32B & Agent: Accessible SOTA Coding
Remember OpenDevin? The project now goes by OpenHands, and the team joined us to talk about OpenHands LM 32B, a Qwen finetune built for coding agents.
- OpenHands LM 32B hits a remarkable 37.2% on SWE-Bench Verified (a coding benchmark measuring real-world repo tasks), competing with much larger models.
- The OpenHands _agent_ also snagged the #2 spot on the brand new Live SWE-Bench leaderboard!
- Plus, the 32B model runs locally on a single 3090, making this power accessible.
🎨 Frontiers: Diffusion LMs & Superhuman Math
Two other developments pushed the boundaries this week:
🎨 Dream 7B: A Diffusion Language Model Challenger?
This one's fascinating conceptually. Researchers unveiled Dream 7B, a diffusion language model that shows strong benchmark results, especially on Sudoku, potentially because its parallel processing nature is better suited to global constraints. It's an exciting hint at alternative architectures, but the model weights aren't out yet, so we can't verify or play with it.
📰 Gemini 2.5 Obliterates Olympiad Math (24.4% on USAMO!)
We already knew Gemini 2.5 was good, but wow. New results dropped showing its performance on the USA Math Olympiad (USAMO) – problems so hard most top models score under 5%. Gemini 2.5 Pro scored an incredible 24.4%!
🤖 Amazon's Nova Act Agent & The Need for Access
Amazon entered the agent chat with Nova Act, designed for web browser actions. They claim it beats Claude 3.5 and OpenAI's CUA model on some benchmarks, possibly leveraging acquired Adept talent. But... it's only available via an SDK with a request form.
📰 CoreWeave + NVIDIA = Insane Speeds
Hardware keeps accelerating. CoreWeave announced hitting 800 tokens/sec on Llama 3.1 405B using NVIDIA's new GB200 Blackwell chips, and 33,000 T/s on Llama 2 70B with H200s. Inference is getting _fast_.
🤖 This Week's Buzz: Let's Make MCP Observable!
Okay, my personal mission this week builds on the growing Model Context Protocol (MCP) momentum. MCP is potentially the "HTTP for agents," enabling tool interoperability. But as tool use moves external, we lose visibility, making debugging and security harder.
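To make the visibility problem concrete, here's a minimal sketch of what span-style tracing around a tool call could look like. This is not the actual Observable Tools proposal and it doesn't use the OpenTelemetry SDK; it's plain standard-library Python, and every name in it (`ToolSpan`, `observed_tool`, `RECORDED_SPANS`, the attribute keys) is made up for illustration. The idea it shows: the caller passes a trace ID along with the tool call, and the tool side records a span with timing, status, and argument names under that same ID.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolSpan:
    """One span-like record for a single tool call (hypothetical shape)."""
    name: str
    trace_id: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    status: str = "unset"
    duration_ms: float = 0.0


# In-memory sink standing in for a real trace exporter (assumption).
RECORDED_SPANS: List[ToolSpan] = []


def observed_tool(tool_name: str) -> Callable:
    """Wrap a tool handler so every call leaves a span behind."""
    def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
        def wrapper(*, trace_id: str = "", **arguments: Any) -> Any:
            span = ToolSpan(
                name=f"tool.call/{tool_name}",
                # Reuse the caller's trace ID so the call can be stitched
                # into the agent's own trace; otherwise start a new one.
                trace_id=trace_id or uuid.uuid4().hex,
                attributes={
                    "tool.name": tool_name,
                    # Record argument names only, not values, to avoid
                    # leaking sensitive inputs into traces.
                    "tool.argument_names": sorted(arguments),
                },
            )
            start = time.perf_counter()
            try:
                result = handler(**arguments)
                span.status = "ok"
                return result
            except Exception as exc:
                span.status = "error"
                span.attributes["error.type"] = type(exc).__name__
                raise
            finally:
                # Runs on both success and failure, so no call goes unrecorded.
                span.duration_ms = (time.perf_counter() - start) * 1000
                RECORDED_SPANS.append(span)
        return wrapper
    return decorator


@observed_tool("add")
def add(a: int, b: int) -> int:
    """Toy tool used only to exercise the wrapper."""
    return a + b
```

Calling `add(trace_id="demo-trace", a=2, b=3)` returns the tool result as usual and appends one `ToolSpan` to `RECORDED_SPANS`. A real setup would presumably hand those records to a tracing backend instead of a list.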
🎥 Vision & Video: Entering the Uncanny Valley
This space is moving at lightning speed. Runway Gen-4 was announced, pushing for better consistency in AI video, and the example videos show incredible character and world consistency. ByteDance's impressive OmniHuman is now publicly usable via the Dreamina website.
🔊 Voice Highlight: Hailuo Speech-02
While Gladia launched their Solaria STT, the standout for me was Hailuo's Speech-02 TTS API. The emotional control and voice cloning quality are, in my opinion, potentially SOTA right now, offering incredibly nuanced and realistic synthetic voices.
🤖 Tool Update & Breaking News!
Google's NotebookLM now discovers related sources. And the breaking news: Devin 2.0 is out! Cognition Labs launched their AI software engineer V2 with a new IDE experience and, crucially, a $20/month starting price.
From OpenAI's big moves to Gemini's math prowess, stunning AI actors from Meta, and the push for an observable agent ecosystem – the field is accelerating like crazy.
Host, Guests, and Co-hosts
Host: Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
Co-Hosts:
LDJ (@ldjconfirmed)
Yam Peleg (@yampeleg)
Guests:
Zach Nussbaum (@zach_nussbaum) - Nomic AI
Xingyao Wang (@xingyaow_) - All Hands AI / OpenHands
Cong Wei (@CongWei1230) - Meta AI / MoCha
Key Topics & Links
OpenAI's Big Week:
- Teasing highly capable Open Source Reasoner Model (seeking feedback).
- Released PaperBench eval (code, paper) & Nano-Eval framework.
- Raised $40B at $300B valuation.
- New EMO "Monday" voice in ChatGPT.
Open Source Powerhouses:
- Nomic Embed Multimodal: SOTA visual doc embeddings (3B & 7B, Apache 2.0 for 7B).
- OpenHands LM 32B: SOTA-level coding agent model (Qwen finetune, MIT License, 37.2% SWE-Bench, #2 Live SWE-Bench). Cloud version available.
Frontier Models & Capabilities:
- Dream 7B: Promising diffusion LM shows strong benchmark results (esp. Sudoku), but weights not yet released.
- Gemini 2.5: Crushes hard USAMO math eval (24.4% vs <5% for others), showcasing superior reasoning.
Agents & Compute:
- Amazon's Nova Act agent announced, claims SOTA but lacks public access (request form).
- CoreWeave/NVIDIA: Massive inference speedups (800 T/s on Llama 405B with GB200).
This Week's Buzz - MCP:
- Observable Tools initiative launched to add observability to MCP.
- Proposal using OpenTelemetry posted for community feedback on GitHub - please support!
- Huge demand shown for usable MCP clients (viral tweet).
Vision & Video Highlights:
- Runway Gen-4 focuses on video consistency.
- ByteDance OmniHuman (image-to-avatar) now publicly available via Dreamina (example thread).
- Meta's MoCha: Generates stunningly realistic, movie-grade talking characters from speech+text.
Voice Highlight:
- Hailuo Speech-02: Impressive TTS API with excellent emotional control and voice cloning.
Tool Updates:
- Windsurf adds deployments to Netlify.
- Google NotebookLM adds source discovery.
Breaking News:
- Devin 2.0 AI Software Engineer announced, starts at $20/month.