Episode Summary

Woo! Welcome back to ThursdAI, show number 99! Can you believe it? We are _one_ show away from hitting the big 100, which is just wild to me.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Xingyao Wang
PhD Researcher · University of Illinois Urbana-Champaign (UIUC)
@xingyaow_
Cong Wei
AI Researcher · Meta GenAI / University of Waterloo
@CongWei1230
Zach Nussbaum
Machine Learning Engineer · Nomic AI
@zach_nussbaum
LDJ
Weekly co-host of ThursdAI · Nous Research
@ldjconfirmed
Yam Peleg
Weekly co-host of ThursdAI · AI builder & founder
@Yampeleg

By The Numbers

OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised
700M
Sam Altman also cheekily added they won't slap on a Llama-style <700M user license limit.
OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised
8,300
PaperBench is incredibly detailed (>8,300 tasks) and even includes meta-evaluation for the LLM judge they built (the Nano-Eval framework was also open sourced). The top-scoring model managed just a 21.0% replication score (human PhDs got 41.4%).
OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised
$40B
You can find the code on GitHub and read the full paper. Third, the casual 40 Billion Dollars raised (at a $300B valuation), alongside growth credited to native image generation, with especially huge growth in India.
Nomic Embed Multimodal: SOTA Embeddings for Visual Docs
3B
We had Zach Nussbaum on the show discussing Nomic Embed Multimodal. These are new 3B & 7B parameter embedding models (available on Hugging Face) built on Alibaba's excellent Qwen2.5-VL.
Nomic Embed Multimodal: SOTA Embeddings for Visual Docs
7B
Importantly, the 7B model comes with an Apache 2.0 license, and they've open sourced weights, code, and data.

🔓 OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised

It feels like OpenAI was determined to dominate the headlines this week, hitting us from multiple angles. First, the potentially massive news: OpenAI is planning to release a new open source model in the "coming months"! Kevin Weil tweeted that they are seeking feedback to "get this right." Word on the street is that this could be a powerful reasoning model.

🔓 Open Source Powerhouses: Nomic & OpenHands Deliver SOTA

Beyond the OpenAI buzz, the open source community delivered some absolute gems, and we had guests from two key projects join us!

📰 Nomic Embed Multimodal: SOTA Embeddings for Visual Docs

Our friends at Nomic AI are back with a killer release! We had Zach Nussbaum on the show discussing Nomic Embed Multimodal. These are new 3B & 7B parameter embedding models (available on Hugging Face) built on Alibaba's excellent Qwen2.5-VL.

  • They achieved SOTA on visual document retrieval by cleverly embedding interleaved text-image sequences – perfect for PDFs and complex webpages.
  • Zach highlighted that they chose the Qwen base because high-performing open VLMs under 3B params are still scarce, making it a solid foundation.

🤖 OpenHands LM 32B & Agent: Accessible SOTA Coding

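Remember OpenDevin? The project now goes by OpenHands, and the team released OpenHands LM 32B, a Qwen finetune under an MIT license.

  • OpenHands LM 32B hits a remarkable 37.2% on SWE-Bench Verified (a coding benchmark measuring real-world repo tasks), competing with much larger models.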
  • This focus seems to be paying off, as the OpenHands _agent_ also snagged the #2 spot on the brand new Live SWE-Bench leaderboard!
  • Plus, the 32B model runs locally on a single 3090, making this power accessible.

🎨 Frontiers: Diffusion LMs & Superhuman Math

Two other developments pushed the boundaries this week:

🎨 Dream 7B: A Diffusion Language Model Challenger?

This one's fascinating conceptually. Researchers unveiled Dream 7B, a diffusion language model that shows strong benchmark results, especially on Sudoku, potentially due to its parallel processing nature being better for global constraints. It's an exciting hint at alternative architectures, but the model weights aren't out yet, so we can't verify or play with it.

📰 Gemini 2.5 Obliterates Olympiad Math (24.4% on USAMO!)

We already knew Gemini 2.5 was good, but wow. New results dropped showing its performance on the USA Math Olympiad (USAMO) – problems so hard most top models score under 5%. Gemini 2.5 Pro scored an incredible 24.4%!

🤖 Amazon's Nova Act Agent & The Need for Access

Amazon entered the agent chat with Nova Act, designed for web browser actions. They claim it beats Claude 3.5 and OpenAI's QA model on some benchmarks, possibly leveraging acquired Adept talent. But it's only available via an SDK with a request form.

📰 CoreWeave + NVIDIA = Insane Speeds

Hardware keeps accelerating. CoreWeave announced hitting 800 Tokens/sec on Llama 3.1 405B using NVIDIA's new GB200 Blackwell chips, and 33,000 T/s on Llama 2 70B with H200s. Inference is getting _fast_.

🤖 This Week's Buzz: Let's Make MCP Observable!

Okay, my personal mission this week builds on the growing Model Context Protocol (MCP) momentum. MCP is potentially the "HTTP for agents," enabling tool interoperability. But as tool use moves external, we lose visibility, making debugging and security harder.

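To make the visibility gap concrete, here's a minimal sketch of what "observable" tool calls could look like. This is not the OpenTelemetry proposal and it doesn't use any MCP SDK: `observed_tool`, `TOOL_CALL_LOG`, and the two toy tools are hypothetical names made up for illustration, built only on the Python standard library. The idea is simply that every tool call leaves behind a span-like record (tool name, arguments, status, duration) that you could later ship to a tracing backend.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink; a real setup would export these records
# to a tracing backend instead of keeping them in a list.
TOOL_CALL_LOG: list[dict[str, Any]] = []


def observed_tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Wrap a tool function so every call leaves a span-like record."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: dict[str, Any] = {
                "tool": name,
                "arguments": kwargs,
                "status": "ok",
                "error": None,
            }
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Record the failure, then re-raise so callers still see it.
                record["status"] = "error"
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every call is logged once.
                record["duration_s"] = time.perf_counter() - start
                TOOL_CALL_LOG.append(record)

        return wrapper

    return decorator


@observed_tool("add_numbers")
def add_numbers(*, a: float, b: float) -> float:
    # Stand-in for a tool that would normally run on an external server.
    return a + b


@observed_tool("always_fails")
def always_fails() -> None:
    # Stand-in for a tool call that breaks somewhere out of sight.
    raise RuntimeError("simulated tool failure")
```

Calling `add_numbers(a=2, b=3)` and then `always_fails()` leaves two records in `TOOL_CALL_LOG`, one marked `ok` and one marked `error`, which is exactly the kind of trail that goes missing once tools run on someone else's server.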

🎥 Vision & Video: Entering the Uncanny Valley

This space is moving at lightning speed. Runway Gen-4 was announced, pushing for better consistency in AI video, with example videos showing incredible character and world consistency. ByteDance's impressive OmniHuman is now publicly usable via the Dreamina website.

🔊 Voice Highlight: Hailuo Speech-02

While Gladia launched their Solaria STT, the standout for me was Hailuo's Speech-02 TTS API. The emotional control and voice cloning quality are, in my opinion, potentially SOTA right now, offering incredibly nuanced and realistic synthetic voices.

🤖 Tool Update & Breaking News!

Google's NotebookLM now discovers related sources. And in breaking news, Devin 2.0 is out! Cognition Labs launched their AI software engineer V2 with a new IDE experience and, crucially, a $20/month starting price.

From OpenAI's big moves to Gemini's math prowess, stunning AI actors from Meta, and the push for an observable agent ecosystem – the field is accelerating like crazy.

TL;DR and Show Notes

Host, Guests, and Co-hosts

  1. Host: Alex Volkov - AI Evangelist, Weights & Biases (@altryne)

  2. Co-Hosts:

    1. LDJ (@ldjconfirmed)

    2. Yam Peleg (@yampeleg)

  3. Guests:

    1. Zach Nussbaum (@zach_nussbaum) - Nomic AI

    2. Xingyao Wang (@xingyaow_) - All Hands AI / OpenHands

    3. Cong Wei (@CongWei1230) - Meta AI / MoCha

Key Topics & Links

  1. OpenAI's Big Week:

    1. Teasing highly capable Open Source Reasoner Model (seeking feedback).

    2. Released PaperBench eval (code, paper) & Nano-Eval framework.

    3. Raised $40B at $300B valuation.

    4. New EMO "Monday" voice in ChatGPT.

  2. Open Source Powerhouses:

    1. Nomic Embed Multimodal: SOTA visual doc embeddings (3B & 7B, Apache 2.0 for 7B).

    2. OpenHands LM 32B: SOTA-level coding agent model (Qwen finetune, MIT License, 37.2% SWE-Bench, #2 Live SWE-Bench). Cloud version available.

  3. Frontier Models & Capabilities:

    1. Dream 7B: Promising diffusion LM shows strong benchmark results (esp. Sudoku), but weights not yet released.

    2. Gemini 2.5: Crushes hard USAMO math eval (24.4% vs <5% for others), showcasing superior reasoning.

  4. Agents & Compute:

    1. Amazon's Nova Act agent announced, claims SOTA but lacks public access (request form).

    2. CoreWeave/NVIDIA: Massive inference speedups (800T/s on Llama 405B with GB200).

  5. This Week's Buzz - MCP:

    1. Observable Tools initiative launched to add observability to MCP.

    2. Proposal using OpenTelemetry posted for community feedback on GitHub - please support!

    3. Huge demand shown for usable MCP clients (viral tweet).

  6. Vision & Video Highlights:

    1. Runway Gen-4 focuses on video consistency.

    2. ByteDance OmniHuman (image-to-avatar) now publicly available via Dreamina (example thread).

    3. Meta's MoCHA: Generates stunningly realistic, movie-grade talking characters from speech+text.

  7. Voice Highlight:

    1. Hailuo Speech-02: Impressive TTS API with excellent emotional control and voice cloning.

  8. Tool Updates:

    1. Windsurf adds deployments to Netlify.

    2. Google NotebookLM adds source discovery.

  9. Breaking News:

    1. Devin 2.0 AI Software Engineer announced, starts at $20/month.