Episode Summary

Welcome back to ThursdAI! And folks, what an _absolutely insane_ week it's been in the world of AI. Seriously, as I mentioned on the show, we don't often get weeks _this_ packed with game-changing releases.

Hosts & Guests

Alex Volkov - Host · W&B / CoreWeave (@altryne)
Tulsee Doshi - Senior Director & Head of Product, Gemini Models · Google DeepMind (@tulseedoshi)
Morgan McQuire - Engineer · Weights & Biases (@morgymcg)
Prince Canuma - ML Developer & OSS Contributor · MLX Community (@Prince_Canuma)
Nisten Tahiraj - Weekly co-host of ThursdAI · AI operator & builder (@nisten)
Yam Peleg - Weekly co-host of ThursdAI · AI builder & founder (@Yampeleg)
Wolfram Ravenwolf - Weekly co-host, AI model evaluator · Independent AI evaluator (r/LocalLLaMA) (@WolframRvnwlf)

By The Numbers

Big CO LLMs + APIs

  • 2.5 – Google came out swinging this week, dropping Gemini 2.5 Pro and taking back the crown for the best all-around LLM currently available.
  • 20 – We saw massive jumps on benchmarks like AIME (up nearly 20 points!) and GPQA.
  • 13 – My own testing on reasoning tasks confirms this – the latency is surprisingly low for such a powerful model (around 13 seconds on my hard reasoning questions compared to 45+ for others), and the accuracy is the highest I've seen yet at 66% on that specific challenging set.
  • 1M – It also inherits the strengths of previous Gemini models – native multimodality and that massive long context window (up to 1M tokens!).
  • 120 – The performance on long context tasks, like the needle-in-a-haystack test shown on Live Bench, is truly impressive, maintaining high accuracy even at 120k+ tokens where other models often falter significantly.

🔓 Big CO LLMs + APIs

Okay, let's start with the big news. Google came out swinging this week, dropping Gemini 2.5 Pro and, based on the benchmarks and our initial impressions, taking back the crown for the best all-around LLM currently available. Check out the X announcement and the official blog post, and seriously, go try it yourself at ai.dev. We were super lucky to have Tulsee Doshi, who leads the product team for Gemini modeling efforts at Google, join us on the show to give us the inside scoop.

📰 GPT-4o got another update (as I'm writing these words!) tied for #1 on LMArena, beating 4.5

How much does Sam want to win over Google? So much he's letting it ALL out. Just now, we saw an update from LMArena and Sam about a NEW GPT-4o (2025-03-26), which jumps over GPT-4.5 to tie for #1 on the leaderboard.

🔓 Open Source LLMs

The open-source community wasn't sleeping this week either, with some major drops! The Whale Bros at DeepSeek silently dropped an update to their V3 model, DeepSeek-V3-0324. This isn't R1 (their reasoning model), but the powerful base model that R1 was built upon (and presumably R2, when it comes out).

🎨 AI Art & Diffusion & Auto-regression

This was arguably where the biggest "mainstream" buzz happened this week, thanks mainly to OpenAI. This felt like a direct response to Gemini 2.5's launch, almost like OpenAI saying, "Oh yeah? Watch this!" They _finally_ enabled the native image generation capabilities within GPT-4o (Blog, Examples).


🤖 This Week's Buzz + MCP (X, Github)

Bringing it back to Weights & Biases for a moment. We had Morgan McQuire on the show, who heads up our AI Applied team, to talk about something we're really excited about internally – integrating MCP with Weave, our LLM observability and evaluation tool. Morgan showed a demo, and the team has shipped the MCP server, which you can try right now! Coming soon is the integration with wandb models, which will allow ML folks around the world to build agents that monitor loss curves for them!
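The wandb models integration isn't out yet, so here's just a hedged sketch of the core check a loss-monitoring agent might run. Everything below is a hypothetical illustration (the function name and thresholds are made up, and it deliberately uses no W&B APIs – a real agent would pull the loss history from the run instead of hard-coded lists):

```python
def loss_is_diverging(losses, window=5, tolerance=0.0):
    """Flag a run whose recent loss trend is rising.

    Compares the mean of the last `window` losses against the mean of
    the `window` before it; a higher recent mean suggests divergence.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to judge yet
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return recent > previous + tolerance

# A healthy curve keeps falling; a diverging one turns back up.
healthy = [2.0, 1.6, 1.3, 1.1, 0.9, 0.8, 0.7, 0.65, 0.6, 0.58]
diverging = [2.0, 1.6, 1.3, 1.1, 0.9, 1.0, 1.3, 1.7, 2.2, 2.9]

print(loss_is_diverging(healthy))    # → False
print(loss_is_diverging(diverging))  # → True
```

An agent wired up through MCP could run a check like this on a schedule and ping you only when the trend flips.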

🤖 Agents, Tools & MCP

And speaking of MCP... This was HUGE news, maybe slightly overshadowed by the image generation, but potentially far more impactful long-term, as Wolfram pointed out right at the start of the show. OpenAI officially announced support for the Model Context Protocol (MCP) (docs here). Why is this massive? Because instead of every vendor pushing its own tool-connection format, the industry can converge on one standard (HD DVD – standards wars suck!).
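For a sense of what "supporting MCP" means on the wire: the protocol is built on JSON-RPC 2.0, where a client opens a session with `initialize`, discovers tools with `tools/list`, and invokes one with `tools/call`. Here's a minimal sketch of those message shapes (the tool name `query_evals` and its arguments are invented for illustration; check the MCP spec for the full schemas):

```python
import json

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request of the kind MCP clients send."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# 1. Open the session and negotiate capabilities.
init = make_request(1, "initialize", {
    "protocolVersion": "2024-11-05",  # a published MCP protocol revision
    "capabilities": {},
    "clientInfo": {"name": "demo-client", "version": "0.1"},
})

# 2. Ask the server what tools it exposes.
list_tools = make_request(2, "tools/list")

# 3. Call one of them by name with JSON arguments.
call_tool = make_request(3, "tools/call", {
    "name": "query_evals",            # hypothetical tool name
    "arguments": {"project": "my-project"},
})

print(call_tool)
```

Because every MCP server speaks this same framing, a tool written once works with any client that adopts the protocol – which is exactly why OpenAI signing on matters.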

🔊 Voice & Audio

Just one more quick update on the audio front: alongside the image generation, OpenAI also quietly updated the advanced voice mode in ChatGPT (YT announcement). This should lead to a much more natural conversation flow.

🔊 MLX-Audio

And speaking (heh) of audio and speech, we had the awesome Prince Canuma on the show – you probably know Prince. He's the MLX King, the creator and maintainer of essential libraries like MLX-VLM (for vision models), FastMLX, MLX Embeddings, and now MLX-Audio. Seriously, huge props to Prince and the folks in the MLX community for making these powerful open-source models accessible on Mac hardware.
TL;DR and Show Notes:
  • Guests and Cohosts

    • Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
      Co Hosts - Wolfram Ravenwolf (@WolframRvnwlf), Nisten Tahiraj (@nisten), Yam Peleg (@yampeleg)

    • Tulsee Doshi - Head of Product, Gemini Models at Google DeepMind (@tulseedoshi)

    • Morgan McQuire - Head of AI Applied Team at Weights & Biases (@morgymcg)

    • Prince Canuma - ML Research Engineer, Creator of MLX Libraries (@Prince_Canuma)

  • Big CO LLMs + APIs

    • 🔥 Google reclaims #1 position with Gemini 2.5 Pro (thinking) - (X, Blog, Try it)

    • ARC-AGI 2 benchmark revealed - Base LLMs score 0%, thinking models 4%.

  • Open Source LLMs

    • Deepseek updates DeepSeek-V3-0324 685B params (X, HF) - MIT License!

    • Qwen launches an Omni 7B model - perceives text, image, audio, video & generates text and speech (HF)

  • AI Art & Diffusion & Auto-regression

  • This week's Buzz + MCP

    • Weights & Biases Weave official MCP server tool - talk to your evals! (X, Github)

  • Agents, Tools & MCP

    • OpenAI has added support for MCP - MCP WON! (Docs)

  • Voice & Audio

    • OpenAI updates advanced voice mode with semantic VAD for more natural conversations (YT announcement).

    • MLX-Audio v0.0.3 released by Prince Canuma (Github)

  • Show Notes and other Links

    • Catch the show live & subscribe to the newsletter/YouTube: thursdai.news/yt

    • Try Gemini 2.5 Pro: AI.dev

    • Learn more about MCP from our previous episode (March 6th).