ThursdAI · April 3, 2025

ThursdAI - Apr 3rd - OpenAI Goes Open?! Gemini Crushes Math, AI Actors Go Hollywood & MCP, Now with Observability?

From Weights & Biases - an incredible show with 3 guests (Nomic, All Hands and Meta), OpenAI open sourcing soon, AI Video is getting too realistic + announcing Observable.tools initiative from W&B!

By Alex Volkov

98 min

What happened in AI the week of April 3, 2025?

Woo! Welcome back to ThursdAI, show number 99! Can you believe it? We are _one_ show away from hitting the big 100, which is just wild to me.

In This Episode

🔓 OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised
🔓 Open Source Powerhouses: Nomic & OpenHands Deliver SOTA
📰 Nomic Embed Multimodal: SOTA Embeddings for Visual Docs
🤖 OpenHands LM 32B & Agent: Accessible SOTA Coding
🎨 Frontiers: Diffusion LMs & Superhuman Math
🎨 Dream 7B: A Diffusion Language Model Challenger?
📰 Gemini 2.5 Obliterates Olympiad Math (24.4% on USAMO!)
🤖 Amazon's Nova Act Agent & The Need for Access
📰 CoreWeave + NVIDIA = Insane Speeds
🤖 This Week's Buzz: Let's Make MCP Observable!
🎥 Vision & Video: Entering the Uncanny Valley
🔊 Voice Highlight: Hailuo Speech-02
🤖 Tool Update & Breaking News!

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Xingyao Wang

PhD Researcher · University of Illinois Urbana-Champaign (UIUC)

@xingyaow_

Cong Wei

AI Researcher · Meta GenAI / University of Waterloo

@CongWei1230

Zach Nussbaum

Machine Learning Engineer · Nomic AI

@zach_nussbaum

LDJ

Weekly co-host of ThursdAI · Nous Research

@ldjconfirmed

Yam Peleg

Weekly co-host of ThursdAI · AI builder & founder

@Yampeleg

By The Numbers

OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised

700M

Sam Altman also cheekily added they won't slap on a Llama-style <700M user license limit.

OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised

PaperBench is incredibly detailed (>8,300 tasks) and even includes meta-evaluation for the LLM judge they built (the Nano-Eval framework was also open sourced). Claude came out on top with just a 21.0% replication score (human PhDs got 41.4%).

OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised

40 B

You can find the code on GitHub and read the full paper. Third, the casual 40 billion dollars raised at a $300 billion valuation, with usage surging thanks to native image generation and especially huge growth in India.

Nomic Embed Multimodal: SOTA Embeddings for Visual Docs

We had Zach Nussbaum on the show discussing Nomic Embed Multimodal. These are new 3B & 7B parameter embedding models (available on Hugging Face) built on Alibaba's excellent Qwen2.5-VL.

Nomic Embed Multimodal: SOTA Embeddings for Visual Docs

Importantly, the 7B model comes with an Apache 2.0 license, and they've open sourced weights, code, and data.

🔓 OpenAI Makes Waves: Open Source Tease, Tough Evals & Billions Raised

It feels like OpenAI was determined to dominate the headlines this week, hitting us from multiple angles. First, the potentially massive news: OpenAI is planning to release a new open source model in the "coming months"! Kevin Weil tweeted that they are seeking feedback to "get this right." Word on the street is that this could be a powerful reasoning model.

🔓 Open Source Powerhouses: Nomic & OpenHands Deliver SOTA

Beyond the OpenAI buzz, the open source community delivered some absolute gems, and we had guests from two key projects join us!

📰 Nomic Embed Multimodal: SOTA Embeddings for Visual Docs

Our friends at Nomic AI are back with a killer release! We had Zach Nussbaum on the show discussing Nomic Embed Multimodal. These are new 3B & 7B parameter embedding models (available on Hugging Face) built on Alibaba's excellent Qwen2.5-VL.

They achieved SOTA on visual document retrieval by cleverly embedding interleaved text-image sequences, which is perfect for PDFs and complex webpages. Zach highlighted that they chose the Qwen base because high-performing open VLMs under 3B params are still scarce, making it a solid foundation.
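To make the "one embedding for interleaved text and images" idea a bit more concrete, here is a minimal sketch of what it buys you at retrieval time: every page, whatever its mix of text and images, lives in a single index and is ranked by a single similarity function. This is an illustration only. The three-dimensional vectors are made-up toy values rather than real model outputs, and the names `unified_index`, `cosine`, and `search` are hypothetical, not part of any Nomic API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of whole pages (assumed values).
# Each entry mixes text and images, yet there is only one index to query.
unified_index = {
    "invoice.pdf page 1 (text + table image)": [0.9, 0.1, 0.2],
    "slide deck page 4 (chart + caption)": [0.1, 0.8, 0.3],
    "blog post (text + screenshots)": [0.2, 0.3, 0.9],
}

def search(query_vec, index, top_k=1):
    """Return the names of the top_k entries most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# A made-up query vector that sits closest to the invoice page.
query = [0.85, 0.15, 0.25]
print(search(query, unified_index))
```

With a CLIP-style pair of encoders you would typically keep one index per modality and merge the two result lists yourself; the single index above is the simplification a joint embedding allows.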

🤖 OpenHands LM 32B & Agent: Accessible SOTA Coding

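Remember OpenDevin? It is now OpenHands, and the team released OpenHands LM 32B, a Qwen finetune under the MIT License.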

It hits a remarkable 37.2% on SWE-Bench Verified (a coding benchmark measuring real-world repo tasks), competing with much larger models.
This focus seems to be paying off, as the OpenHands _agent_ also snagged the #2 spot on the brand new Live SWE-Bench leaderboard!
Plus, the 32B model runs locally on a single 3090, making this power accessible.

🎨 Frontiers: Diffusion LMs & Superhuman Math

Two other developments pushed the boundaries this week:

🎨 Dream 7B: A Diffusion Language Model Challenger?

This one's fascinating conceptually. Researchers unveiled Dream 7B, a diffusion language model that shows strong benchmark results, especially on Sudoku, potentially due to its parallel processing nature being better for global constraints. It's an exciting hint at alternative architectures, but the model weights aren't out yet, so we can't verify or play with it.

📰 Gemini 2.5 Obliterates Olympiad Math (24.4% on USAMO!)

We already knew Gemini 2.5 was good, but wow. New results dropped showing its performance on the USA Math Olympiad (USAMO) – problems so hard most top models score under 5%. Gemini 2.5 Pro scored an incredible 24.4%!


🤖 Amazon's Nova Act Agent & The Need for Access

Amazon entered the agent chat with Nova Act, designed for web browser actions. They claim it beats Claude 3.5 and OpenAI's CUA model on some benchmarks, possibly leveraging acquired Adept talent. But it's only available via an SDK with a request form.

📰 CoreWeave + NVIDIA = Insane Speeds

Hardware keeps accelerating. CoreWeave announced hitting 800 tokens/sec on Llama 3.1 405B using NVIDIA's new GB200 Blackwell chips, and 33,000 T/s on Llama 2 70B with H200s. Inference is getting _fast_.

🤖 This Week's Buzz: Let's Make MCP Observable!

Okay, my personal mission this week builds on the growing Model Context Protocol (MCP) momentum. MCP is potentially the "HTTP for agents," enabling tool interoperability. But as tool use moves external, we lose visibility, making debugging and security harder.

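Here is a rough sketch of what "observable" could mean for a single tool call: wrap the tool so that every invocation leaves behind a span-like record with its name, arguments, status, and duration. Everything in it is an assumption made for illustration. `SPANS`, `traced_tool`, and the `weather.lookup` tool are hypothetical names, and the in-memory list is a stand-in for a real tracing backend such as OpenTelemetry; it is not the Observable Tools proposal itself.

```python
import time
from functools import wraps

# Hypothetical in-memory span log. A real setup would export spans
# through a tracing backend instead of appending to a list.
SPANS = []

def traced_tool(tool_name):
    """Wrap a tool function so every call records a span-like dict."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            span = {"tool": tool_name, "args": args, "kwargs": kwargs}
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                span["status"] = "ok"
                return result
            except Exception as exc:
                # Record the failure, then let the caller see it too.
                span["status"] = "error"
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on both success and failure.
                span["duration_s"] = time.perf_counter() - start
                SPANS.append(span)
        return wrapper
    return decorator

@traced_tool("weather.lookup")  # hypothetical tool name
def lookup_weather(city):
    if not city:
        raise ValueError("city is required")
    return f"Sunny in {city}"

print(lookup_weather("Denver"))
try:
    lookup_weather("")
except ValueError:
    pass  # the failed call is still recorded in SPANS

for span in SPANS:
    print(span["tool"], span["status"], span.get("error"))
```

The point of the sketch is that failures are recorded as well as successes, which is exactly the visibility that goes missing once tool execution moves to an external server.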

🎥 Vision & Video: Entering the Uncanny Valley

This space is moving at lightning speed. Runway Gen-4 was announced, pushing for better consistency in AI video, with example videos showing incredible character and world consistency. ByteDance's impressive OmniHuman is now publicly usable via the Dreamina website.

🔊 Voice Highlight: Hailuo Speech-02

While Gladia launched their Solaria STT, the standout for me was Hailuo's Speech-02 TTS API. The emotional control and voice cloning quality are, in my opinion, potentially SOTA right now, offering incredibly nuanced and realistic synthetic voices.

🤖 Tool Update & Breaking News!

Google's NotebookLM now discovers related sources. And in breaking news, Devin 2.0 is out! Cognition Labs launched their AI software engineer V2 with a new IDE experience and, crucially, a $20/month starting price.

From OpenAI's big moves to Gemini's math prowess, stunning AI actors from Meta, and the push for an observable agent ecosystem, the field is accelerating like crazy.

TL;DR and Show Notes

Host, Guests, and Co-hosts

Host: Alex Volkov - AI Evangelist, Weights & Biases (@altryne)
Co-Hosts:
1. LDJ (@ldjconfirmed)
2. Yam Peleg (@yampeleg)
Guests:
1. Zach Nussbaum (@zach_nussbaum) - Nomic AI
2. Xingyao Wang (@xingyaow_) - All Hands AI / OpenHands
3. Cong Wei (@CongWei1230) - Meta AI / MoCha

Key Topics & Links

OpenAI's Big Week:
1. Teasing highly capable Open Source Reasoner Model (seeking feedback).
2. Released PaperBench eval (code, paper) & Nano-Eval framework.
3. Raised $40B at $300B valuation.
4. New EMO "Monday" voice in ChatGPT.
Open Source Powerhouses:
1. Nomic Embed Multimodal: SOTA visual doc embeddings (3B & 7B, Apache 2.0 for 7B).
2. OpenHands LM 32B: SOTA-level coding agent model (Qwen finetune, MIT License, 37.2% SWE-Bench, #2 Live SWE-Bench). Cloud version available.
Frontier Models & Capabilities:
1. Dream 7B: Promising diffusion LM shows strong benchmark results (esp. Sudoku), but weights not yet released.
2. Gemini 2.5: Crushes hard USAMO math eval (24.4% vs <5% for others), showcasing superior reasoning.
Agents & Compute:
1. Amazon's Nova Act agent announced, claims SOTA but lacks public access (request form).
2. CoreWeave/NVIDIA: Massive inference speedups (800T/s on Llama 405B with GB200).
This Week's Buzz - MCP:
1. Observable Tools initiative launched to add observability to MCP.
2. Proposal using OpenTelemetry posted for community feedback on GitHub - please support!
3. Huge demand shown for usable MCP clients (viral tweet).
Vision & Video Highlights:
1. Runway Gen-4 focuses on video consistency.
2. ByteDance OmniHuman (image-to-avatar) now publicly available via Dreamina (example thread).
3. Meta's MoCHA: Generates stunningly realistic, movie-grade talking characters from speech+text.
Voice Highlight:
1. Hailuo Speech-02: Impressive TTS API with excellent emotional control and voice cloning.
Tool Updates:
1. Windsurf adds deployments to Netlify.
2. Google NotebookLM adds source discovery.
Breaking News:
1. Devin 2.0 AI Software Engineer announced, starts at $20/month.

Monday 0:00

Alright, fine.

0:01

Here we go. Welcome to ThursdAI. I'm your oh-so-enthusiastic host, Alex, here to bring you the latest in AI. Today we've got some thrilling, or not, news for you, from OpenAI's announcements to whatever else is on the agenda. So grab your coffee, or don't, and let's get this over with.

Alex Volkov 0:43

Woo.

0:44

Welcome to ThursdAI, everyone. Welcome, welcome. ThursdAI, April 3rd, not April 1st. Nobody to be fooled. April 3rd. Welcome, folks. My name is Alex Volkov. As always, I'm an evangelist with Weights & Biases. I'm your host for today, and there's so much to cover today that hopefully we'll get through all of it. There's a bunch of open source, a very, very exciting week, one of these weeks where I started by being very busy at work. I'm gonna tell you some of the stuff that I did at work that relates to ThursdAI as well. I was playing with the native image mode of GPT, like all of us, probably like the 130 million people that used it in the past week. I was playing with ChatGPT. I was doing some stuff. I was learning. I wasn't really... obviously I was paying attention to everything that happens. And then yesterday, when I prepared my notes for the show, I looked at a mountain of updates that we need to go through, just an absolute mountain: a bunch of open source stuff, a bunch of big companies stuff. OpenAI is about to be open again, folks, and we're gonna cover this as well. Apparently OpenAI is about to open source something soon, and you can even sign up for a thing. Basically, we've been waiting for it. Finally, we got some updates from them as well. A lot to cover this week, but this is why we're here, to keep you up to date. For those of you who are new to the show, a brief housekeeping. I don't know what to call this. ThursdAI is brought to you by Weights & Biases. We don't have other sponsors. Unless we specify in advance or during the show, nothing we talk about here is sponsored. We just love the folks we bring on and the people we talk to. And the motto is, we stay up to date so you don't have to. That's been the deal with ThursdAI for the past two years.
Today is show number 99, so our next show is gonna be episode one hundred, at least according to Substack. If you go on Substack and find ThursdAI, please subscribe and share with your friends. At least according to Substack, today's show is 99, which is crazy to me, that we did a hundred shows, because there were a lot of streams that weren't planned. We did some streams on Friday that didn't count as an episode, but officially on Substack, since we started posting, we're up to a hundred shows. We also crossed, and here I will use some of my audio effects... I just noticed that we've crossed a hundred thousand downloads, at least on Substack. So we crossed more. It's crazy to me that we've had a hundred thousand downloads of our episodes over the last two years. Many people write to me and to my friends and to my colleagues and say, hey, ThursdAI is something we listen to every week. This fills me with a lot of joy, but also a lot of commitment to keep this professional for you guys and make sure that this is high signal and not fluff. This is also a point for me to invite feedback. If you have any type of feedback about how to improve the show, as listeners, as community members of ThursdAI, I would welcome it. And thank you for participating in the community with this. How about we start the TLDR.

3:58

All right folks, we are at the TLDR. So this is everything we're going to talk about outside of breaking news. Okay? We'll also have breaking news, but this is everything that we're going to talk about on ThursdAI. I want to run through this just in case you won't be here for the next two hours or so, so you don't miss any part of this. Everything will also be in the show notes. I send all these links every week in the show notes on thursdai.news. So if you're missing any part of that, you shouldn't be worried about, oh, I don't have a link. You'll receive all the links and everything in your inbox if you subscribe, so I recommend you subscribe. We'll start the show with open source. There's a bunch of stuff in open source LLMs. The first big one that we're gonna chat about is that OpenAI is about to be open AI. Get it? They're about to open source a reasoning model. Apparently Sam Altman posted something like, ta-da, we're about to open source something. We assume it's a reasoner. I don't know if they a hundred percent confirmed it, but I've seen multiple folks talk about this, so we'll talk about that. They didn't open source it yet, but we'll cover it. They did open source some other stuff though. OpenAI also open sourced PaperBench, a new benchmark, a new eval that actually goes to show how awesome they are, because they released an eval where they are not the leader. Claude is the leader there. So we're gonna talk about PaperBench. PaperBench is super cool. We'll also cover Dream 7B. It's a diffusion model. Do you guys remember, we talked about diffusion language models. So we're gonna talk about this diffusion model soon. Dream 7B is, like, the claimed state-of-the-art diffusion model. Yam was particularly excited about this, so I'm looking forward to hearing from Yam about this model as well.
We also have OpenHands 32B, a fine-tune from Qwen, and for this we'll have a guest. I'm hoping I pronounce this correctly: Xingyao Wang from OpenHands. You guys remember OpenHands? We've talked with OpenHands about their coding agent, and now they've also released a model, and there's also a cloud offering. So we're gonna chat about this very soon. Then we will also talk about, it's not in my notes, but I have that here, Nomic's multimodal embed. I have Zach Nussbaum, who's been on the show before. He's gonna join and chat to us about how they've trained a state-of-the-art multimodal model for embedding PDFs, which is super cool. Then in the big companies, LLMs and APIs, we obviously have to mention, although these prices, these tickers, don't make sense anymore: OpenAI raised an additional $40 billion at a $300 billion valuation, which is more than Coca-Cola and other big companies, more than Disney, more than a bunch of other stuff. OpenAI is on track to become a trillion dollar valuation company, and also to cross 1 billion users. So that's insane. They also have this new Monday voice, and we'll talk with Monday. Super cool. I don't know if you guys chatted with the emo GPT yet, but it's pretty cool. So basically Amazon announced this Nova Act, which is a computer use API. We're gonna chat about this as well. There's a new benchmark from MathArena called USAMO, so Math Olympiad. And up until Gemini 2.5, none of the models got more than 5%. Gemini crushed it, with 24%, and we're gonna chat about what this means. So Gemini 2.5 on this one benchmark gets 24%, compared to every other big model out there. Everybody else gets less than five. We have to try to figure out what's here. And then there's also a new update from CoreWeave and NVIDIA, where they announced like 800 tokens per second for Llama 405B on their new MLPerf, like, 5 benchmark or something.
So we're gonna chat about this as well. That to me is crazy. Just absolutely crazy. 800 tokens per second for Llama 405B. In This Week's Buzz... I now call This Week's Buzz MCP coverage as well, because to me, they're combined. We're leaning so hard into MCP, and I just keep telling you about this. I have an update, and I have a request for you as well. I have a whole story to tell you about MCP in This Week's Buzz. It's something that I'm leading personally in Weights & Biases, and I'm very excited about it and really want to share it with you. So This Week's Buzz this week is worth waiting for. We're probably gonna do it an hour in, and I will talk to you about MCP observability, which is a thing that I think needs to exist. And I hope to convince you that it's a thing that needs to exist as well. The vision and video category has been crazy this week, because Runway started this week by announcing Gen-4, and Runway has been in the video generation game since Gen-1, Gen-2, Gen-3. Gen-4 looks absolutely ridiculous and we'll show some example videos. It just looks so, so good, and I've seen a lot of creators already play with this model. Do you guys remember OmniHuman from ByteDance? On the show, you may remember, OmniHuman is this avatar-creating model from ByteDance that they announced but didn't release. Finally, it's available. It's available via the Dreamina website and we're gonna chat about that. But not only that, there's another contender that we haven't talked about called Hedra Labs that I've used, and recently many people use it as well because they want to animate their image generations from ChatGPT. So I'm gonna show you both OmniHuman and Hedra Labs and show you the comparison that I think is worth looking at. Basically, both of them receive an image and some audio, and then they generate the avatar. So it's not like a lip sync where you take a video. And then we also want to, in this
area as well, chat about MoCha from Meta, which is movie-grade talking character synthesis. And this absolutely looks just insane. Some of them cross the uncanny valley for me very easily. So we're definitely gonna show this. Folks who are only listening, you'll get from the context when you need to tune into the video. So don't worry, you can go about your day and just listen. Once we get to the video though, this specific segment, vision and video, is gonna be very visual and very obviously worth watching. If you are listening to the podcast, by the way, I have recently started embedding the live show, the thing that people are watching right now, hey folks, at the top of the newsletter. So if you are listening to the show and you actually wanna see some parts of it, the YouTube show is going to be there, with chapters, so you can skip to the exact chapter that you want, and that link is at the top of the Substack for you to find super quick. Voice and audio. I have two updates here. Gladia, our friends from Gladia, a French company that does transcription models that always try to beat Whisper. They just announced that they released Solaria, which they call a best-in-class AI transcription model, and translation as well. It's super, super fast. We're gonna chat about this as well. And Hailuo MiniMax Audio Speech-02. That's a mouthful. Hailuo AI is a company that has a video model, and honestly, they have a bunch of other models. Their audio, though, and I've talked to you about their audio, to me is the best by far. People still think it's ElevenLabs because ElevenLabs has the better brand, but you guys know that I bring you the nitty gritty of AI, and Hailuo is absolutely much better as far as I'm concerned. In audio, they have emotional speech, they have voice cloning that is crazy good. And they have long context as well.
So long text, you can send up to 30,000 characters in there. You can generate almost a whole podcast in there. The voices are crazy. The cloning is crazy. The emotional control they have is unparalleled. Not open source at all, but Speech-02 from MiniMax is definitely great. AI art and diffusion. We should just mention that the whole world became AI artists, because more than 130 million people have used it and generated over 700 million images with ChatGPT since last week. We're coming up on a billion images generated with ChatGPT since last week, and especially in India as well. It looks like India is onboarding super hard on ChatGPT. I don't have any other news in AI art. We'll see if something comes up, but folks who are listening from the AI art community, I would love to know from you if there's any news that I missed, because I know that this field is very hot. And oh, yeah, I see already. So potentially we'll see a new Midjourney today. Thank you, Colleen. There's been talk about a new Midjourney as well. Then in the tool section, the last section that we're gonna talk about, I wanna shout out our friends from Windsurf. If you guys remember, we had Kevin here from Windsurf talk about Cascade and why Windsurf agents et cetera are better. I've used Windsurf since then. I really like it. They added a bunch of new releases, including deployment. So now from Windsurf you can actually deploy your stuff, which is super, super cool, and they're gunning for the Bolt and Lovable type of products that we also covered. And then also NotebookLM adds discoverability of sources. So in NotebookLM, the thing that generates the podcast, from Google, yeah, you guys remember, you no longer have to bring all your own sources. You bring some sources and it discovers other sources for you, which is pretty cool. So folks, this is our TLDR. With this, I wanna welcome LDJ to the stage. LDJ, welcome, man. It's been a minute since we've seen you. How are you?
How are you doing? Let's jump into the open source stuff. Yep, let's do it. How you been, man? I've been pretty good. How about yourself? Very good. All right, I think with this we're gonna start with open source. We're gonna invite a few folks here to the stage, but before that, we'll just do the open source opening. Folks, let's get it.

13:43

Open source AI. Let's get it started, folks, and now we are welcoming multiple folks. You guys know we have LDJ, co-host. Thank you, LDJ, for joining. And now we have two more folks to welcome. We're welcoming Zach Nussbaum. Not your first time on the show. You've been here before. Not in video form though, I think, so welcome. Great to see you, Zach. Zach, you work at Nomic, correct? Yep, that's correct. Thanks again for having me. Absolutely. It's always a pleasure to have you on. And we also have Xingyao Wang from All Hands. So All Hands is the company, OpenHands is the AI coding agent. You guys also released something super cool this week, so we'd love to chat and hear from you what that's about and what you trained. But I think, just by order of who's been here first and who I invited first, we'll start with the Nomic stuff. Zach, I think that you guys have an announcement this week, not the first time that you guys are announcing things on ThursdAI. I would love to hear from you directly what we're talking about, and talk through the details.

Zach Nussbaum 14:36

Yeah.

14:36

Thanks again for having me. It's always fun to chat about the cool research that we've been up to, and to do it on a cool show like the one that you guys have. But yeah, we released Nomic Embed Multimodal, a state-of-the-art embedding model that allows you to embed images and text interleaved. Previously, most multimodal embedding models required two encoders, so CLIP style. You'd have a separate text encoder and a separate vision encoder. But a lot of the problems would be that your modalities would get embedded separately, which would require you to have two separate vector stores, one for your image vectors and one for your text vectors. We took a bunch of our learnings from training high-performing text embedding models and applied them to the multimodal work from the great folks that led the ColPali team and the DSE work. I think Manuel and, I'm forgetting the DSE lead's name, but yeah. We basically built on top of great open source work, and we've released two really great Apache 2.0 licensed 7B embedding models that are state of the art. ColNomic is at the top of the leaderboard for, I'm gonna butcher the name of the leaderboard, but, uh, ViDoRe, video retrieval or visual retrieval. This one, ViDoRe V2. Yes, exactly.

Alex Volkov 15:47

Yeah.

Zach Nussbaum 15:47

so yeah, we're top of the leaderboard there with our