ThursdAI · May 15, 2025

📆 ThursdAI - May 15 - Genocidal Grok, ChatGPT 4.1, AM-Thinking, Distributed LLM training & more AI news


From Weights & Biases: another week with a "rogue" AI, an interview with Dillon from Nous about decentralized runs, a new Chinese open source competitor, and more AI video!

By Alex Volkov

89 min


What happened in AI the week of May 15, 2025?

What a wild week! It started super slow, and release-wise it still felt slow, but the most interesting story was yet another AI gone "rogue" (have you even heard about "Kill the Boer"? If not, Grok will tell you all about it). Otherwise it was fairly quiet in AI land this week: besides another Chinese newcomer called AM-Thinking 32B that beats DeepSeek and Qwen, and Stability making a small comeback, we focused on distributed LLM training and ChatGPT 4.1. We had a ton of fun on this episode, which was recorded from the Weights & Biases SF office (I'm here to cover Google IO next week!). Let's dig in, because what looks like a slow week on the surface was anything but dull under the hood (TL;DR and show notes at the end, as always).


Episode Summary

In This Episode

🔓 📆 ThursdAI - May 15 - Genocidal Grok, ChatGPT 4.1, AM-Thinking, Distributed LLM training & more AI news
📰 Why does XAI Grok talk about White Genocide and "Kill the Boer"??
🔓 Open Source LLMs: The Decentralization Tsunami
🔓 Other Open Source Standouts
🔓 Big Company LLMs & APIs: Models, Modes, and Model Zoo Confusion
⚡ This Week's Buzz - Everything W&B!
🔓 Vision & Video: Open Source Shines Through the Noise
📰 StepFun Step1X-3D: High-Fidelity 3D Asset Generation
📰 Wrapping Up This "Chill" Week

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Dillon Rolnick

COO · Nous Research

@DillonRolnick

Nisten Tahiraj

Weekly co-host of ThursdAI · AI operator & builder

@nisten

Yam Peleg

Weekly co-host of ThursdAI · AI builder & founder

@Yampeleg

LDJ

Weekly co-host of ThursdAI · Nous Research

@ldjconfirmed

By The Numbers

32B · AM-Thinking v1, a dense open reasoning model

85.3% · AM-Thinking v1 on AIME 2024 (70.3% on LiveCodeBench v5, 92.5% on Arena-Hard)

25 tokens/sec · AM-Thinking v1 on a single 80GB GPU with INT4 quantization

128k · the context window the AM-Thinking team is already working on, alongside a multilingual RLHF pass


📰 Why does XAI Grok talk about White Genocide and "Kill the Boer"??

Just as we were getting over the ChatGPT glazing incident (TK: add coverage link), folks started noticing that @grok, xAI's frontier LLM that also responds to replies on X, had started talking about "white genocide" in South Africa and something called "Kill the Boer", even when the question had nothing to do with either!

Adding fuel to the fire are Uncle Elon's recent tweets about South Africa, and this specific change seems to be at least partly related to those views.
Remember, too, that Grok was pitched as a "maximally truth-seeking" AI!
I really hope this transparency continues!

🔓 Open Source LLMs: The Decentralization Tsunami

Open source starts with the kind of progress that would have been unthinkable 18 months ago: a 32B dense LLM, openly released, that takes on the big mixture-of-experts models and comes out on top for math and code. AM-Thinking v1 (paper linked in the show notes) hits 85.3% on AIME 2024, 70.3% on LiveCodeBench v5, and 92.5% on Arena-Hard.

It even runs at 25 tokens/sec on a single 80GB GPU with INT4 quantization.
The model supports a /think reasoning toggle (chain-of-thought on demand), comes with a permissive license, and is fully tooled for vLLM, LM Studio, and Ollama.
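To put the single-GPU claim in perspective, here is a back-of-the-envelope sketch of weight memory at different precisions. It assumes the nominal 32B parameter count, counts weights only (no KV cache, activations, or quantization scales), and uses decimal gigabytes; the helper name `weight_memory_gb` is our own, not part of AM-Thinking's tooling.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight-only memory in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: ignores KV cache, activations, and per-group quantization scales.
    """
    total_bytes = num_params * bits_per_param / 8  # 8 bits per byte
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 32e9  # AM-Thinking v1 is described as a 32B dense model
    for label, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{label}: ~{weight_memory_gb(params, bits):.0f} GB of weights")
```

Under these assumptions the weights shrink from roughly 64 GB in BF16 to about 16 GB at INT4, leaving most of an 80GB card free for the KV cache and batching; the real footprint depends on the serving runtime.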

🔓 Other Open Source Standouts

The Falcon-Edge project slashes memory and compute requirements and enables inference on <1GB VRAM. If you're looking to fine-tune, you get pre-quantized checkpoints and a clear path to 1-bit LLMs.
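The show notes describe Falcon-Edge as ternary BitNet LLMs, and a rough sketch shows why that lands under 1GB. A ternary weight takes one of three values {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information; a simple packing stores it in 2 bits (4 weights per byte). This assumes nominal 1B and 3B parameter counts (matching the HF-1B and HF-3B checkpoints listed in the show notes), counts weights only, and ignores any layers kept in higher precision; the names below are illustrative.

```python
import math


def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight-only memory in decimal GB for a given storage width per weight."""
    return num_params * bits_per_weight / 8 / 1e9


# Information-theoretic minimum for a weight in {-1, 0, +1}
IDEAL_BITS = math.log2(3)  # ~1.585 bits
# A simple packing scheme: 2 bits per weight, i.e. 4 weights per byte
PACKED_BITS = 2.0

if __name__ == "__main__":
    for name, params in [("1B", 1e9), ("3B", 3e9)]:
        ideal = weight_gb(params, IDEAL_BITS)
        packed = weight_gb(params, PACKED_BITS)
        print(f"{name}: ~{ideal:.2f} GB ideal, ~{packed:.2f} GB at 2-bit packing")
```

Even with the simpler 2-bit packing, both sizes keep their weights under 1 GB, which fits the <1GB VRAM claim; runtime buffers add to that.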

StepFun's 3D pipeline is a two-stage system that creates watertight geometry and then view-consistent textures, trained on 2M curated meshes.

🔓 Big Company LLMs & APIs: Models, Modes, and Model Zoo Confusion

OpenAI's GPT-4.1 series, previously API-only, is now available in the ChatGPT interface. Why does this matter? Because the UX of modern LLMs is, frankly, a mess: seven model options in the dropdown, each with its own quirks, speed, and context length.

Most casual users don’t even know the dropdown exists.

⚡ This Week's Buzz - Everything W&B!

It's a busy time here at Weights & Biases, and I'm super excited about a couple of upcoming events where you can connect with us and the broader AI community. Fully Connected: Our very own 2-day conference is happening June 18-19 in San Francisco!

It's going to be packed with insights on building and scaling AI.

🔓 Vision & Video: Open Source Shines Through the Noise

We had a bit of a meta-discussion on the show about "video model fatigue" – with so many incremental updates, it can be hard to keep track or see the big leaps. However, when a release like Alibaba's Wan 2.1 comes along, it definitely cuts through.

Alibaba, the team behind the excellent Qwen LLMs, released Wan 2.1, a full stack of open-source text-to-video foundation models.

📰 StepFun Step1X-3D: High-Fidelity 3D Asset Generation

StepFun released Step1X-3D, an open two-stage framework for generating textured 3D assets. It first synthesizes geometry and then generates view-consistent textures. They've also released a curated dataset of 800K assets.


📰 Wrapping Up This "Chill" Week

So, there you have it – another "chill" week in the world of AI! From Grok's controversial escapades to the inspiring decentralized training efforts and mind-bending algorithmic discoveries, it's clear the pace isn't slowing down. Next week is going to be absolutely insane.


TL;DR and show notes

Fully Connected - Weights & Biases premier conference - register HERE with coupon WBTHURSAI
AI Engineer - THANKSTHURSDAI 30% off coupon - register HERE
Hosts and Guests
- Alex Volkov - AI Evangelist & Weights & Biases (@altryne)
- Co-Hosts - @yampeleg @nisten @ldjconfirmed
- Guest - Dillon Rolnick - COO Nous Research (@dillonRolnick)
Open Source LLMs
- AM-Thinking v1: 32B dense reasoning model ( HF, Paper, Page )
- Falcon-Edge: ternary BitNet LLMs for edge deployment ( Blog, HF-1B, HF-3B )
- Nous Research Psyche: decentralized cooperative-training network from Nous Research ( Website, GitHub, Tweet, Dashboard )
- INTELLECT-2: globally decentralized RL training of a 32B reasoning model ( Blog, Tech report, HF weights, PRIME-RL code )
  - Our coverage of Intellect-1 back in Dec (https://sub.thursdai.news/p/thursdai-dec-4-openai-o1-and-o1-pro)
- HealthBench: OpenAI’s physician-crafted benchmark for AI in healthcare ( Blog, Paper, Code )
Big CO LLMs + APIs
- OpenAI adds GPT-4.1 models in ChatGPT
- AlphaEvolve: Gemini-powered coding agent for algorithm discovery ( Blog )
- Google shutting off free Gemini 2.5 Pro API due to "demand" ahead of IO
- ByteDance - Seed-1.5-VL-thinking 20B (Paper)
- Anthropic Web Search API: real-time retrieval for Claude models ( Blog )
- What's up with Grok?
Vision & Video
- Wan 2.1: open-source diffusion-transformer video suite ( HF, GitHub, Tweet )
- LTX distilled - near real time video (X)
Voice & Audio
- Hailuo - MiniMax Speech tech report is out - best TTS out there (Paper)
- Stability AI - Stable Audio Open Small 341M: on-device text-to-audio (X, Blog, Paper, HF )
AI Art & Diffusion & 3D
- StepFun Step1X-3D - Towards High-Fidelity and Controllable Generation of Textured 3D Assets (HF, Demo, Dataset, report)
Tools & Other notable AI things mentioned on the pod
- The robots are dancing! (X)

Alex Volkov 0:31

Welcome, everyone, to ThursdAI, May 15th. My name is Alex Volkov, I'm an AI evangelist with Weights & Biases from CoreWeave, uh, and today is ThursdAI, ThursdAI, ThursdAI. Officially, the first half of May is behind us, uh, which is crazy to me. Like, the speed with which things are advancing is crazy. Uh, for those of you who are joining us on the different streaming video platforms, welcome, and we have folks all over the place, in Twitter spaces, et cetera. Um, and we have a very interesting show today. There's a few things that I'm very, very excited to talk about, specifically things that started popping off yesterday. I will be very clear that opinions on this podcast are my own. They're not representative of the parent company of Weights & Biases, CoreWeave, as CoreWeave is now a public company. They just had their first earnings call yesterday. Opinions on this podcast are entirely ThursdAI-related and Alex Volkov's, not representative of the company. With that said, um, oh, I have LDJ here. Welcome, LDJ. With that said, the next interesting thing that I want to start with, uh, is going to be Grok's weird behavior, and we're going to definitely cover this. Also, I promised some folks, uh, to explain why I'm wearing sunglasses, and very simply, I just flew into San Francisco, so I'm recording this live from the Weights & Biases San Francisco office, which is great, by the way. Uh, and if you haven't had the chance to join any of our hackathons, you are more than welcome to tune in and then come to one of them. Uh, there was supposed to be a hackathon in a few days, uh, but then we moved it, and now I'm still here. So that's the reason why I'm in San Francisco. With that said, LDJ, welcome. How are you, man? How was your AI week?

LDJ 2:39

It was good.

2:41

Uh, there's, as usual, just a lot of little things, and, uh, I don't think there's any particular major release on my mind, but, um, hey, what's up, Nisten? What about you, I guess, now that you're here? Uh, any particular release that you're interested to talk about, or just some little things that, you know... What do you think?

Nisten 3:00

Uh, the new VideoGen model?

LDJ 3:04

Oh, Wan?

Nisten 3:05

Yeah.

Alex Volkov 3:06

Yeah, there's a few.

3:08

I actually wanted to ask both of you guys, but also folks in the audience. Uh, we keep kind of updating you about video models. Um, I started to get the feeling, and I think if I started to get the feeling, maybe the folks in the audience have some of it as well. Uh, the incremental updates between video models, um, are getting to the point where there's not a lot of, I don't know, interesting things there. It's really hard to show the difference between them, right? Because video models all kind of look great right now. Not all of them, but, you know, instruction following, etc. So, you know, sometimes there's a big breakthrough, like Runway, and sometimes there's a very, very good model that is getting open-sourced, uh, like Wan, uh, from Alibaba this week. But otherwise, incremental updates between video models seem to me like, I don't know, folks don't necessarily care about them that much. So LDJ, Nisten, and we've added Yam Peleg. Welcome, Yam. Um, what do you guys think? Should we keep covering them, or should we only focus on the major things in video models? Um, where do you guys sit on this? Do those updates interest you? And folks in the audience, please comment as well. Would love to know if an update about an incremental update, kind of like we do with OpenAI, would be interesting in the video model world.

LDJ 4:26

Yeah, I kind of agree with what you just said, Alex.

4:28

Um, I, I personally don't find it as interesting as a lot of the other news and, like, don't think it's as newsworthy, but then again, I think maybe there are people that do find it newsworthy, or maybe some of you actually find it newsworthy. So, uh, you know, like, uh, I don't bring that up, but, um, I guess if we do end up all agreeing that it's not that newsworthy, then maybe we should kind of tone it down a bit. I don't know, Nisten, Yam. Oh, sorry. Yeah. Go ahead. Yeah.

Alex Volkov 4:57

Go ahead.

4:57

Go ahead. Go ahead. Nisten, what are your thoughts? You, you, you brought Wan, I guess, uh, up. What do you think about this? It's more of

Nisten 5:06

as to what you can do with it.

5:08

And that was also the only one I started, I started using, but it where the 0.1, what was the company's name? Synthesia. Uh, Synthesia. Mm-hmm.

Alex Volkov 5:19

That was

Nisten 5:19

doing it.

5:20

We got to the point where, I don't know how, how big that, that company is. It doesn't matter. It's probably...

Alex Volkov 5:27

You mean the AI avatar creator?

5:28

Yeah, yeah, yeah. One image. Oh. So we have, uh, Synthesia, we have Hydrolabs, and we have HeyGen. Those are the three main ones. And then

Nisten 5:36

yeah, it's just that now you got pretty much close to that level of quality, and you can run it at home, and you can make a, uh, I actually started making it for another non-open-source product. It was like a finance product, just like an onboarding UI and stuff. And the Wan models are actually pretty good. And you don't even have to, uh, like, you can run it all in-house and you can have it generate this avatar. So it's like you got this, like, what was over a billion dollars' worth of VC stuff, and now you can just get it for free on Hugging Face and push it on your machine. Uh, whereas before that was like an entire setup. So yeah, that was pretty interesting to me because it made that level of tech pretty available, especially with the 14B model, because it makes stuff. Now that's this product worth it. Like, you can just make yourself move around the screen and point and stuff. Unfortunately, I don't have a demo I can, uh, I can show.

Alex Volkov 6:39

We have time until we get to the video section.

6:41

We have time to pull up demos. Uh, yeah, so I found

Nisten 6:44

that interesting.

Alex Volkov 6:45

Yeah, I think I have a link in the show notes for you as well to kind of pull up. Yam Peleg, welcome to the show. Uh, folks, now that we're all here and we have some folks in the audience as well, uh, what do you think about the, um, the white genocide in South Africa? What are your thoughts on this? Nobody has mentioned this so far. What's going on? How come nobody brought it up? I'm of course referring to the fact that Grok from xAI on X has been going off the rails, and I think that this is the main topic that I want to cover. We'll start with the TL;DR and then we'll dive into this, because if you guys remember, a couple of weeks ago we were here. A couple of weeks ago? A week ago? I don't remember. I think it was a couple of weeks ago. We covered the glazing gate from OpenAI, where OpenAI released a rogue update and people started noticing online that ChatGPT was glazing. Uh, don't look up glazing on, on OpenDictionary, I do not recommend it. Um, not safe for work. And ChatGPT then basically was sycophantic as hell, and OpenAI released, like, a bunch of transparency. And now a second major foundational lab's AI is going off the rails in a completely different way. Uh, and, uh, yeah, we should at least chat about the South Africa gate, I guess. I called it something else, um. Have you guys seen this? Have you talked to Grok? Did Grok try to convince you that the white genocide in South Africa is real already, or no?

LDJ 8:15

I, I didn't, yeah, go ahead.

8:17

Go ahead.

Nisten 8:19

Look, I, as much as I criticize Uncle Elon publicly, and people can check my tweets for, uh, what my, like, opinions and bullshit politics are on that, I have to say Grok is a very good source of truth as a, as a model, or at least has been,

Alex Volkov 8:40

Was, at least. Yes.

Nisten 8:41

Yeah.