Episode Summary

NVIDIA dominated CES 2026 with the Vera Rubin platform — delivering 5x inference over Blackwell and 75% fewer GPUs for trillion-parameter training — while xAI raised $20B at a $230B valuation amid Grok's bikini-gate scandal. Ryan Carson broke down the Ralph Wiggum autonomous coding technique (1.2M views on X) that lets agents ship features while you sleep, marking the death of "vibe coding." The panel also covered Upstage's Solar Open 100B, Liquid AI's on-device LFM 2.5, NVIDIA's Nemotron Speech ASR with 24ms latency (demoed by Kwindla Hultman Kramer of Daily.co), and OpenAI's GPT Health launch alongside the first US pilot for AI-prescribed medication.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Kwindla Hultman Kramer
Daily.co — Co-Founder & CEO
@kwindla
Ryan Carson
AI educator & founder
@ryancarson
Wolfram Ravenwolf
Weekly co-host, AI model evaluator
@WolframRvnwlf
Nisten Tahiraj
AI operator & builder
@nisten
LDJ
Nous Research
@ldjconfirmed

By The Numbers

Vera Rubin vs Blackwell
5x
NVIDIA's next-gen platform delivers 5x inference performance over Blackwell, announced at CES 2026
Fewer GPUs needed
75%
Vera Rubin requires 75% fewer GPUs for 10 trillion parameter MoE training
xAI Series E
$20B
xAI raises $20B at $230B valuation with NVIDIA and Cisco as strategic investors
Solar Open params
102B
Upstage's Solar Open 100B — 102B total parameters, only 12B active per token, trained on 19.7T tokens
Ralph article views
1.2M
Ryan Carson's Ralph Wiggum article on X — autonomous coding technique using atomic user stories
Nemotron Speech latency
24ms
NVIDIA Nemotron Speech ASR — 600M parameter streaming model, 900 concurrent streams on single H100

🔥 Breaking During The Show

Google Gmail Enters the Gemini Era
Breaking during the TLDR segment: Google integrates Gemini 3 into Gmail for 3 billion users with AI Overviews, smart replies, and natural language inbox search.

📰 TL;DR - This Week's AI News Rundown

Alex runs through the week's biggest stories: NVIDIA's Vera Rubin at CES delivering 5x over Blackwell, xAI's $20B raise amid Grok controversy, Solar Open 100B and other open source releases, OpenAI's GPT Health waitlist, Google bringing Gmail into the Gemini era, and the first US pilot for AI-prescribed medication renewals.

  • NVIDIA Vera Rubin: 5x inference over Blackwell at CES 2026
  • xAI raises $20B at $230B valuation
  • Google Gmail enters the Gemini era for 3B users (breaking news)
  • Doctronic: first US pilot for AI prescription renewals

🔓 Open Source: Solar Open 100B

Upstage releases Solar Open 100B, a 102B parameter MoE model with only 12B active parameters, trained on 19.7 trillion tokens with an innovative data factory approach. LDJ highlights the SNAP PO reinforcement learning technique with a 50% training speedup, and the panel discusses how this model outperforms GLM 4.5 Air on many benchmarks with strong Korean language optimization; a toy sketch of the top-8 expert routing follows the bullets below.

  • 102B params, 12B active, 129 experts with top-8 activation
  • 19.7T training tokens with 4.5T synthetic data
  • SNAP PO: 50% RL training speedup
  • Best-in-class Korean language performance
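For readers who want the shape of that MoE math, here is a minimal sketch of top-8 routing over 128 routed experts plus one shared expert, as described above. This is an illustrative toy, not Upstage's code; the dimensions and weights are made up.

```python
# Hypothetical sketch of the MoE routing pattern described above
# (129 experts: 128 routed + 1 shared, top-8 activation). NOT Upstage's
# actual code; shapes and initializations are illustrative only.
import numpy as np

N_ROUTED, TOP_K, D = 128, 8, 64  # expert counts per Solar Open; D is made up

def moe_forward(x, router_w, experts, shared_expert):
    """Route one token vector x through top-8 of 128 experts plus 1 shared."""
    logits = router_w @ x                      # (128,) router scores
    top = np.argsort(logits)[-TOP_K:]          # indices of the 8 best experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                       # softmax over selected experts
    # Only 8 routed experts (plus the shared one) run per token: that is why
    # a 102B-parameter model has only ~12B active parameters.
    out = sum(g * experts[i](x) for g, i in zip(gates, top))
    return out + shared_expert(x)

# Toy usage: each expert is a small random linear map.
rng = np.random.default_rng(0)
make_expert = lambda w: (lambda v: w @ v)
experts = [make_expert(rng.standard_normal((D, D)) * 0.02) for _ in range(N_ROUTED)]
shared = make_expert(rng.standard_normal((D, D)) * 0.02)
router_w = rng.standard_normal((N_ROUTED, D)) * 0.02
y = moe_forward(rng.standard_normal(D), router_w, experts, shared)
print(y.shape)  # (64,)
```

Because only eight routed experts plus the shared one fire per token, the active parameter count stays near 12B even though the full model holds 102B.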
Nisten Tahiraj
"These days, datasets for video or text end up being like in the 20 to 40 terabytes range. There is something to be said about what is synthetic and what is not. This gets very tricky because all the data does have a human source at the end of the day."

🔓 Miro Thinker 1.5

MiroMind AI releases MiroThinker 1.5, a 30B parameter open source search agent achieving 56.1% on BrowseComp — outperforming trillion-parameter models through 'interactive scaling.' The panel debates the growing importance of agent harnesses in 2026, with Ryan noting that domain-specific harnesses are the bleeding edge and Nisten emphasizing how hard they are to build well; a toy sketch of the interactive-scaling loop follows the bullets below.

  • 30B model beating trillion-parameter models on search benchmarks
  • Interactive scaling: third dimension of scaling beyond params and context
  • 56.1% BrowseComp, 66.8% BrowseComp Chinese
  • Fine-tune of Qwen 3 Thinking with 147K open training samples
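The "interactive scaling" idea lends itself to a loop sketch. The version below is a heavily hedged paraphrase of the episode's description (hypothesize, search for evidence, detect conflicts, revise), not MiroMind's code; `search` and `llm` are placeholder callables.

```python
# Hedged sketch of interactive scaling: spend more agent-environment turns,
# rather than more parameters or context, on each question. Placeholder
# callables stand in for a real search tool and a real model.
def interactive_answer(question, search, llm, max_turns=8):
    hypothesis = llm(f"Initial hypothesis for: {question}")
    evidence = []
    for _ in range(max_turns):
        evidence += search(f"{question} | checking: {hypothesis}")
        revised = llm(
            f"Question: {question}\nHypothesis: {hypothesis}\n"
            f"Evidence so far: {evidence}\n"
            "If the evidence conflicts, return a revised hypothesis; "
            "otherwise return the hypothesis unchanged."
        )
        if revised == hypothesis:   # converged: stop spending turns
            break
        hypothesis = revised        # revise and keep searching
    return hypothesis

# Toy stand-ins so the sketch runs end to end.
print(interactive_answer(
    "capital of Australia",
    search=lambda q: ["Canberra is the capital of Australia"],
    llm=lambda prompt: "Canberra",
))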
Ryan Carson
"The models are so good now, that people might think, oh, you could just open Chat GPT or open Claude and chat, and you can't, we're still a long way from each model being specifically useful for a specific task."
Nisten Tahiraj
"I just wanted to say it's very hard to make a good harness. It seems easy at first, but it's just like making a tool or a drill or something. It has to be basically perfect."

🔓 Liquid AI LFM 2.5

Liquid AI releases LFM 2.5, a family of tiny ~1B parameter on-device models with text, vision, and audio support, announced at CES alongside AMD's Lisa Su. The models achieve 239 tokens/sec on AMD CPU and 100 tokens/sec on iPhone 16 Pro Max. LDJ highlights the revolutionary end-to-end audio model that skips the traditional ASR-LLM-TTS pipeline entirely.

  • 1.2B params running at 239 tps on AMD CPU, 100 tps on iPhone
  • End-to-end audio model: no separate ASR or TTS needed
  • 14% on AIME 2025 — impressive for a 1B model
  • Announced with AMD on stage at CES
Nisten Tahiraj
"That does make it for them, like the best sub two B model right now, the best on device model."
LDJ
"What's really impressive about this too is since it's only 1.5 billion parameters, that means you can run it while having very little ram on your device. Most people have eight gigabytes of ram and it'll be able to run on just that amount."

🔓 Zhipu AI IPO & NousCoder

Zhipu AI (makers of GLM) becomes the world's first major LLM company to IPO on the Hong Kong Stock Exchange, raising $558M. Nous Research releases NousCoder 14B, an open source competitive programming model that achieved a 7% jump in LiveCodeBench accuracy in just four days of RL training on 48 NVIDIA B200 GPUs; a sketch of the verifiable-reward idea behind that RL run follows the bullets below.

  • Zhipu AI IPO: $558M raised, first major LLM company to go public
  • NousCoder 14B: 7% LiveCodeBench jump in 4 days of RL
  • 24,000 verifiable problems used for RL training
  • Full Apache 2 license with training code and benchmark harness
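To make the "verifiable problems" idea concrete, here is a minimal sketch of a verifiable reward for competitive coding: run the candidate program against test cases and score the pass rate. This is not Nous Research's Atropos code; the test format and reward scheme are assumptions.

```python
# Minimal sketch of RL-with-verifiable-rewards for coding, in the spirit of
# the NousCoder setup described above. NOT Nous Research's actual code.
import os
import subprocess
import sys
import tempfile
import textwrap

def verify(solution_code: str, tests: list[tuple[str, str]]) -> float:
    """Reward = fraction of test cases the candidate program passes."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code)
        path = f.name
    passed = 0
    try:
        for stdin, expected in tests:
            try:
                out = subprocess.run(
                    [sys.executable, path], input=stdin, text=True,
                    capture_output=True, timeout=5,  # kill runaway programs
                ).stdout.strip()
                passed += out == expected.strip()
            except subprocess.TimeoutExpired:
                pass  # a timeout simply counts as a failed test
    finally:
        os.unlink(path)
    return passed / len(tests)

# Toy verifiable problem: read two ints, print their sum.
candidate = textwrap.dedent("""
    a, b = map(int, input().split())
    print(a + b)
""")
print(verify(candidate, [("1 2", "3"), ("10 -4", "6")]))  # 1.0
```

In an RL loop, this score is exactly the reward signal: responses that pass more tests get reinforced, with no human judge in the loop.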

🏢 NVIDIA CES & Vera Rubin Platform

Jensen Huang unveils the Vera Rubin platform at CES 2026 — NVIDIA's next-gen AI computer delivering 5x inference over Blackwell with only marginally more power draw. LDJ walks through the specs: over 3x the PFLOPS of Blackwell at 1800W, 13 TB/s bandwidth, and 75% fewer GPUs needed for 10T parameter MoE training. Ryan calls it truly astonishing and Nisten marvels at the power efficiency; a quick perf-per-watt calculation follows the bullets below.

  • Vera Rubin: 50 PFLOPS inference, 5x over Blackwell
  • 3x+ PFLOPS gain while only adding ~200W power
  • 75% fewer GPUs for 10T parameter MoE training
  • 72 GPUs per rack, 20.7 TB memory, 100% liquid cooled
  • Announced in full production just 4 months after B300
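To put the efficiency claim in perspective, a quick perf-per-watt calculation using the figures above; treating Blackwell as roughly 1600W is an assumption implied by "only adding ~200W".

```python
# Quick perf-per-watt math from the bullets above. The 1800W Rubin figure
# comes from the episode; the Blackwell baseline wattage is an assumption.
rubin_pflops, rubin_watts = 50, 1800
blackwell_pflops, blackwell_watts = rubin_pflops / 5, rubin_watts - 200

rubin_eff = rubin_pflops / (rubin_watts / 1000)              # ~27.8 PFLOPS/kW
blackwell_eff = blackwell_pflops / (blackwell_watts / 1000)  # ~6.3 PFLOPS/kW
print(f"Rubin {rubin_eff:.1f} vs Blackwell {blackwell_eff:.1f} PFLOPS/kW, "
      f"ratio {rubin_eff / blackwell_eff:.1f}x")             # ~4.4x per watt
```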
Nisten Tahiraj
"It's three times faster while only adding another 200 watts."
Ryan Carson
"I just wanna say, as someone that spent a bit of time at Intel and had a good time there, just how mind blowing this stuff is, what Nvidia is doing, is truly astonishing."
LDJ
"Keep in mind the B 300 was only announced to be in full production just four months ago and January 6th at CES Jensen announced that Vera Rubin is now in full production."

💰 NVIDIA Groq Acquisition

NVIDIA enters an exclusive licensing deal and acquires most of Groq's team for approximately $20B. Alex explains how Groq's inference-optimized chips, created by former Google TPU lead Jonathan Ross, complement NVIDIA's training dominance — reinforcing the panel's view that there's no AI bubble given insatiable demand for inference.

  • NVIDIA acquires Groq team and technology for ~$20B
  • Groq founder Jonathan Ross was instrumental in creating Google TPUs
  • Inference demand growing exponentially across all AI use cases

🔊 Nemotron Speech ASR

NVIDIA releases Nemotron Speech ASR, a 600M parameter open source streaming speech model with 24ms median latency and support for 900 concurrent streams on a single H100. Alex plays a demo featuring Kwindla Hultman Kramer of Daily.co showing sub-500ms voice-to-voice latency with a three-model pipeline of Nemotron ASR, Nemotron Nano LLM, and Magpie TTS; a sketch of that pipeline follows the bullets below.

  • 600M params — runs on a toaster
  • 24ms median latency, 900 concurrent streams per H100
  • Sub-500ms total voice-to-voice latency
  • Demoed by Kwindla Hultman Kramer of Daily.co / PipeCat
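For orientation, here is a hedged sketch of that three-model pipeline (ASR, then LLM, then TTS) with per-stage latency logging. The stage callables are placeholders for whatever serving stack you use; in production a framework like Pipecat wires these stages together with streaming, and none of this is NVIDIA's code.

```python
# Hedged sketch of the three-model voice pipeline described above.
# The asr/llm/tts callables are placeholders, not real model clients.
import time

def voice_turn(audio_chunk: bytes, asr, llm, tts) -> bytes:
    """One user turn: speech in, speech out, with per-stage latency logged."""
    t0 = time.perf_counter()
    text_in = asr(audio_chunk)       # streaming ASR, e.g. Nemotron Speech
    t1 = time.perf_counter()
    text_out = llm(text_in)          # small fast LLM, e.g. Nemotron Nano
    t2 = time.perf_counter()
    audio_out = tts(text_out)        # TTS, e.g. Magpie
    t3 = time.perf_counter()
    print(f"asr={1000*(t1-t0):.0f}ms llm={1000*(t2-t1):.0f}ms "
          f"tts={1000*(t3-t2):.0f}ms total={1000*(t3-t0):.0f}ms")
    return audio_out

# Toy stand-ins so the sketch runs; real stages would call model servers.
demo = voice_turn(
    b"\x00" * 320,
    asr=lambda audio: "hello there",
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode(),
)
```

The sub-500ms voice-to-voice number is the `total` line here: the budget the three stages have to share per turn.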
Alex Volkov
"Kwindla Kramer from Daily and PipeCat is the guy who Nemotron showed off on stage, shout out to Kwindla, a friend of the pod, basically the expert in everything voice AI."

🤖 Alpamayo Self-Driving

LDJ highlights NVIDIA's Alpamayo, a family of open source reasoning-based self-driving AI models announced at CES. The model performs end-to-end autonomous driving with explicit reasoning steps like identifying jaywalkers. Alex jokes about whether you want reasoning in a model that needs to make split-second driving decisions.

  • Open source self-driving model with reasoning steps
  • End-to-end autonomous drive demo in Mercedes-Benz
  • Real-time reasoning: identifies jaywalkers, stops accordingly
Alex Volkov
"I don't know if we want the reasoning in my model that drives, decisions to be made fast."

🏢 Grok & xAI: $20B Raise Amid Bikini-Gate

xAI raises $20B at a $230B valuation with NVIDIA as a strategic investor, while Grok faces major backlash over its image model's lack of NSFW guardrails. The panel debates the responsibility of AI products vs tools — Nisten notes guardrails are trivially easy to implement, Wolfram argues for going after bad actors not tools, and Alex draws a sharp line between open-source tools and consumer products embedded in social media.

  • xAI Series E: $20B raised at $230B valuation
  • Grok bikini-gate: no guardrails on image model in replies
  • xAI claimed 600M active users by counting all X users
  • Panel debates tool vs product responsibility for AI safety
Nisten Tahiraj
"It's not even that hard to put the guardrails, like you just put like a two B VL model and say, hey, is there a minor in this picture?"
Alex Volkov
"There's an absolutely incredible difference. One is a tool, the other one is a product and basically an amplification product that shows this to many people. So there's a big difference and guardrails are important on that product."

🛠️ Alexa+ on the Web

Alex demos Alexa+, Amazon's smart Alexa experience, now available as a web chat interface (free with Prime, $20/month otherwise). The upgraded assistant supports free-flowing conversations without repeating the wake word, integrates with smart home devices, and can continue conversations across devices. LDJ notes Amazon's earlier Claude partnership and their own Nova model line.

  • Web-based chat interface for Alexa Plus
  • Smart home integration with natural language commands
  • Free with Prime, $20/month otherwise; text chat only — voice coming later
  • Continue conversations across devices

🏢 GPT Health & AI Medicine

OpenAI launches a GPT Health waitlist for privacy-first health conversations with connected health records and fitness apps. Nisten explains why LLMs are so good at medicine — only ~2,000 diseases and drugs to master. Ryan asks about Epic/MyChart integration, and the panel discusses Doctronic's first US pilot in Utah where AI can autonomously renew prescriptions at just $4 per renewal.

  • GPT Health: integrates Apple Health, Function Health, MyFitnessPal, Peloton
  • LLMs only need to handle ~2,000 diseases and ~2,000 drugs
  • Doctronic: first US AI prescription renewal pilot in Utah
  • $4 per renewal, 190 routine medications, excludes controlled substances
Nisten Tahiraj
"There's only about 2000 something prescription drugs. There's only about 2000 or so total diseases. That's nothing for an LLM. This is the most common misconception that people have."
Ryan Carson
"My wife just had an MRI. This is insane. We have to call the hospital and say, can you cut a CD of the images so that we can see it? And we don't even have a CD ROM."

🤖 Ralph Wiggum: The Autonomous Coding Loop

Ryan Carson gives a masterclass on Ralph Wiggum, the autonomous coding technique created by Geoff Huntley that hit 1.2M views on X. The method: write a PRD, break it into atomic user stories with acceptance criteria in JSON, then run a bash loop that tells your CLI agent (Amp, Claude Code, etc.) to pick the next incomplete story, code it, commit, update progress, and loop — shipping features while you sleep; a minimal sketch of the loop follows the bullets below. Nisten reveals Ralph's origin story from a San Francisco meetup and how it won a YC hackathon.

  • Write PRD → atomic user stories in JSON → bash loop agent
  • Compound learning: agent writes lessons to agents.md each loop
  • Ryan shipped 5 features in 2 days using Ralph
  • Won YC hackathon by letting Ralph run overnight on Sonnet 4.5
  • Works with any CLI agent: Amp, Claude Code, Cursor CLI, Gemini CLI
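Here is a hedged sketch of the loop itself. The canonical version is a few lines of bash; this Python equivalent shows the same shape. File names, the story schema, and the CLI invocation are illustrative assumptions, not Ryan's or Geoff Huntley's exact setup.

```python
# Hedged sketch of the Ralph Wiggum loop described above. prd.json,
# agents.md, and the story schema are illustrative assumptions.
import json
import subprocess

PROMPT = """Read prd.json. Pick the highest-priority user story whose
"done" field is false. Implement just that story, run the tests, commit,
set the story's "done" to true, and append any lessons learned to
agents.md. Then stop."""

def ralph_loop(max_iters: int = 50) -> None:
    for _ in range(max_iters):
        with open("prd.json") as f:
            stories = json.load(f)
        if all(s["done"] for s in stories):  # every atomic story shipped
            print("PRD complete")
            break
        # Hand the same dumb prompt to a CLI agent each iteration and let the
        # repo state carry the memory (e.g. Claude Code's print mode here;
        # swap in amp, cursor, or the gemini CLI as you prefer).
        subprocess.run(["claude", "-p", PROMPT], check=False)

if __name__ == "__main__":
    ralph_loop()
```

The compound-learning bullet above is the agents.md step: each pass writes down what tripped it up, so later iterations start a little smarter.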
Ryan Carson
"You would love your agent to build stuff for you while you sleep. Well, how do you actually do that? Models now, especially with Opus four or five agents, are basically able to accomplish a lot of what a junior engineer, even a mid-level engineer could do, with basically no input."
Nisten Tahiraj
"The dumber you make it, the better the results are. All you do with the bash script is you just grab the initial instructions, which are just really simple and stupid. Usually they're just four lines."
Ryan Carson
"This is how real work happens. We don't ever say the word one shot. No real work is done one shot. All work is done through user stories. I think the whole vibe coding term is starting to die, which I think is important."

📰 Wrap Up & Goodbye

Alex wraps the first show of 2026 with over 1,700 live viewers. The episode spanned NVIDIA CES announcements, Ralph Wiggum autonomous coding, GPT Health and AI medicine, and a strong week of open source releases. Wolfram has officially joined Weights & Biases as an AI evangelist focused on evals, and the team teases agentic skills coverage for next week.

  • 1,700+ live viewers for the first show of 2026
  • Wolfram Ravenwolf officially joins Weights & Biases
  • Agentic skills and MCP coverage teased for next episode
TL;DR links:
  • Hosts & Guests

  • Open Source LLMs

    • Solar Open 100B - Upstage’s 102B MoE model. Trained on 19.7T tokens with a heavy focus on “data factory” synthetic data and high-performance Korean reasoning (X, HF, Tech Report).

    • MiroThinker 1.5 - A 30B parameter search agent that uses “Interactive Scaling” to beat trillion-parameter models on search benchmarks like BrowseComp (X, HF, GitHub).

    • Liquid AI LFM 2.5 - A family of 1B models designed for edge devices. Features a revolutionary end-to-end audio model that skips the ASR-LLM-TTS pipeline (X, HF).

    • NousCoder-14B - competitive coding model from Nous Research that saw a 7% LiveCodeBench accuracy jump in just 4 days of RL (X, WandB Dashboard).

    • Zhipu AI IPO - The makers of GLM became the first major LLM firm to go public on the HKEX, raising $558M (Announcement).

  • Big Co LLMs & APIs

    • NVIDIA Vera Rubin - Jensen Huang’s CES reveal of the next-gen platform. Delivers 5x Blackwell inference performance and 75% fewer GPUs needed for MoE training (Blog).

    • OpenAI ChatGPT Health - A privacy-first vertical for EHR and fitness data integration (Waitlist).

    • Google Gmail Era - Gemini 3 integration into Gmail for 3 billion users, featuring AI Overviews and natural language inbox search (Blog).

    • xAI $20B Raise - Elon’s xAI raises Series E at a $230B valuation, even as Grok faces heat over bikini-gate and safety guardrails (CNN Report).

    • Doctronic - The first US pilot in Utah for autonomous AI prescription renewals without a physician in the loop (Web).

    • Alexa+ Web - Amazon brings the “Smart Alexa” experience to browser-based chat (Announcement).

  • Autonomous Coding & Tools

    • Ralph Wiggum - The agentic loop technique for autonomous coding using small, atomic user stories. Ryan Carson’s breakdown of why this is the death of “vibe coding” (Viral X Article).

    • Catnip by W&B - Chris Van Pelt’s open-source iOS app to run Claude Code anywhere via GitHub Codespaces (App Store, GitHub).

  • Vision & Video

    • LTX-2 - Lightricks open-sources the first truly open audio-video generation model with synchronized output and full training code (GitHub, Replicate Demo).

    • Avatar Forcing - KAIST’s framework for real-time interactive talking heads with ~500ms latency (Arxiv).

    • Qwen Edit 2512 - Optimized by PrunaAI to generate high-res realistic images in under 7 seconds (Replicate).

  • Voice & Audio

    • Nemotron Speech ASR - NVIDIA’s 600M parameter streaming model with sub-100ms stable latency for massive-scale voice agents (HF).

Alex Volkov 0:30
What's going on everyone?
0:32
Welcome to ThursdAI for January 8th. This is Alex Volkov, putting the headphones on and saying welcome to the new year. This is the first live stream since we paused live streaming last year; we did have some episodes on the podcast, so if you're listening to the podcast, you haven't missed a beat. I'm very excited to be back with my co-hosts. Welcome Ryan, welcome Wolfram. How are you guys doing? Good to be here. Happy New Year. Happy New Year. Happy New Year. Everybody's here. Super pumped. Yeah, it's gonna be a great show. It's gonna be a great show. Thank you guys for joining, and thank you everybody who tuned in everywhere; we're streaming live on X, on YouTube, on LinkedIn. This has been, as always, a great week in AI news. And as always, what recently happens to me is that I start following the news, and coming up to Wednesday I'm like, ah, I'm not sure if we have a full show of news. And then as I start to recap, I am again faced with the fact that there's just so much happening in the world of AI, so much that even if we want to dive deep into a rabbit hole and talk about something specific (like I really want to talk about Claude's agentic skills, which are not only in Claude anymore), we may just have a bunch of news to talk about. Now, with us: one of us became very famous during the last week, and so we absolutely must tell you that we're going to talk about Ralph. Ralph Wiggum is the new hot thing on the timelines this week, maybe last week; apparently it started around July, but nobody noticed. Super quick: if you have any idea what I'm talking about, that's great; you know that Ryan blew up this week with his article. If you have no idea what Ralph Wiggum is, stay tuned with us. Ryan, I'm gonna treat you as a host for the first segment of the show and then as a guest for the second segment. Yeah. We also have another guest with us. I will point to this other guest right here. Boom. I received my Reachy Mini. For those of you who have no idea what this robot is, this is the Hugging Face / LeRobot Reachy Mini, and this little guy has been on stage at CES with Jensen and NVIDIA; Hugging Face are doing a collaboration, and you can install a bunch of AI stuff on it. Hey Reachy, what's up? Say hi to the folks. Reachy's waking up and saying hi. You can order either the connected version or the wireless version, and you can program it and you can build it. You get a kit, you don't get an assembled robot, and you connect it, and it's adorable and super cute. And you can do all kinds of software: you can make it dance, you can connect AI to it, you can connect it to AI agents; I think Wolfram, you did this as well. So folks, say hi to Reachy. Reachy, say hi back to the folks for the new year. Reachy is gonna be a permanent guest on the top right, on my shoulder, kind of like a pirate and his parrot. I also have the Weights & Biases bee here. So, small upgrades for the new year, as we say. Also, hello. All right, Reachy is gonna be in the background; you'll probably see him kind of dance. There's only place for one yapper in this household. But we wanna welcome LDJ as well to the show. Folks, happy New Year. How's it going? Ryan, let's start with you, because you have excitement happening for the new year. I would love to hear how you're feeling being so famous on the internet.
Ryan Carson 3:49
So yeah, I posted an article on X, and the algorithm was kind to me.
3:53
I think I was just building stuff and using a tool that everybody was talking about and I was like, Hey, I'm gonna write an article about that. And I guess everybody liked it. So 1.1 million views later. Wow.
Alex Volkov 4:04
incredible.
4:05
And the article was
Ryan Carson 4:06
my friend Geoff Huntley created Ralph and we'll talk more about
4:09
it, but he created Ralph a while ago, and it kind of blew up again. I thought I would try it and, like any good content creator, I was like, oh, I'll create a little tutorial on how to do this. And again, the algorithm gods were kind to me. So here I am. And now I'm literally shipping; I shipped three features today with Ralph, like, concurrently.
Alex Volkov 4:29
Is Ralph running in the background?
4:31
Yeah, like right now. That's awesome. Folks who, again, are not familiar with Ralph: this is kind of a technique to do coding better. We'll talk about this. I do have an idea that at some point the algorithm decided that Ralph is the hottest thing in the world, and so everybody was just talking about Ralph throughout the holidays. I think Claude Code also got really, really hot. Like, there were days when most of my timeline, not the following timeline, the For You timeline that the algorithm completely controls, was a hundred percent Claude Code. It was quite ridiculous; people were finding it out during the holidays. We have a few new folks here sometimes tuning in: Claude Code is the side project at Anthropic that is a command line interface to use Claude for coding, and that side project became a $1 billion business on its own, I believe, like, at least 1 billion. During the holidays, Boris Cherny also joined both Twitter and Threads. You guys remember Threads exists, the Twitter of Meta. And on both of them he went super viral saying, hey, I'm the creator of Claude Code, ask me anything, whatever. So that was cool. So we're gonna mention some Claude Code stuff. Wolfram, the holidays, the break and everything, how were the holidays for you? How are you doing? What's in your world?
Wolfram Ravenwolf 5:51
Well, the holidays were great, and now I'm excited.
5:54
This new year is a special one, as some people may have noticed already. If you look down here, you are not the only AI evangelist at Weights & Biases anymore. I joined the team, as we announced in the last show in December. And yeah, I'm still going through onboarding and getting ready to rock, but my first impression is excellent and I'm really looking forward to what we can achieve together.
Alex Volkov 6:17
So this was not a loaded question.
6:19
I did literally ask Wolfram about the holidays and stuff and the things he built. But yes, Wolfram is officially part of Weights & Biases, part of the evangelism team here at Weights & Biases. He'll be focusing on evals, so we're gonna bring you a bunch of that goodness here on the show as well. So I think this is gonna be super, super exciting. And we have LDJ also over here. LDJ, how are you doing? What's new? What's super exciting for you in the AI world? How'd you celebrate New Year's? Did you build anything cool? Welcome to the show.
LDJ 6:47
Thank you for having me.
6:48
I'm doing great. Great holidays, great last few months of last year. And what I'm really excited about and thinking a lot about recently is the new Vera Rubin GPUs, which we knew were coming a while back, but they were expected to not really be announced to be in production until later, and they seem to pretty much be here early. It was literally just four months ago that the B300 was announced to be in full production; now Vera Rubin is announced in full production. Obviously we'll go more into details later, but that's what I'm excited about.
Alex Volkov 7:20
Vera Rubin is the new line of,
7:23
complete computers from NVIDIA. They were announced at CES. Server GPUs. Yep, server GPUs, and I think there's CPUs in there as well; there's a bunch of stuff. We're gonna cover all of the CES NVIDIA announcements very soon. I think this is one of the biggest news items. Apparently some folks are already running on this: Runway 4.5 is already running on the new NVIDIA hotness. And definitely we've seen some changes in the market based on updates like this. So we're gonna cover this; thanks for propping this up. And I think it's time for a quick TLDR. The quick TLDR is where we basically run through everything that we have as far as news for this week. We were maybe expecting a Chinese Christmas surprise, but I think besides GLM 4.7, nothing major was released as far as I saw. Qwen Edit was released; we're gonna mention this as well. But generally, fairly quiet holidays, which is great for us 'cause we wanted to be on holiday. Just a quick reminder though, we did release the end-of-year recap where we went month by month through every AI release from this year. So if you haven't had a chance to check that out, that's on the podcast. And also, on January 1st, I posted an interview with Will Brown, ML researcher with Prime Intellect. That episode is great; I recorded it during AI Engineer. Will is a great dude, it's really, really fun to talk to him in person. So I posted that, and I really hope you have some time to check it out, because I think it's a great interview. And with this, I think it's time for the TLDR. And as always, a reminder: we're very much a breaking news show. So if something happens, yes, breaking news during the show, please tell us in the comments. We love to see breaking news; we love to discuss breaking news as it goes live. This is how the show started nearly three years ago. Can you guys believe it? In March. So if you have any breaking news, definitely send it to us, and we are going into the TLDR. Let's go.
9:28
Welcome to the TLDR for ThursdAI. This is the segment where we talk about everything that happened during this week, super quick, so that you'll be up to date. And if you decide to stay and dive in with us, you are more than welcome for the rest of the show. This week there's a bunch of stuff happening. In open source, we have Solar Open. It's a hundred billion parameter MoE from Upstage, a Korean lab, focusing on data training and strong benchmarks. We also have a newcomer to open source, MiroThinker 1.5. It's a 30 billion parameter open source search agent that beats trillion parameter models on BrowseComp via interactive scaling. Our friends from Liquid AI released LFM 2.5, Liquid Foundation Models 2.5, a family of five tiny, around 1 billion parameter, on-device foundation models with text, vision, audio, and Japanese support. They announced it at CES, I believe together with AMD's Lisa Su on stage. Liquid is great, and shout out to Liquid, they're great friends of ours. Ah, another set of friends, great friends; we've been tracking their work since the beginning of ThursdAI pretty much: Nous Research released NousCoder. It's a 14 billion parameter open source competitive programming model that achieved a 7% jump in LiveCodeBench accuracy in four days of RL training. RL stands for reinforcement learning, and for the new year I will remind folks about some terms that we use often, to be welcoming to new folks on the show. Z.ai, FKA Zhipu AI, became the world's first major LLM company to IPO, raising $558 million. They're the folks who make GLM, and they made GLM 4.7, which is probably the top coding AI in the world of open source right now. That's everything I currently have for open source. Let's skip to the next big category of news here on ThursdAI, which is big companies and their LLMs and APIs. The one thing I would love to talk about on the show is that Grok is in trouble again, because everybody started adding bikinis to everything, which sounds funny when you do it to Maduro getting captured, or when Elon Musk does it to himself. But when people do this without consent to women, or even to pictures of minors, this is really, really bad, especially as a product. This happened, and Grok was in trouble again. We talked about Grok being in trouble before. Meanwhile xAI announces the biggest raise I think they've had so far, Series E: they raised another $20 billion at a $230 billion valuation, which is bonkers. And there's a lot of GPUs coming to Grok. Amazon meanwhile launches Alexa+ on the web, expanding its AI assistant beyond the Echo devices. You can go to alexa.amazon.com and actually use the smart Alexa on the web. I think it's free for Prime members as well, and if you're not a Prime member, you can pay for it if you want to; I don't know why you would. TLDR going back on track: NVIDIA announces the Vera Rubin platform at CES 2026, and generally announced a bunch of other stuff. Vera Rubin: six new chips delivering 5x inference performance over Blackwell. Absolutely bonkers. LDJ, I would love to chat with you about this. Meanwhile, OpenAI was fairly quiet this week. The main thing OpenAI launched was GPT Health, a privacy-first space for personalized health conversations with connected electronic health records and fitness apps, including Apple Fitness. That is a waitlist, so get excited, but not too excited. OpenAI launched a waitlist, and we'll see how fast this waitlist opens up.
While on the health news, this is a new thing: Doctronic launches their first US pilot allowing AI to autonomously prescribe medication without physician oversight. We've been waiting for this. It's a pilot, it's in one state somewhere, but it's coming, folks. It's coming. AI doctors are almost here. And I will just mention that, speaking of AI doctors, a person who worked on, I think, maybe the first AI doctor in existence is also with us. Nisten, welcome to the show. Happy New Year. We're in the middle of a TLDR, but wanted you to say hi to the folks.
While on the health news, this is a new thing. Doc launches their first US pilot allowing AI to autonomously prescribe medication without physician oversight. We've been waiting for this. it's a pilot. It's somewhere in one state somewhere, but, it's coming folks. It's coming. AI doctors are almost here. And, I will just mention that. Speaking of AI doctors, a person who worked on, I think maybe the first AI doctor in in existence is also with us. Niton, welcome to the show. Happy New Year. We're in the middle of ak, but wanted you to say hi to the folks.
Nisten Tahiraj 13:37
Yeah, happy New Year, everybody.
13:39
I have a cold, so I could use that AI doctor right now.
Ryan Carson 13:45
I thought that was just sexiness, Nisten.
Alex Volkov 13:47
I thought this is just your new mic that you
13:48
got yourself for Christmas. No,
Nisten Tahiraj 13:50
I'm just sick.
Alex Volkov 13:51
I was told though, when I'm sick, I sound better on the podcast, man.
13:54
So hopefully get some tea. Stay with us; you have some work to do today, my friend, because of the new GPUs. You and LDJ could potentially go deep, if you have the energy. Alright, going back to the TLDR super quick. So, Doctronic, first AI pilot. And Google, this is new from today, folks. Breaking news. I really wanted to hit that button for the first time this year. AI breaking news, coming at you only on ThursdAI. The breaking news from today in the TLDR segment, folks, is that Google brings Gmail into the Gemini era with AI Overviews, smart replies, and AI inbox search for 3 billion Gmail users. That's right, Gemini 3 is powering a bunch of features inside Gmail for the first time. We all use Gmail; it hasn't updated since forever, and I'm very, very excited to test that out. Probably not live on the show, 'cause I don't want you reading my emails, but we'll definitely tell you about this as much as possible. All right. The next segment of the show, after we discuss all these things, is gonna be vision and video. In video there's two things, and we may discuss just one of them. LTX-2 is finally open source. We told you about LTX-2 back when it was released in October; they announced it was gonna be open source, and it finally is. You can run this. It's the first truly open audio and video generation model, with full training code, for consumer GPUs. You can fine-tune LTX-2 on your stuff, and it's almost Sora level; it can generate people who talk. And I think for open source this is the first and maybe the only video model that also generates audio and does it well. So that's great; we'll mention this if we have time. Then there's Avatar Forcing from KAIST, the Korea Advanced Institute of Science and Technology. It's a framework for real-time interactive talking-head avatars with 500 millisecond latency and a 6.8x speedup over previous versions. Wolfram and I are very excited about those technologies specifically, and this one is real time. They haven't released the code yet, so we haven't been able to test it, but the videos look really, really cool. The thing that I wanted to touch on briefly, if we have time, is NVIDIA launching Nemotron Speech ASR; ASR stands for automatic speech recognition. Nemotron Speech is kind of a fork of Parakeet, or built on top of Parakeet. It's a 600 million parameter open source streaming model with 24 millisecond median latency and up to 900 concurrent streams on an H100. This is sick. The sickest thing about it is that when they presented it, I heard a very familiar voice, a voice that was featured on ThursdAI: the voice of Kwindla Kramer, our friend. Daily is the company he runs, and one of the top people in real-time AI voice agents was collaborating with NVIDIA on this. So it was great to hear a friend on stage behind Jensen announcing this special new technology. Our guest for the interview is right here, Ryan Carson, who blew up with a new coding technique, or a new approach to autonomous coding assistant orchestration, let's call it that, which is called Ralph Wiggum, after Ralph Wiggum in The Simpsons. And Ryan, who works on the Amp team, a great coding tool, is going to be with us talking about Ralph Wiggum.
Ryan Carson 17:04
And I wanna be clear, Geoff Huntley deserves
17:06
all the credit for creating it. I just wrote an article about it and used it.
Alex Volkov 17:10
shout out to Geoff Huntley, whose work is now
17:12
featured, and all everybody can talk about is Ralph. All right folks, this was the TLDR; hope I didn't miss anything. I'm excited about Gmail AI, but I think we can get started with open source. Nisten, you look like you have something to say.
Nisten Tahiraj 17:27
Is my mic still fine? Okay.
17:29
Just, just checking.
Alex Volkov 17:30
yeah, you're coming through.
17:31
Sounds good. Coming through loud and clear. That's awesome. Alright folks, super quick: any big news that I have missed in the TLDR? If not, we will move on. Also asking the audience, folks who are tuning in and saying what they're looking forward to. Somebody said Claude Code 2.1 as well; Milos, thank you for that. I think I saw Claude Code 2.1, and I also saw Claude Code stop working for a second. I will say this super quick, and I know this is maybe not my place, and we're not a political show at all, but what's happening in Iran is heartbreaking from one point of view and inspiring from another. So shout out to the folks in Iran for what they're going through. Very, very brave, and I'm really praying for everybody there to stay safe and hoping for some news. I'm saying this as the internet was shut off in Iran; hoping that everybody there is okay. I basically just had to say this. And let's move on. Open source, folks, open source, our favorite corner of ThursdAI. Let's do it.
18:36
Open source AI, let's get it started. Alrighty, folks: open source AI. I think we need to choose the top two things to talk about. I definitely wanna talk about Solar first, and then we can chat about some other stuff as well. Solar from Upstage is the first release we're gonna mention. Let me just pull up the notes here for you so that we can show you what everything is about. We talked about Upstage a while ago, and this is Solar Open, but Solar, I think, has been on the radar for a while now. Let's get the screen on, and then, yeah. So we have Upstage releasing Solar. It's a 102 billion parameter MoE with big data training and strong benchmarks. It's very interesting to talk about AI models in the open right now, because I remember we used to chase benchmarks a lot, and we were like, okay, this new model beats that previous model. This still happens, but it started to happen less frequently; tell me if you guys agree or not, I would love a discussion here as well. Many open source models release, and I start to notice a trend: new models that get released are specifically better at one thing or two things. Rarely do we see a model in open source that comes and beats everything out there; that's usually now in the realm of big companies. And in that area I would like to mention Solar Open. It's not a state-of-the-art model across everything, definitely not, but there are a few very interesting things about it. So, 102 billion parameters with only 12 billion parameters active per token, 129 experts, 128 routed, one shared, and top-8 expert activation. MoE seems to be the hot thing across open source. The cool thing about this is the amount of training they did: this model was trained on almost 20 trillion tokens, 19.7 trillion, with 4.5 trillion of them synthetic. As a brief reminder, I would love to invite LDJ to briefly give us an overview of why that matters and why the size of the training dataset matters. LDJ, can you offer a brief refresher for the new year on what the difference between tokens and parameters is, and why the size of the thing matters? And maybe Nisten can help us with this. It looks like LDJ is having some technical issues. Nisten, 19.7 trillion tokens is absolutely insane.
Nisten Tahiraj 21:12
I mean, that sounds about right.
21:16
These days, datasets for video or text end up being in the 20 to 40 terabyte range. The way they count the tokens is, let's say for example the vocabulary is 128,000; usually the vocabulary is like 120,000 or 150,000 tokens. And then you just do whatever that is to the power of 16, and then you can actually get the exact amount of data before it is compressed. But I would say these days that is becoming pretty standard. There is something to be said about what is synthetic and what is not. This gets very tricky, because all the data does have a human source at the end of the day. And if you use an LLM, everyone uses an LLM to filter data. But if you just use an LLM to filter the data that's already there, can you say that's synthetic? Or maybe you just changed one or two things. Or what if, for example, you just took the agent data of a very skilled developer or a DevOps person, and then you built that up into an entire agentic dataset? Now technically that is synthetic, but you are also using a lot of human data in there as well. So it is very hard to actually quantify if something is fully synthetic or not; it's all mixed now. The main thing you wanna keep in mind is that you have to have a good distribution of data. If you make it all LLM-generated, you'll just end up with a lot of patterns which hurt the model. You'll instill very, very hard patterns, and then the model just keeps doing a lot of em dashes and saying "absolutely right" the whole time.
Alex Volkov 22:59
I think that's incredible.
23:00
Thank you, Nisten, for the breakdown. LDJ, I think you're back with us; anything you wanna add, feel free. I'll add, specifically about Upstage: the biggest contribution here is their innovative data factory approach and difficulty-aware training curriculum, which has a dynamic curriculum of difficulty from 10% to 64.5% synthetic ratio across phases. So they play around with how much of the data they're training on at any moment is synthetic. And like Nisten said, synthetic data is usually data generated by other LLMs, but it's really hard to quantify whether cleaned data counts as synthetic data as well. The other highlight here is a particularly notable Korean language optimization, addressing a critical gap where global open models underperform in non-English languages. That's also a big, big area where open source is kind of doing some frontier work. And so on the Korean leaderboard this model looks like it beats previous models; the Korean eval is 80% plus on this model. So, Upstage, Korean lab, shout out to them. We'll give you some more transparency about them if we can.
LDJ 24:12
Yeah.
24:12
To just add on a bit to what Nisten was saying: I think ultimately we could just call it synthetic data if it comes out of an LLM at the end at all. 'Cause even just when you're chatting with ChatGPT, that information ultimately came from humans somewhere along the way, from when it was originally trained. So there's always some type of real-world data that eventually got into the system to result in that. And this class of synthetic data techniques really is just so many different ways of how you could use LLMs in combination with real-world information like that. What's interesting too, though, which I feel like hasn't been mentioned, or maybe it was mentioned when my headset went out, is SNAP PO, which is their reinforcement learning technique here. In short, like you said in your little summary that you had up, the image, the Gemini Flash image, I think you mentioned a 50% training speedup. That's going to be interesting, and those types of reinforcement learning frameworks are going to be helpful, because the more efficiently you can learn from information, the more you can improve the performance of the model. If you actually look at the benchmarks on the Hugging Face compared to GLM 4.5 Air, I believe it is, it's significantly above that on a lot of the benchmarks. And I think for this size of model GLM was what people were using, and this seems to maybe be the new best one.
Alex Volkov 25:39
Yep.
25:40
And I think one of the things we get excited about with open source contributions is that when these models release, we often appreciate some of the transparency, right? Sometimes we only get the weights, but in this case we also got a roadmap of how they used the data, and how they used different reasoning traces to train this model better. They released a bunch of that stuff, including a full report. That's great. I wanna shout out Elie Bakouch from Hugging Face, who was a guest on the show; he did a full technical deep dive. They have a technical report, a deep, deep technical report; we'll link it in the show notes if you're interested. But generally, the highlight from these reports and releases is that sometimes these labs release their RL techniques, reinforcement learning techniques, the verifiers. They release a roadmap for how other labs in open source can build, and then we all benefit. So I think that's great. And it's commercially viable as well, so you can use this model, especially if you need better Korean. Alrighty, folks, we're gonna move on. I wanted to chat about Miro, as that came across my desk. Wolfram, I think you also saw MiroThinker as well, so let's bring it to the show super quick. I think it's a newcomer; I don't remember us talking about Miro at all. Wolfram, would you like to chat about Miro?
Wolfram Ravenwolf 27:02
Yeah, so MiroMind AI has released MiroThinker 1.5,
27:06
which is an open source search agent.
Alex Volkov 27:08
Yeah.
Wolfram Ravenwolf 27:08
exactly.
27:09
It has a new "agent density" paradigm, basically, that is more important than the parameter count. So it's a 30B model, but it is achieving frontier performance on agentic search benchmarks like we just saw above, where it got 56.1% on BrowseComp and 66.8% on BrowseComp Chinese. It was outperforming trillion parameter models like Kimi K2, which only scored 60%. And the secret sauce is interactive scaling, which is a third dimension of scaling, where agents form hypotheses, verify evidence via search tools, and detect conflicts to iteratively revise in real time, using a time-sensitive sandbox to prevent hindsight bias. I'm reading from the notes because, yeah, it's cheaper than the bigger models, and it's a new paradigm for search. So that is one of the instances where a model fine-tuned for a specific case can do a lot better than a big generic model, if it's tuned for its particular situation.
Alex Volkov 28:13
Yeah, this is great.
28:14
Thank you, Wolfram, for the coverage. Folks, any comments on MiroThinker? The interactive scaling, for me, highlights the importance of harnesses and agent harnesses in 2026. We definitely know that raw intelligence from the models is very important, evals, et cetera. But Ryan, agentic harnesses seem to be more and more important. This could be the year of agentic harnesses, because we saw that even with the same raw intelligence, you can improve models significantly by doing some very clever tricks. If you have any comments on the harness part, as one of the experts, feel free to join us.
Ryan Carson 28:53
I mean, the models are so good now, that people might think,
28:56
oh, you could just open ChatGPT or open, you know, Claude and chat, and you can't; we're still a long way from each model being specifically useful for a specific task. So, it just happens that coding is a pretty key task in the world, and so the coding harnesses like Amp or Claude Code or Cursor, Devin, all of that stuff, are very important. The coding harnesses are the bleeding edge, right? And then there's a lot of us building businesses, like what I'm building, where I'm building a harness for a different industry. It's not coding, but I'm using everything that I've learned from the coding harnesses. So, watch all these spaces closely.
Alex Volkov 29:33
I think the highlight of harness use and why we would use them
29:36
is that a smaller model with a great harness can beat significantly larger models, and is thus cheaper. And I think we're all looking at how we're gonna employ these models: the smaller ones, the open source ones, the MIT-licensed ones, and I think harnesses are the answer. A few more notable things in MiroMind: MiroVerse, an open source training dataset with 147K samples supporting research agent training, which is great because many of us use these models for research. And this is a fine-tune of Qwen 3 Thinking, which is great. Comments, folks, or we're gonna
Wolfram Ravenwolf 30:09
move on to the next one.
30:10
Speaking of that dimension, I think it's interesting that it can handle up to 402 tool calls per task. So you have the size of the model, the parameter count, you have the context it can handle, you have thinking or non-thinking, and now there's also this dimension of how well it can use the harness. And I think it goes both ways: a model trained for a specific harness will probably do much better in it, and at the same time, the better the harness, the more it can get out of the model. So we have two ways to improve the models and the agent that way.
Nisten Tahiraj 30:42
I just wanted to say it's very hard to make a good harness.
30:46
It seems easy at first, but it's just like making a tool or a drill or something; it has to be basically perfect. You can't have many things to worry about in the tool that you're gonna use the most. It does seem kind of haphazard how these tools are put together as command line interfaces, but to actually make one good is pretty hard. And things will get more interesting as we get to continual learning, and the models start to self-train and self-adapt later. So there's quite a ways to go, but yeah, I just wanted to say it is hard. It's very hard to make a good harness, that's all.
Alex Volkov 31:24
Harnesses we've talked about. Super quick, let's run
31:26
through the rest of the updates. You guys, let me know if there's anything interesting we wanna mention there; we have a lot of show to cover, and some of these deserve a deeper dive. We have other open source LLM announcements here; let me just add them super quick. So, Liquid AI LFM 2.5, Liquid Foundation Models 2.5, a family of five tiny on-device foundation models with text, vision, and audio support, announced at CES. Let's look at their performance super quick: very impressive against other smaller models. This is a 1.2 billion parameter LFM 2.5 compared to Llama 3.2, a model from a long time ago, to Gemma 3, and to Granite 1B; basically all of the 1-billion-ish parameter models, on IFBench and IFEval, instruction following bench and instruction following eval. This small model gets a significant boost. On AIME 2025 this model gets 14%, which for a 1 billion parameter model is very impressive, and 38% on GPQA. So, we know that Liquid does incredible stuff. The most important thing here is, I think, speed. Liquid always releases super cool stuff because of their infrastructure and specific architecture. They get 239 tokens per second on an AMD CPU, and these Liquid foundation models were announced together with Lisa Su and AMD on stage; 82 tokens per second on the Snapdragon Gen 4 neural processing unit. So these models are basically built to run on devices, on toasters, on iPhones; the iPhone 16 Pro Max got a hundred tokens per second. Now again, with these models you will likely be very surprised, if not a little bit disappointed, if you are comparing them to the latest thinking GPT 5.2 Pro Max, right? So do not expect the same level of performance. But for many, many use cases, such as quick translation, summarization (potentially summarization is a great use case for smaller models), autocorrect, different things, smaller models absolutely rule. And to run them on device, it makes no sense to go and burn tokens on a GPU somewhere for a big model, because you can do so much with smaller models. What else can we say about these models? The vision-language one: you would be surprised how well small vision models perform for smaller tasks, for saying, identify this. I'll give you a straight-up example of why small models are needed. The Reachy Mini right here, the little robot that I have, that we presented at the beginning of the show, has limited capacity; it has a Raspberry Pi in there with limited capacity. Bigger models, 30 billion parameter models, cannot run on this. So something like a Liquid Foundation Model 2.5 can potentially run on device. And the vision model means that, connected with the eyes it has, it can detect and run some stuff on a very small footprint, which is very important. And LFM 2.5 is very memory efficient. So actually, now talking about this, I'm getting very excited to try and shove, you know, an LFM model into this little guy and see if I can put a little brain in the thing.
Nisten Tahiraj 34:35
That does make it, for them, like the best sub-2B model
34:40
right now, the best on-device model. Yeah, I was hoping we'd see more specialization for smaller models for on-device stuff, but we're not seeing it quite yet. I think with the new hardware and stuff that comes out, we're gonna see more specialization for the small ones, because this is the one thing where it would make sense to continue pre-training and/or fine-tuning it for a task.
Alex Volkov 35:04
And I think fine-tuning, let me mention super quick, again, for the new year,
35:07
we're gonna explain some of these terms. Fine-tuning is the practice of taking a base model that a company releases and aligning it to your task based on your data. So fine-tuning is something we've been following on ThursdAI for a while. Fine-tuning is also an area that Weights & Biases is great at, 'cause everything that you fine-tune you can track with Weights & Biases Models, so that's great. But besides this, fine-tuning is absolutely crucial for small models. This was the hype of NeurIPS last year, where base open source models were fine-tuned for specific purposes. Sometimes folks who are building harnesses are getting the results of those harnesses fine-tuned back into a model. So we saw Cursor with their Composer model, where they basically took open source models and fine-tuned them for their purposes based on the harness results, on top models as well. LDJ, you have a comment on LFM as well?
LDJ 35:56
Yeah, the LFM 2.5 audio, the 1.5B one.
36:00
So I think that's really impressive, 'cause there's a lot of new unified audio models, I guess you can call them, that are able to take in audio as a native input without needing to first process it through a dedicated audio transcription model, and are also able to output audio themselves without needing to send it to a dedicated text-to-speech model. Mm-hmm. And LFM 2.5 audio is able to do both of those things, for input and output. And if you'd like, I just sent a short video that gives a little demo of how high quality it sounds.
Alex Volkov 36:36
Yes, we should absolutely do this.
36:39
And I believe
LDJ 36:40
Maxime Labonne,
Alex Volkov 36:40
Oh, Maxime is a great friend of the pod.
36:43
He works at Liquid and he posted this; let's see if I can bring it up here, and then lemme zoom in and see. And you guys let me know if the sound comes through on your screen.
LDJ 36:57
Yeah, sure.
36:58
Well, while you're doing that: what's really impressive about this too is that since it's only 1.5 billion parameters, which is what the 1.5B stands for, that means you can run it while having very little RAM on your device. Most people have, you know, eight gigabytes of RAM, and even many phones these days have around eight gigabytes, and it'll be able to run on just that amount.
Alex Volkov 37:23
Let's take a look.
AI 37:30
What is this obsession people have with books?
LDJ 37:32
So I think here he's just using it as a TTS model, but you could also use it as
37:36
like a model to have a conversation with. And so here... oh, he just said, "can you hear me?" Mm-hmm. Or "hear", 'cause maybe he's using a different mic there.
AI 37:48
Two plus three equals five.
Alex Volkov 37:51
Oh, so interleaved mode, I think, is very interesting,
37:53
where it's not only audio. So this model can take text and audio as well, and I think this is interleaved mode support. This is a video from Maxime Labonne, and he's bullish on things. This runs locally on a CPU with llama.cpp: real-time text-to-speech and ASR. These are usually way bigger models than this. Real-time TTS and ASR is kind of both sides of what you need in an agentic voice conversation, right? You have speech-to-text that turns speech into text, then in the middle you have an LLM, and then speech generation, text-to-speech, on the other side. All right folks, this has been LFM. Any other comments? No? We're gonna move on. That looks very good.
Nisten Tahiraj 38:31
These small models are used all the time.
38:33
The speech stuff, the speech recognition, gets used all the time. And Kokoro, everything uses it; these get embedded in websites too. Alright,
Alex Volkov 38:43
So I think, the thing about audio specifically
38:45
is, I have it in my notes: eight times faster audio tokens on mobile CPUs, without the ASR-LLM-TTS pipeline that I mentioned. So this model can do all of it, which is great. Although, again, listening to Kwindla, our Kwindla Kramer of Daily and maintainer of Pipecat, the three-model pipeline setup for agentic voice conversations is still the GOAT, because you wanna replace the middle one with high intelligence. So while this is great, to get the best latency folks still use the three-model setup: ASR, automatic speech recognition, an LLM in the middle, and then TTS, text-to-speech, on the other side. All right, let's move on to the last two things. The super quick thing I wanted to shout out: Z.ai has gone public, and shout out to them, because first of all, I think, folks, maybe you agree with me, it's great to see great open source companies go public and receive a good reception. I think they went public on the Hong Kong Stock Exchange with a ticker of 02513. But yeah, Z.ai raised half a billion dollars at 37 million shares, with great revenue growth as well. Shout out to them. Rapid, under-the-radar execution, continuous delivery is the community reaction
Nisten Tahiraj
Nisten Tahiraj 40:00
there.
40:01
This is not financial advice, but if you are on that exchange, do not confuse it with the other ZAI listing, which is a med-tech company in Hong Kong. That's a completely different one.
Alex Volkov
Alex Volkov 40:14
Z.ai, of course.
40:15
Thanks. Z.ai, of course, is the maker of the GLM series of open source models, which we absolutely love. GLM 4.6 was released two weeks ago, 4.7 is crushing it in open source coding, and you can plug it into Claude Code. Yep. All right, so at the end of the open source section we will shout out our friends from Nous Research, releasing NousCoder-14B, an open source competitive programming model that achieved a 7% jump on LiveCodeBench in just four days of RL training. Let me just shout them out: great friends of the pod, Nous Research, doing incredible stuff. "Tested production C++, perfectly debugged code, found race conditions, memory leaks," said F. Meza. In four days they used 48 NVIDIA B200 GPUs on 24,000 verifiable problems.

In RL, reinforcement learning, "verifiable" is the big thing. You train on something, responses get generated from the model, and then you choose the best response. How do you know which is the best response? That's where the verifier comes in. And, as Ryan mentioned before, on problems that are scoped in domain, like coding, like math, you can confirm: you can run the code and see if it compiles, run the linter and see if it passes, or see if the program executes correctly. That's verifiable. So you can run some sort of verifier to make sure your model is correct, and RL does very, very good fine-tuning for models like this (a toy sketch of such a verifier follows below).

So this is an open Apache 2.0 model with full training code and a benchmark harness. And I'll shout out specifically that these folks also released the Weights & Biases link. If you go to their blog for NousCoder-14B, you can see the model links, the code on GitHub for Atropos, and the W&B link. Open that link and you get this beautiful shout out to Weights & Biases, the only sponsor of the show, by the way: a Weights & Biases report dashboard that they built. Reports are kind of like our blogging system where you can publish live dashboards of everything that happens during or after the run. So you can see these beautiful traces, beautiful plots comparing against Qwen, and a bunch of other stuff. Shout out to Nous for this effort and the great coder that they have, a 14B coder. I think it's time for us to move on to bigger companies and APIs, because there's a lot to talk about there as well.
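Here is that toy verifier: a minimal sketch of a verifiable reward for coding RL. The file layout and test format are illustrative assumptions, not Nous's actual harness.

```python
# Run a candidate program against known test cases; reward 1.0 only if
# every case passes. This binary signal is what makes coding "verifiable".
import subprocess

def verify(source_path: str, cases: list[tuple[str, str]]) -> float:
    for stdin_text, expected in cases:
        try:
            result = subprocess.run(
                ["python", source_path], input=stdin_text,
                capture_output=True, text=True, timeout=5,
            )
        except subprocess.TimeoutExpired:
            return 0.0                       # hung programs get no reward
        if result.returncode != 0 or result.stdout.strip() != expected:
            return 0.0                       # wrong output or crash: no reward
    return 1.0                               # all test cases passed

# reward = verify("candidate.py", [("2 3\n", "5"), ("10 -4\n", "6")])
```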
42:47
So for folks who just joined us: welcome to ThursdAI, January 8th, 2026. Can you guys believe 2026? It's insane. We started in 2023 and now it's 2026. Just imagine
Ryan Carson
Ryan Carson 43:00
what the world's gonna be like at the end of this year.
43:03
I just, ah, can't wait to have that show.
Alex Volkov
Alex Volkov 43:06
I have a robot, and I saw it on stage. We're gonna mention CES right now.
43:10
I saw on stage, at NVIDIA's keynote, examples of Reachy Mini, the robot that I have here. It's just the head, encapsulated in a small body, but the body doesn't move. Supposedly there's a humanoid body that's going to connect to this head at some point. It kind of looks ridiculous, this buff body with a small Reachy face, and it's really funny. And obviously Jensen, CEO of NVIDIA, the biggest company in the world by market cap, maybe one of the more important companies in the world by what they do, also came out. I think Jensen is cheating a little bit: he already had these great keynotes, but now he has these beautiful Disney Star Wars robots on stage with him, and they make everything he says super, super cute. They react to everything he says, and he has interactive robots on stage, obviously. NVIDIA leans very heavily into robotics as well. So I think that's where we start. We can start with NVIDIA and CES and Vera Rubin, which is their new platform. Let's share some screen, LDJ. Go ahead.
LDJ
LDJ 44:11
Yeah.
44:12
So Vera Rubin's very exciting. This is the next generation of their AI-focused GPUs, or at this point "GPU" may be a misnomer; maybe it's more appropriate to call them AI processors.
Alex Volkov
Alex Volkov 44:24
they call this a computer.
44:25
I think they're trying to simplify this down to "we're building computers." This is the next computer, but unlike the computers we have at home or at work, these are huge, huge racks of computers standing together. Sorry, go ahead.
LDJ
LDJ 44:39
so Vera Rubin is the next generation, the most recent generation
44:43
being called the Blackwell generation; before that was Hopper. Some people might have heard news about China getting access to some GPUs that are still in the Hopper generation, and this is two generations ahead of that now. And it's exciting because, from what I've seen, in basically every metric it's a bigger jump over Blackwell than Blackwell was over the previous generation. People say improvements are slowing, that energy-efficiency gains are plateauing, but this almost feels like an acceleration, and it's actually more than I thought it would be. Alright, so this image is one I actually made months ago, and I updated it with the most recent info. You can ignore the Rubin Ultra part at the bottom; those are more like estimated values for Rubin Ultra. But for everything else: here we can see petaflops, which is just how many operations per second it can do, at FP4 and FP8, which are different precisions for those operations. If you go from Hopper to the B200, that was about a 5x. And then, and there are multiple configurations here, sorry, it might be a bit confusing, if we just look at FP8: it's about a double to the B200, and then some small improvements when you go to the B300.
Alex Volkov
Alex Volkov 46:13
Explain what that is. It's the precision of a model, basically,
46:17
and we're talking about the number of calculations this GPU, or computer, can run on models at FP8 precision.
LDJ
LDJ 46:26
In a simplified way, you could think of it as eight bits
46:28
of information, but it's how much accuracy of information is within each operation the computer's doing. Agreed.
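To feel what eight bits of accuracy means in practice, here's a minimal sketch, assuming PyTorch 2.1 or newer (which ships FP8 dtypes); the exact rounded values depend on the FP8 format.

```python
# Casting to FP8 (e4m3: 4 exponent bits, 3 mantissa bits) keeps the rough
# magnitude of each number but rounds away fine detail.
import torch

x = torch.tensor([0.1234, 3.1416, 100.7])
x8 = x.to(torch.float8_e4m3fn)        # 8 bits per value instead of 32
print(x8.to(torch.float32))           # values snap to the nearest FP8 grid
                                      # point, roughly 0.125, 3.25, 104
```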
Alex Volkov
Alex Volkov 46:36
FP is floating point, the 8 is the bits, and PF is petaflops.
LDJ
LDJ 46:40
Yes.
46:41
Correct. And so, for the B200 for example, the GB200 config, it can do about five petaflops. Peta is quadrillion, so about five quadrillion operations per second at this precision. The B300 was more of a bandwidth improvement, I think, or some other things; there was not that much gain there. But with Vera Rubin we have this huge, over-3x gain here. Wow. And it's projected to be even larger for the next version. And look at the power being used: it's not that much more. Some estimates say it's closer to 1,800 watts, but even then it's not much more than Blackwell, while having significantly greater bandwidth as well: 13 terabytes per second, which is very important for all the GPUs communicating with each other during the training process, because... so. It's three times
Nisten Tahiraj
Nisten Tahiraj 47:48
faster while only adding another 200 watts.
LDJ
LDJ 47:51
but like three, a little over three times faster.
Nisten Tahiraj
Nisten Tahiraj 47:54
more than three times.
LDJ
LDJ 47:55
Yeah.
47:56
That's incredible. Really exciting. And keep in mind, the B300 was only announced to be in full production four months ago, and on January 6th at CES, Jensen announced that Vera Rubin is now in full production.
Alex Volkov
Alex Volkov 48:09
Vera Rubin is in full production, including
48:11
companies that switched to Vera Rubin, like Runway, I believe.
LDJ
LDJ 48:15
Yeah.
48:15
There's some debate on what exactly he means by "full production." But yeah, those were the exact words: full production.
Alex Volkov
Alex Volkov 48:21
So let's run through the platform, then,
48:23
the Vera Rubin platform: six chips. Super quick, while we have this, look at this beautiful infographic I have here. Rubin is the GPU: 50 petaflops of AI inference, 5x Blackwell. Vera is the CPU that pairs with the GPU, with 88 custom Olympus Arm cores. NVLink is the interconnect, and bandwidth there is very important, because all the GPUs distributed across the data center have to do a lot of high-bandwidth communication: the models are getting bigger and bigger, each GPU works on a smaller chunk of the training process, and they have to communicate continuously. NVLink is a big NVIDIA plus, and NVLink 6 gives 3.6 terabytes per second of interconnect bandwidth. Then they have the ConnectX-9 SuperNIC ethernet chip, and the BlueField DPU for KV-cache reuse; the KV cache is very important for inference, since a lot of the attention context is stored there while you run inference. And then there's Spectrum-6 ethernet, with power efficiency. This is a whole rack-scale system they're launching; this is the platform. They don't ship just one pizza-box computer, they ship the whole rack: 72 GPUs and 36 CPUs, with a total of 20.7 terabytes of memory, 2.5x the training compute, one hundred percent liquid-cooled and cable-free. It's just absolutely amazing, the kind of computers they're shipping. And we will see the economic impact of this: based on this chart, it's 75% fewer GPUs for a 10-trillion-parameter mixture-of-experts training run, and 10x cheaper inference, meaning 1 million to 10 million tokens per second at the same power. Nisten mentioned the power requirements, folks, and this is a big, big thing: most of the world is looking for where to plug in these GPUs, or where the power is going to come from. So while we see improvements in performance, the most important thing is to keep the power draw at the same level, or even reduce it. The fact that they're doing both is beating Moore's law by a significant amount. LDJ, go ahead.
LDJ
LDJ 50:45
I was going to add to that 75% figure you mentioned.
50:48
So when it comes to a real-world use case, to maybe help people better contextualize this: Jensen mentioned that if you were to take the DeepSeek model and scale it to 10 trillion parameters, in other words scale the number of connections in that neural network to 10 trillion, then, compared to the last generation with the same number of GPUs, it ends up training that on about a hundred trillion tokens, about a hundred trillion words of information, in a fourth of the time. So about four times faster. It seems like there's truly a real-world gain of about 4x, which is pretty big.
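To make that scale concrete, here's a back-of-envelope sketch; every number below is an illustrative assumption, not an NVIDIA figure.

```python
# Rough training-time estimate using the standard FLOPs ~= 6 * N * D rule.
active_params = 1e12            # assume ~1T *active* params in a 10T-total MoE
tokens = 1e14                   # "about a hundred trillion tokens"
flops_needed = 6 * active_params * tokens        # 6e26 FLOPs total

peak_flops = 50e15              # 50 PFLOPS per Rubin GPU at low precision
utilization = 0.4               # assume ~40% real-world utilization
cluster = 10_000                # assumed GPU count

days = flops_needed / (peak_flops * utilization * cluster) / 86_400
print(f"~{days:.0f} days of training under these assumptions")   # ~35 days
```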
Ryan Carson
Ryan Carson 51:28
I just wanna say, as someone that spent a bit of time at
51:30
Intel and had a good time there: what NVIDIA is doing is truly astonishing, just how mind-blowing this stuff is. There's actually a really good MKBHD video that came out recently where he "shrinks himself down" to show how amazing silicon really is. And it is mind-blowing, y'all. What is happening here? The fact that they're still pushing is amazing. The second thing I'll say is I think it's hilarious when people say there's an AI bubble. It's hilarious because Jensen, over and over again, says the need for inference is unbelievable and it's only going to go up exponentially. And we all see this; I can't get enough inference. I wanna spin up 10 Ralphs now; we need more inference. So it's exciting to see what they're doing here. And they bought Groq.
Alex Volkov
Alex Volkov 52:14
yes.
52:15
So, in addition to all this: NVIDIA didn't buy Groq. They entered into a non-exclusive licensing deal, and they're also taking most of the people from Groq (with a q), for, I believe, $20 billion. So NVIDIA is definitely investing on the inference side as well. Groq chips, we've talked about this: Groq's founder and CEO, Jonathan Ross, was instrumental, if not the lead person, in creating the TPU at Google. Google has their own chips; they don't use NVIDIA chips. TPUs are really efficient at scale, and Jonathan Ross was instrumental, if not the top guy, in creating them. Then he founded Groq, and Groq built chips specifically for fast inference. We talked about Groq: they're not that good for training, or at least not better than NVIDIA's chips, but they're really, really fast for inference. And inference is a huge thing: all the harnesses we talk about, all the AI use cases we talk about, all of this is inference. This is why we never thought AI is a bubble; we just don't see where it could pop, given the incredible releases from NVIDIA but also the need for inference in every possible thing in the world.

They also released a bunch of model updates, Nemotron as well. I want to super quickly mention this, because I think it's very important. Besides the Vera Rubin GPU and the designs, NVIDIA also launched Nemotron Speech ASR. NVIDIA released a bunch of Nemotron stuff; we're going to mention this one and then move on. Nemotron Speech ASR is a 600-million-parameter open source streaming model set to redefine real-time voice AI: 24 milliseconds median time-to-final latency, because it's streaming, and it can support up to 900 concurrent streams on a single H100 GPU. 600 million parameters is nothing, folks. We're talking about billions, sometimes trillions, of parameters elsewhere; 600 million, as we love to say, runs on a toaster. Hopefully it runs on a Reachy Mini. Under 500 milliseconds total voice-to-voice latency: Nemotron Speech ASR for the ears, a Nemotron Nano model for the brain, and the Magpie TTS preview for the voice. To show this off, I really want to open a video from a friend of ours. Let me go find the tweet, because I listened to this video and thought: oh, this is a voice that I know. So we're going to play this, and you see a little hint here; a little cat icon should give you a hint of what this is about.
AI
AI 54:44
How can I help?
Kwindla
Kwindla 54:46
I was thinking I'd ask you some math questions.
AI
AI 54:49
Sure thing.
54:50
I'd like to know.
Kwindla
Kwindla 54:52
Hmm.
54:52
What's two plus two?
AI
AI 54:54
Four
Kwindla
Kwindla 54:55
and four plus two
AI
AI 54:57
six.
Kwindla
Kwindla 54:58
That's pretty good.
54:59
You're really fast.
AI
AI 55:01
Thank you.
55:02
I'm glad to help. What would you like to ask next?
Kwindla
Kwindla 55:05
Well, how about we switch it up and you tell me a story.
55:08
I have an idea for a topic.
AI
AI 55:10
Sure thing.
55:11
What should the story be about?
Kwindla
Kwindla 55:13
I'd like to hear a story about a tiny seed that wants
55:16
to grow up to be a big tree.
AI
AI 55:18
Once there was a tiny seed named pip, pip dreamed of becoming
55:22
a tall, strong tree one day.
Alex Volkov
Alex Volkov 55:24
All right, so isn't this fucking mind-blowing, the speed?
55:28
What the heck? Pretty good. I remember there was a company that later became another company, Kyutai, that had Moshi, and Moshi was almost responding before you could think. Moshi was a model with a voice built in. You remember this? It was uncanny: you'd speak to it and it would reply before you'd finished your sentence. Moshi was one model, and the pitch for one model was always: the speed of one model will always beat the three-model pipeline, meaning an ASR model that understands what you said and turns it into text, then an LLM, and then speech generation on the other side. But this demo is three models: Nemotron Speech ASR, a Nemotron Nano LLM, and the Magpie TTS preview as a super quick text-to-speech model, and it has less than 500 milliseconds overall latency. As a reminder, folks: anything under roughly 200 to 250 milliseconds is essentially instant for us humans, the way we perceive the world. And so, okay, this is the big reveal: Kwindla Kramer from Daily and Pipecat is the guy whose demo NVIDIA showed off on stage. Shout out to Kwindla, a friend of the pod and basically the expert in everything voice AI; I was really, really happy to see Pipecat and Daily involved in this. When Kwindla talks to this model and asks "what's two plus two?", it says "four" immediately. What needed to happen for that: his voice was transcribed, the text was sent to a Nemotron Nano model running somewhere on NVIDIA hardware, it created a response, and the response text was turned into speech. So shout out to Nemotron Speech ASR. They did this with a very specific architecture: a cache-aware FastConformer RNN-T architecture with 24 encoder layers. Cache-aware is the big thing here, which is absolutely incredible. So voice agents are going to come everywhere; this is an incredible speech model (a minimal transcription sketch follows below). All right, so this is basically NVIDIA. There's also a bunch of open source models as well. Anything else we want to mention about CES? Go ahead, LDJ.
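If you want to poke at models in this family yourself, NVIDIA's NeMo toolkit is the usual entry point. A minimal offline transcription sketch, assuming NeMo is installed; the checkpoint name below is a placeholder, not the confirmed name of the Nemotron Speech ASR weights.

```python
# Minimal offline transcription with NVIDIA NeMo (pip install "nemo_toolkit[asr]").
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="nvidia/nemotron-speech-asr"   # hypothetical checkpoint name
)
transcripts = asr_model.transcribe(["question.wav"])  # 16 kHz mono WAV
print(transcripts[0])
```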
LDJ
LDJ 57:34
Yes, for Nvidia, there is something that they have already
57:37
released, that they announced at CES, which is Alpamayo, their reasoning self-driving model. Hmm. I don't know if you heard about this. (No, I haven't. Tell us.) NVIDIA announced the Alpamayo family of open source AI models and tools to accelerate safe, reasoning-based autonomous vehicle development. They even showed a video at CES of it doing a full end-to-end autonomous drive, integrated, I think, in a Mercedes-Benz. Yeah, this open source self-driving model does these reasoning steps, like: "I have identified a jaywalker, I must stop right now." It's really interesting.
Alex Volkov
Alex Volkov 58:19
I don't know if we want reasoning in the model that drives; I want
58:22
decisions to be made fast. But,
LDJ
LDJ 58:24
true.
Alex Volkov
Alex Volkov 58:24
it's an interesting concept.
58:25
Some ethical decision making that I'm not sure I'm ready to hand over. Though I should mention most of my driving happens autonomously in the Tesla, and that's great, so seeing advances elsewhere is definitely worthwhile. They also have a bunch of announcements with robotics: integration with the Hugging Face robot arm, bringing it into the GR00T framework and library. So a huge CES for NVIDIA, for sure. And I think we'll move on in the big companies, because we need to get to Ryan showing us Ralph at some point. We're an hour and fifteen into the show, and there are a few more things to cover; then we'll get to Ryan. I know I keep teasing, and trust me, it's going to be awesome.

One thing I wanted to chat about with all of you for the next five minutes: the Grok and XAI fundraise. We've talked about XAI, and the speed of their GPU and rack buildouts is incredible. XAI raised another $20 billion, with NVIDIA as one of the investors joining the round, alongside StepStone Group, Fidelity Management, Qatar Investment Authority, MGX, and Baron Capital. The strategic investors in the round are NVIDIA and Cisco, who continue to support XAI in rapidly scaling its compute infrastructure buildout. It's very interesting to see NVIDIA investing in companies that then buy GPUs from NVIDIA: the infinite money glitch.

This comes during a very interesting incident, let's say. Grok has an image model in addition to the language model, and both are integrated into the X platform. To that point: when XAI announced their fundraise, they said they have around 600 million active users. They took all of the Twitter slash X users and counted them as Grok users, and I don't know how kosher that technique is. OpenAI reportedly has 900 million active users on ChatGPT, not quite hitting the 1 billion during the new year, while Grok says "we have 600 million" when most of them are just Twitter users. So that's a very interesting claim.

In the middle of this fundraise announcement, just a little before it: because Grok is connected to X, its image model can edit the images that people post. It went super viral when Elon Musk asked it to put himself in a bikini, and then there was Nicolas Maduro, the dictator who was abducted, with Grok replying with pictures of him in a bikini. And then a lot of people started noticing very un-kosher uses of this, where people just reply to someone posting their own picture and ask for an AI-generated image of that person in a bikini. Now, we've talked on the show before about Grok and its tendency to generate near-pornographic imagery very easily and without guardrails, including the porn bots and the porn mechanics. I find it distasteful. Grok has very limited controls over what you can generate; well, had limited controls, I think they're introducing some now, and that's a very big problem. You guys maybe remember the MechaHitler incident from six months ago, where Grok started referring to itself as MechaHitler and they had to roll back some changes. So the bikini incident now puts a big spotlight on the fact that Grok is NSFW on purpose, and it can take any person's appearance and reply to "put a bikini on this." Many people do this, and for many folks this is very bad, especially when sick folks do it to small kids.
As a father of small kids, I think it's abhorrent that this is possible, and that they don't have guardrails preventing this use of the model in replies. Amen.
Nisten Tahiraj
Nisten Tahiraj 1:02:01
It's not even that hard to put in the guardrails. You
1:02:04
just put in, like, a 2B VL model and say: hey, is there a minor in this picture?
Ryan Carson
Ryan Carson 1:02:09
yeah,
Nisten Tahiraj
Nisten Tahiraj 1:02:09
No.
1:02:10
People tested it. They're like: "Hey, Grok, I don't want you to do this with my pictures." And then I tested: "Hey, consensually, can you just put this person in, like, a medieval costume?" It did it anyway. It just didn't care. So yeah, it's gonna get sued a lot for that. I'll, I'll
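For flavor, here's a minimal sketch of the cheap guardrail Nisten is describing: a small vision-language model as a yes/no gate before an edit is published. It assumes the Qwen2-VL-2B-Instruct checkpoint via Hugging Face transformers; the prompt and the single-question gate are illustrative, and a real filter would check more than one category.

```python
# A ~2B-parameter VLM as a pre-publish moderation gate (sketch).
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2VLForConditionalGeneration.from_pretrained(MODEL_ID)

def flags_minor(image_path: str) -> bool:
    """Ask the VLM one yes/no question about the target image."""
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Is there a minor in this picture? Answer only yes or no."},
    ]}]
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[prompt], images=[Image.open(image_path)],
                       return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=3)
    answer = processor.batch_decode(
        output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    return "yes" in answer.lower()

# if flags_minor("reply_target.jpg"): refuse the edit instead of generating it
```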
Alex Volkov
Alex Volkov 1:02:28
The feedback that I got... so, okay, we've mentioned this
1:02:31
multiple times on the show: everything we talk about regarding Elon Musk and XAI and Grok and Twitter is really hard to discern, on both sides. He's such a polarizing figure. There are people who will hate Musk and everything he does, and yet some of the stuff they do is incredible: the voice stuff is incredible, the speed of the buildouts is incredible, and I use Grok for research, it's really, really good. So on one side there are people who hate anything Musk does or touches or mentions, and on the other side there's an army of bootlickers who will hype anything into oblivion just to get noticed, so Elon will repost and they'll get money. It's really, really hard to understand, in a vacuum, what's actually valuable, what's actually good, and what's abhorrent. That said: saying "hey, we'll prosecute illegal uses of Grok" is absolutely stupid when there's no moderation built into the product. And the army of bootlickers replied to my post about this (that's not great) by saying "hey, Photoshop could always do this." I just want to address those comments. Yes, you could technically have spent hours upon hours learning tutorials, done this in Photoshop yourself, and posted it on your own platform where nobody follows you and it gets zero views. There's an absolute difference between that and doing it within a second, as the first comment on a celebrity's picture of herself, which then gets blown up by the Twitter algorithm to everyone. One is a tool; the other is a product, basically an amplification product that shows this to many people. So there's a big difference, and guardrails are important on that product.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:04:11
I have something maybe a bit controversial in that case.
1:04:14
Go ahead. So, what has been done, putting minors or women or men, it doesn't matter, in bikinis, or anything that hurts people: that is something that has to be stopped. And that is also somewhere we should go after the people, not just the product. On the other hand: I have uploaded pictures of myself and wanted to put myself in some specific situation, and Google, in Germany, did not allow it, because, yeah, we are the land of "verboten"; a lot of stuff is not allowed here and there are harsh penalties. Later it was possible, and I enjoyed it: I created fantasy images with my kids, for the kids, and showed them, and they loved it, stuff like that. If it were not allowed at all, that would be troublesome. So we have watermarks and we have laws, and if something illegal is being done, I think we should go after the people doing it. It is much more difficult with a powerful image-editing model that also happens to be open source, where you can't even establish such guardrails. Should it be sanitized completely, so you can't use, for example, the put-yourself-in-clothes feature for virtual shopping? If a woman can't upload a picture of herself and have it put her in a bikini when she wants that, then it's a bad product. But when it's being abused, that is where the moderation and the filtering have to happen, and we have to find the people. Like you said: if there's a viral post where somebody posts a horrific image, we know who the person is, and we can go after them, block the account, do anything; people get banned for much less egregious stuff. I think there needs to be responsibility for the people, and we should put the blame on the people who do this, because AI is a tool. We wouldn't want Photoshop to have filters that say "you can't do that" even when you have valid reasons to do so; if we had had this mindset earlier, we would probably have many more filters in our tools. But I think we should stop the people who do the bad stuff, and not go after the tools and make the tools even less capable.
Alex Volkov
Alex Volkov 1:06:20
I think that's a valid opinion.
1:06:22
The only thing I would add is that I like how fal.ai, also friends of the show, approach this. fal hosts and serves a bunch of open source models for video and image generation, and fal is specifically an API product: you reach it programmatically, so you can build whatever you want on top of the models they serve. If you want NSFW stuff as a person building on the platform, AI as a tool, like you said, then you can absolutely do so. But when you go to fal.ai itself, they have a little playground so you can test different models without writing code, and that has an NSFW filter on prompts and on responses, a categorizer on the product itself, because they don't want their platform to be considered a porn platform. And that's absolutely valid, because many of the models they serve can generate nudity, can generate NSFW content. So they make a very distinct choice: if you are on our website, we are not letting you disable the NSFW filter; if you want that, do it via the API, based on their legal terms. This, to me, is the difference between a tool and a product. A tool, absolutely, I'm with you, should be open; people can build whatever they want, it's been possible before, AI can help creatively, and people can build things like try-on products. But when it's a product that many people see, without the control of the person affected, that's where I have a very big issue. And X's Grok is absolutely a product: it's getting shoved into people's faces, there's now a button in the middle of the X app, the middle button is Grok, and under every post it's "hey Grok, do this or that." So that's where I have a problem: there are no guardrails on that product. Alright folks, we're moving on. This was Grok, and trouble at the $20 billion fundraise.

Super quick: Amazon's Alexa Plus is the smarter Alexa experience. I don't know if you guys got it enabled; I got it, and it's meh. The main improvement in Alexa Plus that we waited so long for is, hey, Alexa is now smarter. The cool thing is that if you tell Alexa something, she replies, and you still have a faint blue light on the device, so you can keep talking to her without saying "Alexa" again, and she replies again. That's the benefit. It's integrated with the smart home, which I think is a big boon; me and Wolfram are big smart home aficionados. I signed into my account; it's integrated into the smart home, and this is the demo I want to show you right now. So we're going to go to Alexa, this is my user, and I'll show you a demo of Alexa Plus on the web, because I'm one of the lucky few who have it. You can do stuff like plan and learn and create and shop with Alexa Plus: free-flowing, easy conversations, get it done with Alexa Plus. Alexa can summarize docs, schedule services, shop for products, and more, and you can start a conversation on one device and continue on another, which I think is a cool thing. Alexa now looks like ChatGPT, basically, and you can say "turn on office lights" and ta-da, the office lights are on. So this is me chatting with Alexa.
Alexa is obviously connected to my smart home, and it has lights. I don't have an actual physical device in my office; it's in the kitchen, to set timers, because that's usually all Alexa is good for. But I found this really cool from a web interface: "turn them off." Oh, they turned off again. You don't even have to mention "office lights"; this is natural-language conversation, and it naturally handles that.
LDJ
LDJ 1:09:49
Which model?
Alex Volkov
Alex Volkov 1:09:51
no.
1:09:51
I don't know which model they use.
LDJ
LDJ 1:09:52
At one point they did say that they were partnering with Anthropic and going to
1:09:56
have Anthropic's models be used to help you better interact with Alexa devices. But then Amazon later came out with their own line of frontier, or allegedly frontier, models. So I think that's going to be interesting.
Alex Volkov
Alex Volkov 1:10:09
Yep.
1:10:10
So yeah, they have Nova and a bunch of other stuff. We don't know which model, and they're not exposing this. I just thought the smart home thing is very cool. All right, so that's Alexa Plus on the web. They also announced a bunch of other stuff at CES, different devices, breaking free from just the physical device: you can run Alexa on the web for 20 bucks a month, text chat interface only, voice coming later on.
Ryan Carson
Ryan Carson 1:10:32
I think, speaking of devices, Reachy is just not awake.
1:10:36
You need to wake him up. I'm very sad, Reachy.
Alex Volkov
Alex Volkov 1:10:38
Come on, Reachy.
Ryan Carson
Ryan Carson 1:10:39
Hello guy,
Alex Volkov
Alex Volkov 1:10:41
Yeah.
1:10:41
Reachy goes to sleep at some point. Let me try to get him; Reachy is the robot. Let me wake him up.
Nisten Tahiraj
Nisten Tahiraj 1:10:47
Reachy had a rough night out.
1:10:49
Reachy tied one on.
Alex Volkov
Alex Volkov 1:10:51
I will say, just as we were joking about Reachy: I had
1:10:54
Claude Code connect to Reachy, and Reachy basically has an SDK in Python, so you can program pretty much everything you want. I tried to add a BPM detector so Reachy would just randomly dance when I have music playing in my office. It's harder than it looks, folks. It's harder than it looks. So I'll wait for Reachy to wake up, because Reachy does not want to wake up; I do often need to restart it. Let's move on. Somebody asked if Reachy can be talked to via Alexa; I haven't connected those yet. I probably should.
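The beat-detection half of that idea is the easy part. Here's a minimal sketch, assuming librosa for the audio analysis; the robot call at the end is a hypothetical placeholder, not the real Reachy SDK.

```python
# Estimate tempo and beat times from a recording of the room's music.
import librosa

y, sr = librosa.load("office_music.wav")              # samples + sample rate
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print(f"Estimated tempo: ~{float(tempo):.0f} BPM")
# for t in beat_times:
#     robot.nod(at=t)   # hypothetical call: schedule a dance move per beat
```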
Wolfram Ravenwolf
Wolfram Ravenwolf 1:11:23
Yes.
1:11:23
Have you seen mine already? I modified mine. Wow. So it's my assistant in a Reachy form.
Alex Volkov
Alex Volkov 1:11:28
That's hilarious.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:11:29
It looks like an accident on Sesame Street.
Alex Volkov
Alex Volkov 1:11:32
this is so cool.
1:11:32
We need to get them to talk to each other. All right, folks. Nisten, I would love to talk to you about GPT Health super quick, and the Doctronic stuff, and then we'll get to Ryan and talk about Ralph, because I'm very, very excited about that. The GPT Health launch was very interesting, because many of us turn to ChatGPT to ask about our health issues, and they highlighted this as a feature back at the GPT-5 launch: they brought some folks, cancer survivors, who talked about how GPT-5 absolutely helped them with diagnostics, second opinions, et cetera. But many people say "do not trust AI for medical stuff, because it hallucinates." So: where are we in the world of health, and how does the GPT Health launch affect that?
Nisten Tahiraj
Nisten Tahiraj 1:12:10
They have been fantastic for the last two years.
1:12:14
The main issues have just been people not selecting the right model, not using the best model; they'd sometimes use the mini, and that would hallucinate. But if you just use the best Pro version, basically since GPT-4, it's been very accurate with health stuff. And what most people don't realize: they think of medical data as a lot, but for the model it's actually not a lot at all. There are only about 2,000-something prescription drugs, and only about 2,000 or so total diseases; if you count the rare diseases and such, it goes up to 10,000, and even the total number of internationally classified codes is only about 20,000. That's nothing for an LLM. There's not actually that much data for it to worry about, and it's very easy for an LLM to pick up patterns when you only need about 2,000 or so diseases. This is the most common misconception people have, because they're used to thinking about insurance companies, all this medical data, all this junk; but once you filter it and classify it, it's actually not a lot at all. This is why the models are very, very good at this. A lot of this happened during COVID: they were able to get all of this data together, all the nurses, all the scribes, all of it, because there was a health emergency, without too many people getting in the way and creating a lot of fake work for themselves. That's what actually allowed the models to get very good. And if you ask most doctors, they love the models; they love having them. Their complaints are mainly that sometimes it just takes too long for them to respond or do something.
Ryan Carson
Ryan Carson 1:14:08
Nisten, I have a question for you.
1:14:10
When do you think MyChart slash Epic is going to play ball here? Because that's obviously the big question; everyone's using them.
Nisten Tahiraj
Nisten Tahiraj 1:14:17
Almost every doc just uses dictation.
1:14:20
They've been using GPT-4. Norway rolled out Epic with GPT-4 Vision for doing all the x-rays and stuff. So in Norway, for two years now, when you get an x-ray or a CT scan, the first response you get is actually from a bot, from an LLM.
Ryan Carson
Ryan Carson 1:14:37
I'm more talking about getting their data into
1:14:40
my ChatGPT instance, right? Because right now I have MyChart, I have to log in, it's totally locked away, I can't get the data. It's a total freaking nightmare.
Nisten Tahiraj
Nisten Tahiraj 1:14:48
it's become a lot easier because you can just take
1:14:50
pictures with your phone, and then you just dump them into ChatGPT. This is how people use the models for health advice all the time.
Alex Volkov
Alex Volkov 1:14:58
People use the models for health advice all the time.
1:15:00
And ChatGPT... OpenAI is absolutely encouraging that use. But now, this announcement is a very interesting update on that:
Nisten Tahiraj
Nisten Tahiraj 1:15:09
super supportive,
Alex Volkov
Alex Volkov 1:15:10
OpenAI decides to add a privacy-first space for personalized
1:15:14
health conversations, with connected health records and fitness apps. So I think this is the thing: many people would like to have their records analyzed, instead of just pasting screenshots, and OpenAI is approaching it differently. For example, Apple Health: I'm very much looking forward to an Apple Health connection in the ChatGPT interface, where my sleep patterns and my HRV and heart rate all get synced to ChatGPT to help me. Function Health is great; I believe Function Health is the lab that takes tons of biometric and blood tests and gives you an incredible panel, and there's the integration of Function Health, Peloton, and MyFitnessPal for folks who are dieting specifically. And then electronic health records, which I'm not sure exactly what that covers, but Ryan, is that what you're looking for? Electronic health records that are stored, and that you can then access very securely?
Ryan Carson
Ryan Carson 1:16:05
Yeah, like very simple example.
1:16:07
My wife just had an MRI, and, this is insane, we have to call the hospital and say: can you cut a CD of the images so that we can see them? And we don't even have a CD-ROM. I don't even know how to...
Alex Volkov
Alex Volkov 1:16:18
Wait, Ryan, you don't have a CD-ROM?
1:16:20
I think half of the people listening to us right now, half of the 1,500 or so, don't even know what a CD is, or what "cutting" means. What does cutting a CD mean? I think we're there, and the hospitals are still operating with that technology. A hundred percent. So... Nisten has a CD to show us. Look at that.
Nisten Tahiraj
Nisten Tahiraj 1:16:43
I'm this old, guys.
1:16:45
This? This is what a CD looks like. "You're not old." No, I am this old; I have the original Ubuntu ones. "Oh, wow." Yeah. Wow, folks.
Ryan Carson
Ryan Carson 1:16:54
No.
1:16:54
Installing real software meant using, you know, three-and-a-half-inch floppy disks, swapping disc after disc.
Alex Volkov
Alex Volkov 1:17:00
Go ahead.
LDJ
LDJ 1:17:02
Yeah.
1:17:02
On ChatGPT Health: I think it will be especially huge. I'm looking forward to putting in the blood work and lab results and things like that; they said they'll support uploading those documents. I don't know if it'll be a special upload button or something, but theoretically there's so much data in your blood work and lab results, and such a multitude of diseases or conditions it could diagnose. There are so many different sub-areas of biology and medicine that the average doctor, even some of the top doctors, couldn't keep it all in their mind and really cross-reference all the latest literature regarding all those biomarkers.
Alex Volkov
Alex Volkov 1:17:46
I think specifically Function Health is a great example of this.
1:17:49
They have 160 lab tests. I'm not paid by Function Health; I haven't even used it yet, though I really want to. But the thing I read about Function specifically is that you get 160 lab tests, more than many doctors would even order for you. Function doesn't actually do the analysis for you: they give you the results and biomarkers and say, hey, you're in this range. But if you want a full-scale analysis, a doctor needs to interpret them. And the reason I didn't jump on Function is that many doctors will look at you weird and not want to interpret these, because they're not blood tests that they ordered for you. Usually it works the other way around: the doctor sees you, diagnoses you with a problem, and then orders tests based on that problem. Function works differently: Function just tests everything, and if there's an outlier, Function shows you. So the connection between Function Health and ChatGPT is very interesting, because ChatGPT would be able to tell you: hey, this is possibly going on, you should look at this.

I think we're almost done with the medical stuff; the other item is Doctronic. Nisten, again, we'll tap you as our resident health expert. By the way, folks, for context: Nisten worked with a bunch of doctors and was part of the first AI doctor efforts when GPT-3 was just around; there was a "Dr. Gupta," if I remember correctly. So yeah,
Nisten Tahiraj
Nisten Tahiraj 1:19:05
I also wrote a paper with the University of
1:19:08
Washington and Dr. Johnson Thomas there, on on-device medical AI. So yeah, I pushed for a lot on both the closed source and open source side, and made some of the more popular datasets, because no one else was making them. When you think about how much each government spends on healthcare and education, it's up to two-thirds of their budget. And the sad part is that they focus on all of these doomer scenarios and not on what doctors and nurses are actually saying. Doctors and nurses are saying they're overworked, and governments are just saying a whole bunch of stuff about safety, which isn't even about the work doctors actually do; most of the safety benchmarks just don't work at all. It would be better if governments put a serious part of their healthcare budget toward actually buying GPUs. Mm-hmm. Otherwise they will be dependent on sending all their data to San Francisco; that's just how it is. And again: there's no shortage of work for nurses and doctors. If anything, they're overworked, so they should push for as much automation as they possibly can. Even the open source models are now excellent at medical stuff, as we've tested quite a few of them, so you can run all of this on your own GPUs; it would be a very marginal added cost for any hospital. There's no excuse not to do this now, and I would even make the argument that it is ethically wrong to delay the process and create, again, fake administrative work for themselves, while ER wait times keep increasing and doctors and nurses keep getting more and more overworked.
Alex Volkov
Alex Volkov 1:21:03
In light of that conversation, the topic of Doctronic: this is the
1:21:06
first state-approved medical prescription pilot wherein an AI can review your history. A patient requests a renewal; renewal means "I have been on this medication for a while and I need a new prescription." Usually you have to see a doctor, and the doctor is busy and tired. This runs 24/7 and is significantly cheaper than a doctor; a renewal can be done by a nurse practitioner as well, and correct me if I'm wrong, but it's still basically hard to get one. The AI reviews your history, asks you clinical questions, whether you've had adverse reactions, whatever the safety flags are, and then sends it to a pharmacy: hey, this person can continue on this. They excluded pain management drugs, ADHD medication, and injectables, stuff people can abuse. But generally, if people need their asthma medication renewed, it makes absolute sense for an AI to say: yeah, you can continue this medication. And the first-ever AI malpractice insurance is built into this. 60 million Americans are affected by physician shortages, so it's kind of absurd that to renew the drugs you depend on every week you have to talk to a person. Hopefully this will just expand. Nisten, I absolutely agree with you: it's ethically and morally important to lean into this technology, because of the shortages, and we need doctors to diagnose, not do stupid administrative work like renewals. So we're all for this. Shout out to Doctronic for the Utah program where AI can autonomously renew prescriptions for you: 190 routine medications at just $4 per renewal. This is incredible, and hopefully we'll hear more medical stuff coming through.

We've been teasing long enough, folks. What the heck is Ralph? I basically took a little break over the holidays, even though ThursdAI still went out and you got a great interview with Will Brown from Prime Intellect. I took a break from Twitter; I proposed to my girlfriend, I'm now engaged, and I was like, hey, I don't want to look at Twitter for a few days. And when I looked at Twitter after a few days, all I saw was Ralph Wiggum, going like this with his finger to his nose. Ralph Wiggum, for those who don't follow The Simpsons, which is still on air somehow, is a character known for his very blunt stupidity, but achieving results through repetition, or some such thing. And when I looked up Ralph, I saw Ryan Carson blowing up with an article about what Ralph is, and I thought: who better to talk about Ralph on the show? So the direct question to you is: Ryan Carson, what the heck is Ralph Wiggum?
Ryan Carson
Ryan Carson 1:23:40
What the heck is Ralph Wiggum?
1:23:42
All right, so I'm going to walk everybody through what this thing is and why you should care, and how it's going to help you a lot. I've shared my screen, so if we want to pop it up, I'll walk you through a couple of things. Okay. So, a friend of mine, Geoff Huntley, who used to work at Amp, created this idea of Ralph a while ago; I'm thinking at least six months ago, maybe more. And the idea is very simple: it's basically an agent that runs in a bash loop. Well, what does that mean, and why does that matter? I thought I should try it, because people were talking about it. So I tried it, set it up in Amp, got it working, and at the end I was like, I should probably write an article about this. It's at 1.2 million views now. Wow. So I was like, okay, we hit a nerve here. So what is it? I'm going to walk everybody through, if you could switch to the slightly larger version
Alex Volkov
Alex Volkov 1:24:29
screen.
Ryan Carson
Ryan Carson 1:24:30
Okay.
1:24:30
So what I did is create a simple repo for this. If you're watching, you can see it; it's open, it's free: just go to github.com/snarktank/ralph. This will help you boot it up; just tell your agent to go look at it to help you set up Ralph. I've done that for you. If you're listening: once again, it's github.com/snarktank/ralph.
Alex Volkov
Alex Volkov 1:24:53
Snarktank.
Ryan Carson
Ryan Carson 1:24:53
That's me.
1:24:54
I love that. Okay, so now: how does it work? I've created a little visual for you all. Say you're coding with an agent. You want to build a bunch of stuff, but you're busy: you've got kids, you've got a job. You would love your agent to build stuff for you while you sleep. Well, how do you actually do that? Because normally you need a human in the loop, approving things, looking at code, doing stuff. The truth is, models now, especially Opus 4.5, are basically able to accomplish a lot of what a junior engineer, even a mid-level engineer, could do, with basically no input. They're very, very good now. They're not senior engineers yet, but they're pretty good. So you could ship a pretty deep, interesting feature while you sleep, using Ralph. How? It's pretty simple.

You start off by writing what's called a PRD, a product requirement doc. This sounds hard, but it's not. All you do is open your favorite agent (I use Amp, obviously) and say: I want to build a feature. I use Wispr Flow; I just talk for two or three minutes, blab into the text field, and then say: create a product requirement doc for that. And it spits out a pretty good PRD with user stories. User stories are a simple device engineers have been using for decades: "as a user, I want to do X, Y, Z." So: as an admin, I want to log into the admin, click a button, and have it save something. That's a user story. So you write your PRD; just use your favorite agent to do that. That's not rocket science.

This is where it starts to get interesting. We're going to take that PRD and convert it to a list of user stories as JSON. JSON is just a text file; it sounds fancy, but it's just text in a certain format. What you're seeing here is an example of a user story; it's called "add priority field" in the title. Normally there's some text about what the user story is, and then acceptance criteria. Now, this is the part I want to bang on about: Ralph is amazing, agents are amazing, but they're not going to work unless you take the time to specify what it is you want to build. I usually spend 30 to 60 minutes writing the PRD and converting it to user stories; it's really important. What you're doing is breaking the PRD down into very atomic user stories with clear acceptance criteria. What's an acceptance criterion? All it is, is: how does the agent know whether what it built works? That's it. And as you're building this JSON file (again, you're just chatting with your agent about it), you want to ask over and over: is each user story atomic, and can it be done in one simple thread? Is it very, very clear how we're doing acceptance testing here? And you need to think about whether the agent can actually run the acceptance test. If it involves browser testing, which is very important, you need to use a skill; I use one called dev-browser, which is open source. It fires up Chrome in debug mode; the agent can control it, see it, understand it, and verify that it shipped a front-end feature. Okay. So far we've written a PRD (yay) and converted it into a JSON file, which is just the list of user stories with acceptance criteria. Neat. Again, this isn't rocket science.
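For a sense of the shape, here's a single user story in roughly the form Ryan describes; the field names are illustrative assumptions, and his repo may use different keys.

```python
# One atomic user story with acceptance criteria and a passes flag,
# shown as a Python dict (serialize a list of these to prd.json).
story = {
    "id": "US-1",
    "title": "Add priority field to the todo item",
    "story": "As a user, I want to set a priority on each todo item.",
    "acceptance_criteria": [
        "A priority dropdown (low/medium/high) appears in the edit form",
        "Saving persists the priority and it survives a page reload",
    ],
    "passes": False,   # Ralph flips this to True once the criteria pass
}
```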
I think people have been doing this forever. Now, what do we do next? We run Ralph. So what is Ralph? All Ralph is, is a bash script that loops the agent. What do I mean by that? Well, again, your agent, like Amp, can write a bash script for you. All it is, is a file your computer can run, and it basically says: okay, I'm going to do something on my computer, then I'm going to finish, and then I'm going to loop that. So what is it doing, and why is this different than, like, opening up Cursor or opening up
Alex Volkov
Alex Volkov 1:28:53
mm-hmm.
Ryan Carson
Ryan Carson 1:28:54
And doing it.
Alex Volkov
Alex Volkov 1:28:54
Hey Ryan, what, what is the difference between
1:28:56
running Ralph, and running Cursor or Amp directly, typing into it "okay, continue, do the work"?
Ryan Carson
Ryan Carson 1:29:02
Yeah.
1:29:03
So the difference is that, you are not gonna be involved.
Alex Volkov
Alex Volkov 1:29:06
Yeah, that's great.
Ryan Carson
Ryan Carson 1:29:07
So the whole reason to do this is that you're like, I
1:29:10
want to build this big feature, but I want it to happen while I'm AFK. I don't want to be in the loop here; I want the agent to do all of it. And you normally can't do that, because the agent runs out of context or runs into a wall and you have to get involved. Ralph solves that, and I'll explain why. You've done all the work to write a PRD and create all the user stories. You go to your terminal and start the Ralph script; it's just a bash script. Then your agent (I use Amp, or you could use Claude Code, or Cursor on the CLI, or Gemini on the CLI, any CLI agent) gets told by the bash script to pick the first user story. It says: I'm going to go to this JSON file and look for the first user story that doesn't have "passes" set to true. So it looks for "passes" equals false. Okay?
Alex Volkov
Alex Volkov 1:30:02
Mm-hmm.
Ryan Carson
Ryan Carson 1:30:03
Neat.
1:30:04
All right. Well, then it just starts to write the code. Nothing rocket-sciencey there, but it's happening behind the scenes; you're not watching all of it happen, it's happening in an Amp instance. And then it commits the change. Basically, it goes through its acceptance criteria and says: oh, this works, cool, I'm going to commit it. And obviously you want your agent to commit this stuff, because bad things can happen and you might need to go backwards. So it commits changes using Git, obviously. Then it updates the PRD: okay, I'm going to mark "passes" as true. Yay.

And then this is the most important part. Everybody is talking about compounding engineering and compound learning, and what does that mean? All it means is that you take what you learned and write it down in a text file. In the bash script, it says: if you learned anything during this loop, write it down in the AGENTS.md file. Look at any files you edited, any dead ends you hit, any rabbit holes you went down, and update the right AGENTS.md file so we don't do this again. That's huge. And the other thing is a simple text file called progress.txt. All that says is: here's what we did during this iteration, here's the Amp thread we used in case you need to refer back, and here are a couple of things we learned that we don't need to remember forever, but probably want to remember while building out this feature. So it writes that file, and then, guess what, it loops. It looks back at the JSON file and asks: is there a next user story? If yes, it picks it and loops.

So this is the Ralph loop, and it sounds dumb because it's so simple. Why is everybody talking about this? Because this really hasn't been done outside of hyper-advanced Silicon Valley circles until now, and it's transformative. You're taking a highly capable model, like Opus 4.5, giving it clear atomic user stories, and letting it loop until it's done. Then you finally get to the end: is there another story? No, and it's done. You wake up, and the terminal says "complete." Then you do a bit more user testing yourself: you fire up a browser, test some things, find a couple of bugs that got missed, talk to Amp or Claude Code and say "let's fix it," and then you ship the feature.

In the past, you were an engineer on a team and you'd have a sprint, right? In that sprint you'd have a bunch of user stories on a kanban board, and every day you'd stand up, look at the kanban board, and grab your first user story: "I can work on that, I know about that, I can do it without touching other code." You pull it off, you do it, you finish it, you walk back to the board, mark an X through it, and pick your next user story. This works. And guess what: it also works with a bunch of agents, because they don't need the whole context of the whole repo and all the knowledge ever. It's really exciting. I shipped two features yesterday using this; I shipped three this morning. It's very, very exciting, and it's fun to see
It kind of hit the main, the mainstream.
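To make that concrete: below is a minimal sketch of a Ralph-style loop, assuming a prd.json of user stories with a passes flag, a progress.txt log, and an AGENTS.md lessons file, and using Claude Code's headless claude -p mode as the per-iteration agent. The file names, schema, and prompt wording are illustrative, not the actual Ralph script:

```bash
#!/usr/bin/env bash
# Minimal sketch of a Ralph-style loop (illustrative, not the real script).
# Assumes prd.json holds user stories with a boolean "passes" flag, and that
# the agent maintains progress.txt and AGENTS.md as described above.
set -euo pipefail

while true; do
  # Stop once every user story in the PRD is marked as passing.
  remaining=$(jq '[.stories[] | select(.passes == false)] | length' prd.json)
  if [ "$remaining" -eq 0 ]; then
    echo "complete"
    break
  fi

  # One fresh single-shot session per story: pick the top unfinished story,
  # implement it, check its acceptance criteria, commit with git, flip
  # passes to true, log the iteration, and record lessons for future loops.
  claude -p "Read prd.json and pick the top user story where passes is false.
Implement it and check it against its acceptance criteria.
Commit the change with git, then set passes to true for that story in prd.json.
Append what you did and the thread URL to progress.txt.
If you hit dead ends or learned anything reusable, update AGENTS.md."
done
```

Looping a fresh session per story is what keeps each iteration's context small; the state lives in the files and in git, not in the chat.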
Alex Volkov 1:33:27
Yes.
Ryan Carson 1:33:28
I wanna give, you know, Geoff deserves the credit for thinking this up. And he is kind of brash and bold, and I love that about him. He's been saying this for a while, and I think finally we're all paying attention. So, exciting times.
Alex Volkov 1:33:40
Thank you for this very detailed explanation of what the heck Ralph is. Let me try to recap, to see if I understood this correctly. The difference between me going to an agent, asking for a feature, and babysitting it is that this can run autonomously. And the reason it can run autonomously is that you did the work up front: instead of continuously telling it, hey, now I want this, and now I want that, you specified it all yourself. Instead of being a software engineer, you were a product manager. In your head, together with an LLM, you said, hey, I want this. Then you broke down the "this" into smaller and smaller and smaller achievable units of things that need to be done, maybe with dependencies: maybe this needs to be done before that, et cetera. And then you used the LLM to help you define what it means for this to be complete. And I think the crucial thing in this loop is that it updates the progress itself. So the next time it starts looping, it already has both the lessons from the previous iterations and the status of what's complete: here's what I need to know. And the working context is rebuilt from scratch each time. What we know from AI agents is that the longer you sit in a chat, the stupider the thing gets, even Claude Opus with Claude Code, right? If you're getting to a point where it needs to compact the history, where, yeah, you've iterated on a bunch of features, it becomes stupid at that point. It's already forgetting a bunch of stuff. So if you're not committing continuously, like, "here's what you need to learn" into that AGENTS.md, and you're not updating the status, this will not work. But the autonomous part comes from you: you did the pre-work, you kind of extracted ahead of time the work people usually do mid-stream when they vibe code. This is how software has been built for a while, and it just works for agents in this Ralph thing. Is that pretty much the summary?
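A hypothetical prd.json in the shape that recap implies; the feature and schema here are made up for illustration, and only the idea of small atomic stories with acceptance criteria and a completion flag comes from the conversation:

```json
{
  "feature": "CSV export for the reports page",
  "stories": [
    {
      "id": 1,
      "story": "As a user, I can click an Export button on the reports page",
      "acceptance": ["Button renders on the page", "Clicking it triggers a download"],
      "passes": false
    },
    {
      "id": 2,
      "story": "As a user, the downloaded CSV matches the visible table",
      "acceptance": ["Header row matches the columns", "One row per record"],
      "passes": false
    }
  ]
}
```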
Ryan Carson 1:35:17
You summarized it pretty well; you did it better than me. And I think people think, well, why do you have to specify the user stories? It's like: because that's how you build stuff. Now, this will get abstracted, and there'll be, you know, a CPO agent that does this for you at some point, yes. But that's not good enough right now, and this really, really works. And people complain about short context windows. Y'all, the context window of Opus 4.5 is plenty to ship a decent user story, and if you're trying to stretch it out, you're doing too much, period. I see Nisten nodding his head. Yeah.
Nisten Tahiraj 1:35:52
The dumber you make it, the better the results are. So basically with Ralph, you just say: hey, read the PRD, which you wrote, and you want to keep that somewhat simple. And then you just say: pick the top item from the to-dos and just do that, and then continue until all tests have passed. But what really enabled Ralph to actually exist were mainly the Amp commands and claude -p. It was the inline bash commands, because if you do claude -p and you put a prompt in there, it will just run Claude for one session to do that task. So all you do with the bash script is grab the initial instructions, which are really simple and stupid. Usually they're just four lines. They're like: yeah, just read the PRD, do the top item from the to-do list, make sure to commit everything nicely, and continue working until all tests are done. And then it starts doing this over and over and over again. And if you're feeling adventurous, you can say: hey, just spin up to 50 subagents every time you do this. This was Geoff Huntley's favorite thing. The reason it works so well is because when you do it as a single-line command, it just does one of the tasks and then it stops itself and then starts over again. So it does one of the tasks, does one commit, and then it stops, and it just keeps repeating that 30, 50, a hundred times until everything is done. The biggest mistake people will make with Ralph is making the requirements and the instructions very, very long. You're gonna get bad results with that. It works best the dumber you make it and the more repetitions you have. So basically, Ralph Wiggum: I'm just gonna try and do the same task over and over, and eventually I'm just gonna get it right, and then I'm gonna move on to the next task. It does all of this on its own.
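In script form, what Nisten describes is roughly the following; each claude -p invocation is a fresh single-shot session that does one task, commits, and exits, and the prompt wording and repetition count are assumptions for illustration:

```bash
# Each `claude -p` run is one headless Claude Code session: it does one
# task, makes one commit, and exits, so the context stays small and simple.
# Repeating it 30, 50, 100 times is the whole trick (count is illustrative).
for i in $(seq 1 50); do
  claude -p "Read the PRD, do the top item from the to-do list, commit your work nicely, then stop."
done
```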
Ryan Carson 1:37:49
And I think people think this is gonna be some runaway token thing, and it's not. If you specify a user story that's small and doable, it's not gonna run away with tokens. The second thing I'll say is that during that progress.txt write, it's very important to include your thread URL. Amp does this cool thing where you can actually read previous threads, and I'm sure Claude Code can do this as well. By putting the thread ID in there, future versions of Ralph can go back and actually read previous threads if they're like: how did it do that? Or: what happened there?
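An illustrative progress.txt entry in the spirit Ryan describes; the format and the thread URL are hypothetical, and the point is just that each iteration records what it did plus a link back to its thread:

```bash
# Append an iteration log entry; the format and URL below are hypothetical.
cat >> progress.txt <<'EOF'
## Iteration 7, story 1: add Export button
Thread: https://ampcode.com/threads/T-example-123 (hypothetical)
Did: added ExportButton component and wired it to the reports page.
Learned: the reports table lives in ReportsTable.tsx; reuse its column defs.
EOF
```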
Alex Volkov 1:38:20
What choices it made. Yeah. So I'll say, when Codex came out, the original Codex from OpenAI, I remember Greg Brockman talking about software engineering in a nutshell. What it is, is we are solving problems by taking a big problem, breaking it down into smaller problems, breaking those down into user stories, and writing code that solves a smaller and smaller chunk of the problem. That's basically software engineering in a nutshell, and this seems to be very much aligned with that. And the innovation here in Ralph specifically is the autonomous part, where you break it down so much, and you do the work ahead of time of thinking through what needs to be done and what the final state is, that at the end you can hand it off and it will run without your intervention, better than if you just one-shotted it and said, hey, I want this feature, because then that work happens during the coding. When that work happens outside of coding, like with kanban and agile software development, the more work you put in ahead of time, in the sprint planning and in breaking things down into user stories, the higher quality software you get.
Ryan Carson 1:39:26
Yes. This is how real work happens. We don't ever say the word one-shot. No real work is done one-shot; all work is done through user stories. So I think this is what's so exciting. It seems like the whole vibe coding term is starting to die, which I think is important. I'm not vibe coding.
Alex Volkov 1:39:43
When Karpathy defined this, it was like: you speak to a model, accept whatever it sends, and speak again to whatever it sends back. This is you doing a lot of pre-work from an actual software development cycle, and just handing off the implementation work to agents that you know can succeed. Ryan, this was great. Thank you so much. I'm very, very happy that we had you on the show to talk about Ralph, to break down the article that you posted and how it's implemented with Amp. You were gracious enough to say you don't have to use Amp, but you guys should use Amp because it works very, very well. Amp was used in the original Huntley article as well, right? Oh, yeah. So Amp is definitely,
Nisten Tahiraj 1:40:18
Remember, there was a time when Anthropic was complaining that there was a small subset of users that were just using Opus 4 24/7? Yeah, that's what was happening. And we weren't telling people. But there was a meetup with Geoff Huntley when he just showed up in San Francisco, and there was a meetup for just a bunch of engineers; it was just a Twitter group chat. And we were all doing show and tell of, like, doing orchestration and all this stuff and firing up subagents. And then Geoff just drops the bomb with this and is like, I can fire up 500 subagents. So then we learned about Ralph. And then there was a Y Combinator hackathon with Dex Horthy, Simon (I forgot his last name), and Wilson, no, no, it was another Simon, and also Lyndon Leon. And that was the first time where they just won the hackathon by using Ralph with Sonnet 4.5. They won the hackathon by letting it run overnight. So they went to sleep while everybody else was still coding, and then they had like three or four of the apps done. And I think that night Geoff got very drunk and started crashing out at them on Hacker News, which was pretty funny at the time. But yeah, this was the first time, when Ralph escaped the lab and won the hackathon for them. I would recommend people look at Dex Horthy's stuff from HumanLayer if they like building subagent things, because they're kind of tool-agnostic. You can also run Ralph with a 2B model; that was very funny for me when I posted the meme of the African dude saying "why are you running?" with a little Ralph on the table. But yeah, it's pretty surprising that it caught on now, and I'm just glad the GPU rate limits are a lot higher. I'm glad people are having fun with it.
Alex Volkov 1:42:11
Yeah.
Nisten Tahiraj 1:42:11
Yeah.
Alex Volkov 1:42:13
This very much fits within how harnesses and techniques can take existing tools, existing intelligence, and make them significantly, significantly better, the more you think about them ahead of time. So, Ryan, thank you so much for the breakdown. Nisten, thank you for the backstory. Folks, even if you're not coding, I think this is very important stuff for you to understand where we're going, and to understand why we say there's no AI bubble. Because even with the existing techniques, there's so much still to be discovered. Even if all of the labs stop creating new models, and Jensen stops delivering 7x performance for the same price and power range, even if all of that stops, we have so much to discover and so much more to build. Software is required everywhere. Everywhere you look around, there's something that can be solved with more software, and a bunch of other stuff. The existing world needs more software, and more software can be built even with everything stopping. But nothing is stopping. These models will keep releasing. We're gonna see GPT-6 this year, probably. We're gonna see Opus 5, maybe; I don't even know what I expect from Opus 5 or 4.5. We're gonna see a lot of innovation this year, and the progress is not stopping, while we can also take the existing progress and expect a lot more from it. This is what we mean when we say there's no bubble. So with that said, I think this is a great point to end the show, after two-and-something hours here. The time is passing already.
Wolfram Ravenwolf 1:43:34
Quick: the video model and the avatar forcing, right?
Alex Volkov 1:43:38
Avatar forcing, unfortunately, we're not gonna get to today. The video model is LTX; we kind of mentioned that before. Now, the most important news is that the video model is open source, and it does audio and lip-syncing as well. So LTX is now the top open source model for video generation, and it's fine-tunable as well, which means you can create different things with it. The open source community is very excited about this. But yes, it's time to end the show. With the first show of this year, I think we did a great job. Ryan Carson, a host and a guest; I think for the first time we had a host-and-guest combo. We dove into techniques. I really wanna talk about agentic skills, because you mentioned skills, and I've been using skills for a while, and like with MCP, we would love to bring you the new and updated skills. Also, you don't necessarily have to use code for skills; you can just create skills, you can ask Claude, et cetera. So we'll cover agentic skills, probably in the next episode. With that, thank you guys so much for joining for the first episode of the year. Over 1,700 of you joined and watched us talk about everything from NVIDIA's CES announcements, to the new coding phenomenon that lets agents run asynchronously and build features for you by doing the work ahead of time with Ralph, to OpenAI's GPT Health and the importance of health in AI, and to a bunch of open source as well. Always so much fun to do these live shows. I gotta admit, after two and a half weeks off, it's really, really good to be back with y'all, on January 8th of this year. I believe it's gonna be an incredible year, and we're gonna have so much to cover live. For now, we're gonna end the show here. Thank you so much for tuning in. If you missed any part of the show, you can find us everywhere you get your podcasts. We have a 4.9 rating on Apple; please give us five-star reviews. We all now have microphones, which is amazing. We've missed our friend Yam over here, but generally, everybody's here. Ryan Carson, builder in residence at Amp, an incredible builder and explainer, so please follow him for breakdowns like this. Wolfram Ravenwolf, now an AI evangelist with Weights & Biases like me, focusing on evals; we're gonna bring you a bunch of evaluations on the show. Nisten, AI engineer at Bagel, and also our resident medical AI expertise person. You wanted to mention something super quick?
Nisten Tahiraj 1:45:47
Well, there was a Dr. Ralph that posted already today. It's amazing.
Alex Volkov 1:45:51
LDJ, our resident data scientist and machine learning engineer, the person who can explain how the models are built, and who built a bunch of datasets that are still used by open source models. And obviously we're missing another part of us, which is Yam Peleg as well. All of us are here to discuss this week's news, to bring you this week's news, and to learn ourselves as well; basically, to make sure that nothing in this crazy world of AI updates gets missed. Hopefully we did our job well. If you enjoyed any part of the show, please give us five stars, share it with a friend, subscribe, and we'll see you here next week. Thank you so much, everyone. Bye-bye. We'll see you next week.