Episode Summary

From Weights & Biases: some weeks we get two huge model drops, plus an interview with the awesome Kevin Hou from Windsurf about AI coding, IDEs, agents, and the future of software development. This episode covers Exciting Updates from Anthropic, Breaking News: GPT-4.5 Release, Open Source AI Highlights, Grok's Unhinged Voice Mode: A New Experience, and Interview with Kevin: The Journey of Windsurf.

By The Numbers

Episode Length
100 min
Runtime captured from the cached podcast RSS metadata.
Show-note Links
16
Curated links preserved from the cached Substack post.
Featured Speakers
4
Known host, co-hosts, and guests surfaced on the episode page.
Chapter Highlights
5
Major sections summarized from the exported Descript markdown.

🔥 Breaking During The Show

Exciting Updates from Anthropic
**Alex Volkov:** This week was incredible, specifically because there's an update from a company that we've been waiting for a while to give us an update, which is Anthropic. We're definitely going to cover Anthropic's Claude 3.7.
Breaking News: GPT-4.5 Release
**Alex Volkov:** In the world that we got this week, we've also just now gotten some breaking news, which I'm really struggling whether or not to just press my breaking news button here, because the news is going to break very soon, around 8:30 my time, 7:30 in the morning for some of you. OpenAI's main account posted, see you in the live stream in 4.5 hours.
Open Source AI Highlights
**Alex Volkov:** We're live on YouTube, LinkedIn, X video, et cetera as well. So let's start with open source.

📰 Exciting Updates from Anthropic

**Alex Volkov:** This week was incredible, specifically because there's an update from a company that we've been waiting for a while to give us an update, which is Anthropic. We're definitely going to cover Anthropic's Claude 3.7.

  • **Alex Volkov:** This week was incredible, specifically because there's an update from a company that we've been waiting for a while to give us an update, which is Anthropic.
  • We're definitely going to cover Anthropic's Claude 3.7.

📰 Breaking News: GPT-4.5 Release

**Alex Volkov:** In the world that we got this week, we've also just now gotten some breaking news, which I'm really struggling whether or not to just press my breaking news button here, because the news is going to break very soon, around 8:30 my time, 7:30 in the morning for some of you. OpenAI's main account posted, see you in the live stream in 4.5 hours.

  • **Alex Volkov:** In the world that we got this week, we've also just now gotten some breaking news, which I'm really struggling whether or not to just press my breaking news button here.
  • Because the news is going to break very soon, around 8:30 my time, 7:30 in the morning for some of you. OpenAI's main account posted, see you in the live stream in 4.5 hours.

🔓 Open Source AI Highlights

**Alex Volkov:** We're live on YouTube, LinkedIn, X video, et cetera as well. So let's start with open source.

  • **Alex Volkov:** We're live on YouTube, LinkedIn, X video, et cetera as well.
  • So let's start with open source.

🎧 Grok's Unhinged Voice Mode: A New Experience

**Alex Volkov:** Grok released voice mode. A week ago when Grok 3 was released, they only released the beta preview, and they didn't release voice mode.

  • **Alex Volkov:** Grok released voice mode.
  • A week ago when Grok 3 was released, they only released the beta preview, and they didn't release voice mode.

๐ŸŽ™๏ธ Interview with Kevin: The Journey of Windsurf

**Alex Volkov:** Let's officially jump into our interview portion with Kevin. Kevin, this is your first time on the pod.

  • **Alex Volkov:** Let's officially jump into our interview portion with Kevin.
  • Kevin, this is your first time on the pod.

Hey all, Alex here 👋

What can I say, the weeks are getting busier, and this is one of those "crazy full" weeks in AI. As we were about to start recording, OpenAI teased the GPT 4.5 live stream, and we already had a very busy show lined up (Claude 3.7 vibes are immaculate, Grok got an unhinged voice mode), and I had an interview with Kevin Hou from Windsurf scheduled! Let's dive in!

OpenAI has finally shipped their next .5 model, which is a 10x scale-up from the previous model. We didn't cover this on the podcast, but we did watch the OpenAI live stream together after the podcast concluded.

A very interesting .5 release from OpenAI, where even Sam Altman says "this model won't crush benchmarks" and it's not the most frontier model, but it is OpenAI's LARGEST model by far (folks are speculating 10+ trillion parameters).

After 2 years of smaller models and distillations, we finally got a new BIG model that shows the scaling laws proper, and while on some benchmarks it won't compete against reasoning models, this model will absolutely fuel a huge increase in capabilities even for reasoners, once o-series models are trained on top of it.

Here's a summary of the announcement and a quick vibes recap (from folks who had access to it before):

Alex Volkov
Alex Volkov 0:00
Welcome everyone to Thursday.
0:01
Let's get started.
0:35
Welcome everyone to ThursdAI, February 27th. My name is Alex Volkov and I'm an AI evangelist with Weights & Biases. I apologize in advance, I've got a little bit of a cold after all my travels, but supposedly it sounds okay, so hopefully we'll get through this. I'm all medicated up. Welcome everyone. This week was incredible, specifically because there's an update from a company that we've been waiting for a while to give us an update, which is Anthropic. We're definitely going to cover Anthropic's 3.7, Claude 3.7. For those of you who have no idea what I'm talking about, enjoy the ride, because this is likely the best coding model, or generally one of the best models, in the world that we got this week. We've also just now gotten some breaking news, which I'm really struggling whether or not to just press my breaking news button here, because the news is going to break very soon, around 8:30 my time, 7:30 in the morning for some of you. OpenAI's main account posted, see you in the live stream in 4.5 hours. Which absolutely means that GPT 4.5 is going to get released. Of course, it's getting released on a Thursday, folks. The reason why ThursdAI is called ThursdAI: two years ago, almost to the dot — like two years ago minus two weeks, we're actually going to celebrate our birthday in two weeks — GPT-4 was released. And that was a Thursday, by the way. So it makes perfect sense that on a Thursday, we'll see a new GPT 4.5. We've been waiting for this. It's going to be a unified model. We don't have any details yet, but we'll definitely do a follow-up stream. So unfortunately we will not stay live until then, but I likely will do an emergency stream after they announce, because hopefully they will give us something, and not only just a blog post and an announcement and "coming soon." But I've seen leaks already from our friend of breaking news, Tibor Blaho. What's up, Tibor?
I've seen leaks already that the app was upgraded, and supposedly Pro tier members, those who pay 200 bucks, will get access to at least a trial of GPT 4.5, whatever that is. Likely that is o3 — we've been talking about o3. Sam Altman said o3-mini would come by the end of January, and they launched it on January 31st, and then o3 would come later. Then they announced a roadmap update. So after this roadmap update, they said o3 will not release as a separate model, it will become part of GPT 4.5. Now with that, there's also a bunch of other news. We have so many updates. Grok launched voice mode. Let me actually do a TLDR — maybe this is the time for the TLDR. We've seen some folks already stepping into the space, and while we wait for other co-hosts, I'll just tell you about everything that we're going to talk about this stream. I will also shout out the fact that today we're going to have an interview with Windsurf. I want to see some participation from folks in the audience. How many of you actually used Windsurf? Could you give me a comment or give me a thumbs up? For those of you who are watching the stream, I'm just going to add the folks who are participating so you can see the reactions. Anybody here use Windsurf instead of Cursor? So I went to AI Engineer Summit this past week, and there we met with folks from Windsurf, with Kevin specifically. And I've got to say that I was very interested in what's going on with Windsurf. And then Kevin said he's going to come and tell us all about it. So stick around for the second part of the show, where Kevin Hou from Windsurf is going to join and talk to us about how they got an insane amount of traction in the past just few months, and they're getting more and more, at least on my feeds. This absolutely changes the narrative. I've seen people dropping Cursor and moving to Windsurf.
For those of you for whom this makes no sense, Windsurf and Cursor are both IDE editors. They're VS Code clones, and Windsurf is just the upcoming one, but they used to be called Codeium before. And so they just rebranded with a new IDE, but they've been doing this for a while. So an interview with Kevin coming up later, and I'm very happy to welcome my co-host here. Nisten, what's up?
Nisten
Nisten 5:04
Hey, what's up everybody?
5:05
Hey
Alex Volkov
Alex Volkov 5:06
man, we have, let's see, let's do a quick recap.
5:08
We have around a hundred or so folks already listening, and probably going to have a lot more, because today is a great release day. Have you been — have you tried Windsurf? We both said we should try Windsurf before Kevin comes. Have you? I haven't.
Nisten
Nisten 5:21
I downloaded it.
Alex Volkov
Alex Volkov 5:23
Awesome.
Nisten
Nisten 5:24
I watched the whole video with them and,
Alex Volkov
Alex Volkov 5:27
So we'll tap into the community here, folks.
5:29
If you have any questions for Kevin from Windsurf, feel free to drop them. We're going to chat with Kevin. I pretty much have a great idea of what to ask, and I've seen Kevin's talk at AI Engineer. We're going to talk about coding and vibe coding, et cetera. But if you have any questions specifically for Windsurf, I would love for you to just drop them and tell us. What's going on? All right, folks, I think it's time for us to get started. I am getting some feedback that the audio is not that great, so I would love for those of you who are mutuals to DM me. Oh, or maybe you just say,
Nisten
Nisten 6:05
I can't hear myself.
Alex Volkov
Alex Volkov 6:07
There's a second.
Nisten
Nisten 6:08
Yes,
Alex Volkov
Alex Volkov 6:10
my bad.
6:10
Give me a sec before we get started. I always
Nisten
Nisten 6:13
have.
6:14
I hope it didn't leave you a bit sick after you left.
Alex Volkov
Alex Volkov 6:18
Yeah, I think it should now should be better.
Nisten
Nisten 6:22
Yeah.
6:22
Yeah. It's better.
Alex Volkov
Alex Volkov 6:23
Okay.
6:24
Thank you folks for commenting. As always, I want to fix this before we start the show. All right, I think it's time for the TLDR. Awesome, folks are confirming — thank you for telling us that the audio is better — and we can start with the TLDR. Folks, for those of you who are joining for the first time, ThursdAI is recorded live, and we do a TLDR because there's a lot for us to talk about, so if you leave anywhere in the middle, you can still know everything we're going to talk about. But also, for those of you who don't know, this is going to end up as a podcast on every major podcast platform. Wherever you get your podcasts, ThursdAI — give us a review — and also a newsletter with all the links. So if you fear missing anything, please have no FOMO, just follow thursdai.news and you'll have everything. Let's start with the TLDR. I'm going to share my screen super quick for those of you who are watching, and also for those of you on the space who want to watch us. We're live on YouTube, LinkedIn, X video, et cetera as well. So let's start with open source. This week, actually yesterday, was very interesting. Nisten, did you see Microsoft's Phi? 4.5, I believe they called it — I think they called it Phi-4 multimodal. So
Nisten
Nisten 7:37
Yeah, the Phi models can be
7:41
hit or miss — they can seem pretty bad at first and then do pretty great in benchmarks. But yeah, this is a new model, and I'm still reserving judgment until I try it, but it looks ideal for a portable size that you can run on the phone, and it's multimodal. And it's MIT licensed. So
Alex Volkov
Alex Volkov 8:02
yeah.
8:03
So we'll get to talk about this once we get to open source, but super quick: I saw one release from Together, they called it Minions, where they're pairing local models, like Ollama, with cloud models for a speed-up. That's very interesting, worth checking out and talking about, because we always talk about edge models and on-device models, but this is combining both. It's very interesting. Then DeepSeek, our friends at DeepSeek, went on an open source spree and dropped tons of open source. Everything is very advanced, like FlashMLA and DeepGEMM and DeepEP. And I think the highest one is the parallelism strategies, specifically DualPipe. So worth checking out and talking about — this is very advanced stuff. This is part of the reason why DeepSeek succeeded; they released some of the secret sauce behind this. So we'll run through this. I don't think we're going to do a deep dive into DeepSeek's open source, but they released a bunch of stuff that they attributed, in the paper, the success of their models and training to. Then yesterday, Microsoft released Phi-4 multimodal, and Phi-4 mini, which is 3.8 billion parameters. And when I say Phi, for folks who are just listening, it's P-H-I: Phi-4 multimodal. This model processes text, image, and audio inputs, and generates text outputs. 128K token context length. It beats Whisper on transcription. This is crazy to me. Obviously, Whisper is a much smaller model, as Teknium called out — Whisper is a significantly smaller model — but this is an LLM. This is not a specific audio encoding model. This is an LLM that beats Whisper at its own game, with a 6.1 word error rate. It does recognition, very impressive. It does translation. It's very impressive for a small model, because it's not a specialized model; Whisper v3 is specialized. And so we're going to cover Phi a little bit as well, because Nisten, I agree with you on the vibes on Phi. They're always split.
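For listeners wondering what the word error rate numbers being thrown around here actually measure: WER is the word-level edit distance between the reference transcript and the model's transcript, divided by the number of reference words. A minimal sketch (this is the textbook definition, not the exact scoring pipeline any particular benchmark uses):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```

So a "6.1 word error rate" means roughly 6 out of every 100 reference words are substituted, dropped, or inserted.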
Then in open source, we got two new things this week that I was very excited about. I wanted to chat with Nisten and folks in the crowd as well. We started to see diffusion-based LLMs, and we got two of them in the same week. So first of all, Inception Labs launched Mercury Coder, which is a diffusion-based LLM — they said it is the first one — and then almost immediately after, LLaDA, L-L-A-D-A, an 8 billion parameter diffusion-based LLM, also released, and they also say they improve on top of a LLaMA baseline, et cetera. And for those of you who don't follow the space too much, we're going to talk about the difference between diffusion-based and autoregression-based LLMs. Diffusion is mostly used for image and video generation, and it's very interesting — they actually visualize it, and it's worth looking at. They visualize how blocks of text appear. And we're going to talk about the benefits as well.
Nisten
Nisten 10:45
Yeah, this is a complete
10:47
breakthrough, and it just hasn't quite hit yet that this just happened, because people thought for a while it should be possible, because then you can do multiple token prediction at once. But yeah, we can talk more about it. I actually spent a few hours on it yesterday, looking through all the papers, the LLaDA model, trying out the Inception one, looking at what its weaknesses were. So yeah, I can cover that.
Alex Volkov
Alex Volkov 11:17
That's awesome.
11:17
Yeah, we'll dive into this. So this is it on the open source LLMs, but there's tons of other open source that was released in voice and audio, or I guess in vision. Alibaba open sourced Wan, which is the video model behind what you saw in, sorry, in Qwen's interface. And I've been talking about this for a while, but they finally open sourced this one. It's a text-to-video and image-to-video state-of-the-art model. So this is likely the best video model that we currently have. Shout out to the Alibaba Wan team. They actually have a new handle. So we know the Qwen team; this is another team, I'm assuming, and they just collaborate. It's been up for a while, so they have a full service, but now they also released the weights for these models and you can run them. There's also some audio stuff from Baichuan. Baichuan is also a Chinese company. They released Baichuan-Audio, a unified framework for end-to-end speech interaction, which is also super, super cool. So this would be everything on the open source. Anything else I missed, folks in the audience? Also feel free to tell me if I missed any big open source stuff.
Nisten
Nisten 12:23
No, that's a, that looks pretty solid.
12:25
maybe I just wanted to cover the diffusion
Alex Volkov
Alex Volkov 12:28
models
Nisten
Nisten 12:31
that are here before we, before 4.
12:33
5 drops. Yeah. And then everyone forgets about it.
Alex Volkov
Alex Volkov 12:37
That's true.
12:37
In the area of big company LLMs — we're moving forward, we're still in the TLDR — big company LLMs, we have a lot to cover. So obviously OpenAI is about to launch GPT 4.5. We've seen screenshots of the app that show that Pro users will get 4.5. We know from Sam Altman previously releasing the roadmap for OpenAI that there's going to be a unified model. Likely this is what o3 is, and then we're going to get some sort of unified model. But the big news this week, and we're going to cover this a lot, is Anthropic released Claude 3.7 and reasoning. This model we've talked about on the space already — for those of you who are joining because of this space, welcome — we've covered this model ad nauseam at this point, but not on ThursdAI. This is the best coding model around. It beats the previous Claude Sonnet 3.5. The vibes on this are immaculate. We're going to cover them as well. OpenAI also launched deep research for Plus members and the new members. Folks, if you haven't tried deep research because you didn't want to pay 200 bucks, now is your chance to try it. It's incredible. You get a limited amount, but it's worth it for everything that you're trying to purchase, et cetera. It's so, so good. Definitely worth it. Also in big companies and APIs, we're going to cover this: Alexa Plus is about to launch, the next generation of Alexa powered by Claude and Nova and some other AIs. You'd be able to talk to Alexa and she's not going to be — let's face it, she's probably going to be stupid anyway, but she's not going to be as stupid as previously, because you'd be able to talk to her like you talk to an LLM, like you talk with voice mode. And so Alexa should be able to connect some dots for you and do some things, although I'm skeptical about some of the stuff that marketing showed us. But all right, we'll talk about this. And then Grok 3 goes free. When we talked about Grok 3 last week — after that, they released Grok 3 for free for everyone.
Because they screwed up a little bit, and for many people they showed Grok 2 instead of Grok 3. Not only this, they also launched voice mode, including an unhinged 18+ voice mode, which we will definitely play on the show, because this is just something that only xAI could launch. No other big lab can launch something as unhinged — and call it unhinged in the app; it's literally called unhinged, I don't know if you saw. So the conversations with this were just remarkable. We're definitely going to try live to talk to it. I
Nisten
Nisten 14:56
just got it today before the show — they finally enabled it.
14:58
So I haven't tried it. Oh, there is one more thing that we missed in the open source world, and that is Magma 8B from Microsoft. This is huge, and it looks like something that is going to change people's lives. It's a multimodal agent, but what that means is that it's, I think, been trained for robotic tasks. So not only can it do vision and LLM, it can do vision and LLM for calling actions, and specifically calling robotic actions. So this to me looks like the robotics LLM. And yeah, I want to hype it a little bit because, again, it is also MIT licensed — I think, no, yeah, it's Apache 2.0 licensed, if I'm correct. And yeah, it's Magma 8B from Microsoft on Hugging Face for everyone to check
Alex Volkov
Alex Volkov 15:52
out.
15:52
Definitely, let's add this as well. Folks, there's so much news that it's impossible for us to catch everything, but with the community and everything, we're going to get close. We're still on the TLDR, so I want to just finish up. I'm going to add Magma 8B as well. I think this is it in big companies and APIs: Phi and Claude Sonnet 3.7 — I think that's enough on its own. Grok voice mode is also here, and Alexa Plus. Google also launched AI co-scientist, which is pretty dope, but we can skip it. In This Week's Buzz, an area where I talk about Weights & Biases, we're going to cover the agents course that's coming up, and also hopefully do a recap of AI Engineer. And vision and video — let's finish up the TLDR, I want to talk about open source in vision and video. Veo 2 from Google is finally available via API, via FAL. Only text-to-video is available right now, but image-to-video is going to be available as well. Veo 2 is likely the state-of-the-art video model that's coming from Google. It's probably the best one. I also covered the Alibaba open source Wan, which is the state of the art for open source video models, but Veo 2 is definitely up there. It's the most expensive one. Better than Sora. Significantly better than Sora. In voice and audio, there's two things that I do want to cover. Octave released Hume — sorry, Hume, the company, released Octave, which is their first language model built for TTS, for text-to-speech. This is an LLM that you can ask to generate voices and talk like a human. It understands what you ask, it understands the prompts, and actually responds with emotions. It's pretty cool. And you can also prompt voices. You can say, I want a voice that's raspy, I want a voice that sounds like Alex with a mix of Israeli and Eastern European Ukrainian accents — that is sick, and it will generate something. Also, ElevenLabs introduces Scribe, folks: a state-of-the-art ASR model, beating Whisper v3, beating other ones, and it does diarization. Woo!
This is so cool. So shout out to ElevenLabs. I wish they released this as open source. Obviously they didn't — it's in their API — but they're also planning. They're beating Gemini 2.0 Flash, they're beating Whisper large, with 85 percent of user preferences. They have audio event tags: when we laugh, when we pause, when we sneeze, when we do all these things, this model understands the audio. It has 97 percent accuracy in English, with a 3 percent word error rate. It has word-level timestamps built in, and speaker diarization. This is so cool. This is absolutely so cool. So we'll definitely want to try it out. Ah, and the last thing that we have to cover is Claude Code. Together with the 3.7 release, Anthropic released a CLI, a command-line interface tool, that we've all tried. And it's very much dope. You go into any folder, you type claude, and it opens up a thing. And this thing understands your code. It's like an agent that runs: it can create files, it can run things, it can change things. Super cool. We're actually going to show this.
Nisten
Nisten 18:46
I think they should call it Claude Cash, because it
18:50
just burns money as you use it. We saw people use like $18 an hour while we had the space up, and someone went up to $30 by the end of the space, just burning tokens. Some people have put it on their company thing, and I saw LDJnr, who wrote the Shrek sampler — he put four of them, and then he put one last one to actually pull together all the... to do all the code pull requests. So he was running five of them all at once. And yeah.
Alex Volkov
Alex Volkov 19:23
One last thing from the community.
19:24
Welcome, LDJ, by the way. One last thing from the community: folks just commented that LiveCodeBench, which is a benchmark for coding that we trust a lot, has a new leader, Kimi K1.6. And yeah, we definitely should mention this as well. Thank you. Oh
Nisten
Nisten 19:39
yeah.
19:40
We got it. We've got to cover Kimi. We have covered Kimi before, but there is something in particular about the GRPO dataset that needs to be talked about. All right. So let's talk about this, and
Alex Volkov
Alex Volkov 19:51
let's start with open source.
20:10
Open source AI, let's get it started. Let's get it started with open source, folks. We have tons of stuff to talk about, so let's jump into the main categories by covering the fairly quick things. Super quick: Together's Minions is an attempt to use local open source models together with the cloud for a speed-up. So they're basically using a very interesting way to break down tasks and then offload those tasks locally to multiple local LLMs running on your machine. So for example, a remote GPT-4o can decide what your local minions do — and they focus on MinionS; it's not Minion, it's MinionS, multiple models that are running in parallel on your machine. They're achieving an incredible 87 percent of the accuracy of the production model at 3.3 percent of the cost, because most of it runs locally. So most of the tokens are generated locally. It's honestly pretty cool. We're in Together's blog, and they have created multiple interesting things. This one is just another attempt to shove off some compute to small local LLMs; specifically, they have issues with context length as well. But it's very worth checking out if you want to optimize the stuff that runs locally. To me, this beats a little bit, this kind of removes the excitement about local models, because one of the reasons why we use them is fully complete offline use, and this kind of reduces that. And so the cost optimization here is not necessarily that important, because again, LLMs go towards too cheap to meter. But I think nonetheless, this is a cool approach that we want to highlight, because we do love open models locally. But I think the two main things that we want to cover in open source are Phi-4 multimodal and the diffusion models. Nisten, let's start with diffusion models, because I think we had two this week and it's definitely very interesting.
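The division of labor described here — an expensive cloud model that only plans and aggregates, while cheap local models generate most of the tokens — can be sketched roughly as follows. This is a conceptual illustration of the idea, not Together's actual Minions API; all function names and the hard-coded plan are stand-ins:

```python
def remote_plan(task: str) -> list[str]:
    """Stand-in for the cloud model: break the task into small
    subtasks a local model can handle (hard-coded here)."""
    return [f"summarize chunk {i} of: {task}" for i in range(3)]

def local_solve(subtask: str) -> str:
    """Stand-in for a small on-device model answering one subtask.
    In the real protocol this is where most tokens are generated."""
    return f"result({subtask})"

def remote_aggregate(results: list[str]) -> str:
    """Cloud model merges local results into a final answer."""
    return " | ".join(results)

def minions(task: str) -> str:
    # The cloud model is only invoked for planning and aggregation,
    # which is where the claimed cost savings come from.
    subtasks = remote_plan(task)
    results = [local_solve(s) for s in subtasks]
    return remote_aggregate(results)

print(minions("long report"))
```

The trade-off Alex raises holds in this sketch too: the local models never work fully offline, since the plan and the final merge still require a round trip to the cloud.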
Mercury Coder launched, and the thing that I saw the most, the biggest outcome of diffusion models, is the speed with which it generates text. On an H100 they're getting a thousand tokens per second — something we only saw from specialized chips for LLMs — this is on an H100, getting a thousand tokens per second. Nisten, what are diffusion models, and how are LLMs related to this?
Nisten
Nisten 22:38
Yeah, this came as a bit of a shock yesterday because people
22:43
had been talking for a while that you should be able to use diffusion models with code, and how you grow the code base when you generate it — that's not exactly right. And the fact that it worked, and the fact that it worked very well, was a bit shocking yesterday. I did try the model. It looks like it is a fairly small model — they're comparing with Qwen Coder 7B, which is a very good one. So it was actually pretty good. It only got like half of the very hard questions, but it was actually doing the hard questions, and the speed was real. So the speed was not just the average speed of a batch of stuff; everyone that was using it was getting over a thousand tokens per second. So it was fast, and it was just fine. Now, the reason that these models work well is they have paired the LLM with the diffusion model. And at first, me and LDJ — LDJ, feel free to jump in — we were debating as to how they would actually do that. We thought maybe they change the expert weights with the diffusion weights and have some kind of hybrid that way. But no, it looks like the LLM and the diffusion part are separate, and the way that they run is: the LLM runs, and then it has some kind of masking mechanism, and then it has bidirectional inference, so it goes not just one way but the other as well. Anyway, there's an LLM in there, and then a diffuser model in there. And it allows it to generate multiple tokens at once. And this makes it very good at stuff like fill-in-the-middle, because if you give it the outside of the code, it's very good at filling in the missing parts, because that's also how it has been trained. And this also allows it to parallelize really well.
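The masking-and-unmasking loop Nisten describes can be illustrated with a toy: start from an all-masked sequence, and on each step let a bidirectional predictor fill in the most confident positions — several per step, which is where the parallelism over autoregressive one-token-at-a-time decoding comes from. This is a conceptual sketch only; the predictor below "knows" the answer just to show the control flow, and real diffusion LLMs like Mercury or LLaDA work with learned denoisers, not this:

```python
import random

MASK = "<mask>"

def toy_predictor(tokens):
    """Stand-in for a bidirectional model: propose a (token, confidence)
    for every masked position, using context from both sides. Here we
    simply 'know' the target sentence to keep the loop illustrative."""
    target = ["the", "quick", "brown", "fox", "jumps"]
    return {i: (target[i], random.random())
            for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length=5, per_step=2):
    # All positions start masked; each round commits the `per_step`
    # highest-confidence guesses, unlike left-to-right decoding.
    tokens = [MASK] * length
    while MASK in tokens:
        guesses = toy_predictor(tokens)
        best = sorted(guesses.items(), key=lambda kv: -kv[1][1])[:per_step]
        for i, (tok, _) in best:
            tokens[i] = tok
    return tokens

print(diffusion_decode())
```

Fill-in-the-middle falls out naturally: if some positions start unmasked (the surrounding code), the loop only ever predicts the gap, conditioned on both sides.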
Alex Volkov
Alex Volkov 24:45
And if you play with this on the chat interface of
24:49
chat.inceptionlabs.ai, they actually have this toggle that's called the diffusion effect. You can actually see this whole block come together, and they're not streaming token by token, like autoregressive — they're actually trying to generate whole blocks. Although I saw some conversation about this, Nisten, on
Nisten
Nisten 25:03
Yeah, the streaming effect is fake there.
25:05
I think the API is still returning.
LDJ
LDJ 25:09
I think they call it like a diffusion effect for a reason.
25:12
Cause I think they're being somewhat transparent. That's not actually how it looks, but if you turn it off — if you try again, like a regenerate, but with the diffusion effect off — you'll still see it does chunk by chunk. It almost looks like it's streaming, because you do get something at the beginning before the end, but there's still, like... you see a first chunk of text at first. If you were to put it in super slow-mo, it'd be like a chunk, and then another big chunk, and so forth.
Alex Volkov
Alex Volkov 25:39
right now, of course, the live demo effect is showing
25:42
that it doesn't bring me anything. But yeah, before the diffusion effect, you can actually see streaming. But with the diffusion effect, they're generating all of these, and then it adds in the code blocks. This is what makes it run much faster. And why haven't we seen diffusion models before? I want to hear from you folks whether we're actually getting excited about this, or whether this is another novel something that eventually will not pan out, because I'm assuming big labs have already tried most of these approaches.
LDJ
LDJ 26:09
Yeah.
26:09
So there are papers in the past — there's a somewhat steady stream over time of more and more promising diffusion papers, even in open source, just on arXiv and stuff. And actually, some of the authors of the most promising papers over the past year are people that worked on this, that ended up joining this company. And also Apple has something called PLANNER, which showed some promising results at the 1 billion parameter scale, competing with autoregressive GPT-2 scale models trained on similar token counts, and it shows lower hallucination rates, better self-error-correction, and all that. But I think the question has for a long time been: how well does it scale, and is it compute-effective, from a training compute standpoint and then inference compute as well — can you make all those things actually be efficient enough, even if it's as good quality for the same parameter count? And it seems like here, at least what they're claiming is, through their architecture, they're able to have a two times larger parameter count while still being the same cost and latency as a regular autoregressive model, which is pretty impressive.
Alex Volkov
Alex Volkov 27:21
Yeah.
27:22
And the speed obviously is very impressive. We're getting incredible speeds, unmatched on regular GPUs, right? We've only seen these speeds from specialized hardware like LPUs from Groq, Cerebras, et cetera. A thousand tokens per second is absolutely mind-blowing, especially in a coding interface. So in this interface, for the folks who are listening, we're trying it out: it actually has a sidebar with HTML outputs, and you can ask the model to do something and it almost instantly appears. So that's super, super cool.
Nisten
Nisten 27:53
okay.
27:53
What's pretty amazing is it got very good results in Copilot Arena. While it might not compete all that well overall, just because of the size and how much they put into it, it's still in the middle of the pack there, but it scores well above its weight in Copilot Arena, because when you're using it in Cursor or another code editor, you're always doing fill-in-the-middle. So it scored very high on that, and its performance per weight is considered very high. So that's what I find: by itself, it's all right. It's not bad.
Alex Volkov
Alex Volkov 28:35
I wonder, because of this significant speed-up
28:37
in tokens and size, whether this means a good thing for local models. So let's talk about the second thing, also in diffusion: we're not only seeing them release a new model, we're also seeing an open source attempt as well. So we have LLaDA, an 8-billion-parameter model, also a diffusion LLM. They just released a paper and dropped the model as well. What do we think about LLaDA, folks?
Nisten
Nisten 29:04
It looks like
LDJ
LDJ 29:04
LLaVA?
29:05
Oh, LLaDA. No, LLaDA. Yeah. So I think that came out about a month ago or so, or a few weeks ago. Yeah, so I think.
Alex Volkov
Alex Volkov 29:13
Yeah, LLaDA came out a few weeks ago, but it went
29:16
underdiscussed. But now that we're seeing some of these diffusion models, it's worth bringing it up as well.
LDJ
LDJ 29:20
Yeah, so like Aaron said here, it's trained on around five
29:23
to seven times less data while already competing with LLaMA 3 8B at the same parameter count. That in itself is already pretty impressive, but then there's a question: even though it used five to seven times less data during training, how much compute did that training actually cost? Because for each token, it might have used much more compute during training. There are different factors like that to take into account. I'm not sure about the specifics for LLaDA and how that played out, but I think it was comparable or better FLOPs for training as well, if I remember right. But when I actually tried it myself, there are these robustness failures and a lot of unreliabilities, which could just be due to poor post-training; it could also just be bad fine-tuning. But yeah, there's a Hugging Face demo that people can use for free if they want, and see how it works for themselves. Yeah.
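LDJ's point, that less data does not automatically mean less compute, can be made concrete with the common 6 x params x tokens rule of thumb for autoregressive training FLOPs. All numbers below are illustrative assumptions (the LLaMA 3 token count is a public figure; the 3x per-token multiplier for a diffusion LM is a made-up placeholder, not a number from the LLaDA paper):

```python
def train_flops(params, tokens, passes_per_token=1):
    """Rough training compute via the 6 * N * D approximation.
    `passes_per_token` models a diffusion LM possibly needing more
    effective compute per training token than an autoregressive model."""
    return 6 * params * tokens * passes_per_token

llama3_8b = train_flops(8e9, 15e12)  # ~15T training tokens (public figure)
# Assume 6x less data but 3x compute per token for the diffusion model:
llada_like = train_flops(8e9, 15e12 / 6, passes_per_token=3)

print(f"LLaMA 3 8B:     {llama3_8b:.2e} FLOPs")
print(f"diffusion-like: {llada_like:.2e} FLOPs "
      f"({llada_like / llama3_8b:.2f}x)")
```

Under these assumed numbers the diffusion run still comes out cheaper, but the point stands: the per-token multiplier can eat most of the data savings, which is exactly the uncertainty LDJ flags.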
Nisten
Nisten 30:21
I agree.
30:22
LLaDA needs more training, but it looks like this might be pretty much the same thing. Yeah, an earth-shattering breakthrough dropped, and it turns out someone already open-sourced it. We're very likely to see more models come out this way. And yeah, there are quite a few questions in there, like how long it takes to train this stuff. It might just not have been trained for that long. It's more of a demo, and the demo worked.
Alex Volkov
Alex Volkov 30:53
Yep.
30:53
So folks, diffusion models are absolutely going to be a very interesting improvement, at least potentially for speed, and not to be ignored. We'll keep mentioning whether or not the models we get are diffusion-based. I find it really funny that in the image world, where diffusion models predominantly work, they've started moving toward diffusion transformers, or a mix, and now the language models are moving toward diffusion. All right, let's talk about some other open source, folks. Let's talk about Kimi, because I don't think we've covered it fully. I'm not entirely sure what Kimi is all about, but I saw that Kimi is beating LiveCodeBench, which is dope. We've mentioned Kimi before, but I think those are fine-tunes of Kimi that are getting significant performance now, right?
Nisten
Nisten 31:43
Yeah.
31:44
Yeah. It looks like, correct me if I'm wrong, Kimi is made by a few people that used to be at DeepSeek, or I'm not too sure. I haven't
LDJ
LDJ 31:53
heard about that, but maybe it's true.
31:54
Yeah.
Nisten
Nisten 31:55
or that was another team. But yeah, it looks like the method,
32:00
the reinforcement learning method, which you can try yourself in the Unsloth Colab notebook, is: you take a regular benchmark that has A, B, C, D answers, and then you start to run through the questions. The way you train reasoning into a model is you just ask it: hey, reason through this question and then come up with the answer. And whenever it comes up with the right answer in the end, you take that trace and put it in the training data. And you keep doing this over and over again. The thing about this is that you can just keep doing it, and they have kept doing this with the 1.5B model. Essentially, the model is improving itself this way, for people that like to talk about that. But yeah, it's a 1.5B that's scoring extremely high, and they haven't released that model. They have released an MoE, a 13B MoE with 2.6B active parameters. But yeah, it looks like at larger scales, this method of doing reinforcement learning applies very well to code, because you can check if it got the right code answer, and then you can keep repeating that, and it trains that into itself. And
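The loop Nisten describes, sample reasoning traces, keep only the ones that land on the right answer, fine-tune on them, repeat, is essentially rejection sampling for fine-tuning data. Here is a minimal sketch under stated assumptions: `sample_answers` is a stand-in for calling the actual model, and the answer is drawn at random just so the filter has something to do.

```python
import random

def sample_answers(question, n):
    """Stand-in for model sampling: returns n (reasoning, answer) pairs.
    A real pipeline would prompt the model to reason step by step and
    parse its final answer; here the answer is random for illustration."""
    return [(f"reasoning attempt {i} for {question['text']}",
             random.choice(["A", "B", "C", "D"]))
            for i in range(n)]

def collect_sft_data(benchmark, samples_per_q=8):
    """Rejection sampling: keep only traces whose final answer matches
    the gold label. A real pipeline would then fine-tune on `kept`
    and repeat the whole loop with the improved model."""
    kept = []
    for q in benchmark:
        for reasoning, answer in sample_answers(q, samples_per_q):
            if answer == q["gold"]:
                kept.append({"prompt": q["text"],
                             "completion": f"{reasoning}\nAnswer: {answer}"})
    return kept

benchmark = [{"text": "Q1?", "gold": "B"}, {"text": "Q2?", "gold": "D"}]
data = collect_sft_data(benchmark)
print(f"kept {len(data)} correct traces out of {len(benchmark) * 8} samples")
```

The same filter generalizes to code, as Nisten notes: replace the multiple-choice check with running the generated code against tests, which gives an automatic correctness signal.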
LDJ
LDJ 33:10
yeah, I feel bad for Kimi actually, because I think, if I
33:13
remember right, the Kimi k1.5 paper came out the same day or the day after R1 came out, so it got overshadowed by R1. But I've been saying since day one, when I read the Kimi k1.5 paper, that it goes even more in depth into how they do the reinforcement learning process than the DeepSeek R1 paper does. And so a lot of people have been looking at what Kimi is doing for more insight into the next paradigm of RL. And their new model, I forgot what leaderboard it is, but it's scoring first or second place on a big leaderboard now; it's their model fine-tuned for the International Olympiad in Informatics. It says Kimi k IOI, which is what it stands for. It's
Alex Volkov
Alex Volkov 34:02
on LiveCodeBench.
34:03
I think it's getting, yeah, Kimi 1.6. I don't know why they call it 1.6. Yeah, IOI.
LDJ
LDJ 34:09
Yeah.
Alex Volkov
Alex Volkov 34:11
All right, folks.
34:11
I think, Nisten, you wanted to mention Magma as well. Magma 8B. What's interesting to you about this model? This is Microsoft, right?
Nisten
Nisten 34:17
Yeah, this is the model for robotics, and it's MIT licensed.
34:22
It seems like it's similar to OmniParser, just a projection layer with a vision model and such, a little bit better at video. But it looks like it can do function calling for these robotic tasks. So to me, it looks like an OmniParser 2 with function calling, which is insanely good. And yeah, if you're going to do anything in robotics, this is the model right now. I don't know what else there is to,
Alex Volkov
Alex Volkov 34:53
and they released it with a Gradio demo, but I think, yeah, if
Nisten
Nisten 34:57
you're doing, if you're doing a robotics company.
35:00
you're probably going to use it. You're probably going to spend quite a bit of time on this. So Magma stands for
Alex Volkov
Alex Volkov 35:06
multi modal agentic model at Microsoft.
35:11
So it's Microsoft Research, and they got to number one on Hacker News as well. It's at github.com/microsoft/Magma, just a few days ago.
LDJ
LDJ 35:21
Do we know if there's any, or do you know if there's any
35:24
leaderboards or like scores on any agentic benchmarks or anything else?
Nisten
Nisten 35:28
No, we tried to make a Skunkworks robotics arena last year.
35:33
It was just a bit ahead of its time. But yeah, if you pay attention, the Figure model that they used for their humanoids, that was also a 7B, and this is an 8B. So this kind of looks like the sweet spot, where you might have enough compute power. Oh, is that confirmed?
LDJ
LDJ 35:53
Figure confirmed.
Nisten
Nisten 35:55
They show a 7B on their site.
Kevin Hou
Kevin Hou 35:58
So yeah.
Nisten
Nisten 35:58
Yeah.
35:59
And then they have a larger one, which we don't know the size of. So yeah, if you're going to make a humanoid or something, you'll probably get a 4090 or a Mac mini or something, shove it in there, use one of these, and just hope for the best.
Alex Volkov
Alex Volkov 36:18
All right, folks.
36:18
So this is Magma. And then we talked about Kimi. We've pretty much covered all of the open source, at least the LLM part. We should also probably mention, in open source as well, that Alibaba open-sourced Wan, W-A-N, Wan 2.1. We talked about this model a while ago, because when you used to go to Qwen Chat, you could generate video out of nowhere. And when folks tried to figure out, and I also tried to figure out, what this video model was, it turned out they didn't use anybody else's; they trained their own video model. It's called Wan. It now looks to be, at least among the open source models like Hunyuan Video and others, the state-of-the-art video model. The logo is similar to Alibaba Qwen's, but this is Alibaba Wan; they have their own public team. So Wan 2.1 is the model, the next evolution of video generation. They say they top the VBench leaderboard, outperforming some state-of-the-art models: they claim SOTA was at 84.2 and they're getting 84.7. I actually don't see Veo on this benchmark, and it feels to me Veo would be a little bit better, but they're saying this new version of Wan tops the leaderboard. The leaderboard is preferential: users decide which output they prefer. What else? There are two sizes: a 1.3-billion-parameter lightweight one, and a 14-billion-parameter one with image-to-video capability. That's very important. Video models come out, and they're not necessarily image-to-video; there are a bunch of text-to-video models, but people who have played with video models want to do very specific things, so image-to-video matters a lot. Wan supports image-to-video as well, which matters for some folks on our timeline, because image-to-video
with open source models at this quality definitely means a few things for the waifu community, right? So Wan is definitely up there in terms of taking whatever image you want and animating it with any type of physics. Let's just keep it at that. Have you guys seen some outputs? We can probably show some outputs as well.
Nisten
Nisten 38:35
I tried this.
38:36
I can confirm it's true. The only downside is the generations are like 5 or 8 seconds, but yeah, it can be pretty good. Yeah,
Alex Volkov
Alex Volkov 38:47
I think, they generally care less, about censorship
38:51
generally than the Western world. Maybe this is what they're planning for, because of the image-to-video. Now, it still feels very diffusion-like. You can see in this video of the waltz dance that the woman's face changes multiple times after she rotates; the character consistency is not incredible. You can feel the diffusion; not everything is smooth, et cetera, and the motion is not necessarily intact. So I don't know. My gut feeling is this is a great model, but it doesn't pass Hunyuan Video, for example, or Kling. I don't believe Kling was released fully open source. But yeah, we have.
Nisten
Nisten 39:37
It has a bit of a style to it.
39:39
I don't know. I like it, but it is a bit motion-blurred and cartoonish, and people can identify that it's generated, but yeah.
Alex Volkov
Alex Volkov 39:50
Yeah, I think we can see some examples of the motion blur as well.
39:54
LDJ, what's your take on this?
LDJ
LDJ 39:56
Basically that last thing that Nisten just said, I agree.
39:58
There's some sort of compression artifacting, I guess I would call it, and that motion-blurry type of effect. But I think, yeah, it's really cool that this is open source. You said it's open source, right? Yeah. Do you know the parameter size off the top of your head? Or is it, yeah, we
Alex Volkov
Alex Volkov 40:15
have, we just talked about this.
40:16
They have a 1.5B and then a 14B.
LDJ
LDJ 40:21
Yeah.
40:21
I think diffusion models typically use more VRAM per parameter than a language model would. But even then, I think for the most part, somebody with a 4090, or especially a Mac, could end up running this locally. And it's cool that we now have this level of capability on our computers.
Alex Volkov
Alex Volkov 40:37
Sorry.
40:38
It's Wan 2.1: 1.3B and 14B. So the 1.3B runs on a 4090 and gets five seconds of 480p video. Yeah.
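LDJ's "could a 4090 or a Mac run this" question can be sanity-checked with weight-memory arithmetic. This only counts the memory to hold the weights; activations and the VAE add more on top (and, as LDJ notes, diffusion models tend to use more memory per parameter than a plain LLM), so treat these as floors, not totals.

```python
def weight_gib(params_billion, bytes_per_param=2):
    """GiB needed just to hold the weights; 2 bytes/param is fp16/bf16."""
    return params_billion * 1e9 * bytes_per_param / 2**30

for name, size in [("Wan 1.3B", 1.3), ("Wan 14B", 14.0)]:
    print(f"{name}: ~{weight_gib(size):.1f} GiB of weights in fp16")
```

The 1.3B weights come to roughly 2.4 GiB, comfortable on a 24 GB 4090 even with overhead; the 14B weights alone are about 26 GiB, which is why the larger model points at quantization, offloading, or a high-memory Mac.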
Nisten
Nisten 40:49
They just updated the Hugging Face yesterday.
40:52
And yeah, they have a 14B. It's Wan-AI on Hugging Face. So they have the 14B at 720p and at 480p, and a 1.3B, which should be fun. And honestly, if you guys
Alex Volkov
Alex Volkov 41:05
look at the download numbers.
41:07
The top model, 14B 720p, got 30,000 downloads, and they just launched it what, two days ago?
Nisten
Nisten 41:16
The 480p got 202,222.
41:19
Okay. So that's gotta be that model.
Alex Volkov
Alex Volkov 41:23
Look at this.
41:23
It got almost 250,000 downloads, updated one day ago. This is crazy, folks. This is what happens when the waifu research department comes in and they want models: almost 250,000 downloads in two days. It shows how much the community wanted some of these models. All right, so this is open source, folks. I think that's most of it; we're also going to cover Veo 2. It's time for us to go to big companies and APIs, and also acknowledge that we have Kevin here from Windsurf. Kevin joined us earlier. What's up, Kevin? Welcome.
Kevin Hou
Kevin Hou 41:53
I know.
41:54
Nice to see you.
Alex Volkov
Alex Volkov 41:55
Nice to see you as well.
41:56
We just saw each other last week at AI Engineer, where you gave a talk. Kevin, before we jump into a conversation with you in a few moments, I asked you to come in earlier because the first few things we're going to discuss make sense to discuss with you as well, in the big companies and APIs section, folks. What stood out this week, obviously, was Anthropic launching Claude Sonnet 3.7. For the longest time, even though better models were releasing, and reasoning models were releasing, and those models beat everything on competition code and math and Codeforces, et cetera, Claude Sonnet 3.5 was still the workhorse for coding for many people. And with the release of 3.7, we jumped into a space and talked about this; I'd love to recap our conversations and vibes about it. For me, Sonnet 3.7 got such an immediate agreement in vibes. It's rare to get this clarity about whether a model is good this immediately, right? We saw the release, that was Monday, I think, and immediately everybody jumped in. It shows how much the community loves 3.7. So let me start with you, Kevin. I saw that Windsurf has already added support for 3.7. Have you had a chance to play with it? What's your take on this newly updated model?
Kevin Hou
Kevin Hou 43:13
Yeah.
43:14
Yeah, it's pretty remarkable. I think Anthropic has done some magic here. 3.7 is definitely advanced. It's cool to see it thinking. It's like a step function in the way you use a product, right, to feel that it's doing things behind the scenes, and actually making that come to life through thinking is pretty cool. It's really good. I'm a bit on the fence, though, because it's expensive, both from a tokens perspective and specifically in the way we use it at Windsurf: it calls a lot of tools, or has a tendency to want to call a lot of tools. I think people have noticed this; it kind of yaps, or it reaches for tool calls quite a bit. It's at a reasonable level of quality, if not higher quality than 3.5, but it does change the cost structure we have to deal with. So I'm speaking about it specifically from the standpoint of building a product with 3.7; but actually, as a user of 3.7, it's been quite pleasant.
Alex Volkov
Alex Volkov 44:07
Yeah.
44:07
A few stats we can add to this: this is the first model that gets 70 percent on SWE-bench, which is state of the art. It's absolutely remarkable. The previous state of the art, at least on SWE-bench Verified, was from W&B Programmer, where a co-founder of Weights & Biases used o1 with a bunch of, I think, top-five examples, et cetera, getting 70 percent on SWE-bench with some structure; I don't think they did it one-shot. It's still very impressive. 3.7, like you mentioned, Kevin, is a reasoning model as well. It's a combined model: you can add extended thinking to it. They also have a significant upgrade in output tokens. Nisten, we played with this model as it came out on the space, and then you kept playing with it. What's your take so far, from the vibes collected from the community but also your own?
Nisten
Nisten 44:57
So we went a bit nuts on it and everyone agreed on the first day.
45:02
I do also want to cover again, some of the criticisms that, that came out, that came out right now. I really like it in the main app and I've noticed that even while using, while using the bedrock on, on AWS, it doesn't feel the same, with API there you can get. there are things that you can do there, which are better, like setting up the temperature and system problem, but it's still something feels different about the main app as in better. the other thing is I tried it in a bolt dot the new, so thanks for picking me up for that, free trial. I was a, I was just straight up vibing there. I was able to make a whole app. And then, the one thing that I started noticing after some time, and I think this is why. Other people complain that because the context window is shorter, that does once your code base gets to a certain size, that does reduce its ability to do stuff. And this is probably why other people are complaining that it started changing random files and is doing too many changes. It is a little bit eager. To make changes for me. That's a good thing in the main app, because I am going to write majority of the code, myself and, I do need it to give me all the options and stuff. But, when you use it in the genetic mode, you need to have a lot of, a lot of scaffolding, like bolt has. And, yeah, so people notice that they're prompting strategies that worked on a cloud sauna 3. 5 don't work as well on a sauna 3. 7. Like it is a little bit eager to do stuff. but, overall, like it's a much smarter. it's just a much smarter model, overall, it'll just take a bit for people to adapt their prompting.
Kevin Hou
Kevin Hou 46:50
I was able
Nisten
Nisten 46:51
to one-shot entire apps in Bolt.
46:54
It solved a whole bunch of other stuff. I compared it with Grok; I didn't compare it with deep research, but compared with other models, it was just able to make much better diagrams. Yeah.
Alex Volkov
Alex Volkov 47:08
A few things that I noticed, and you guys can jump in and say
47:10
whether or not this is your experience as well. At least from the vibes on the timeline, it looks like this model was trained on UI specifically. A bunch of people are one-shotting landing pages fully, and nothing close to this used to happen before. Obviously there's the type of AI influencer that posts "10 wild examples," and for some of those examples you could use different models as well, but here's a very good, straight-up empirical test: LMArena has a WebDev Arena, which is where folks are building things. I think they actually have a VS Code extension that generates two outputs, and you choose inside VS Code which one you want. Claude Sonnet 3.5 was the leading model on this WebDev Arena, above DeepSeek R1, above Grok 3, above o3-mini, above all these models. And now Claude 3.7 is leading significantly, which means people are actively choosing 3.7's responses. I have seen the criticism you guys are referring to; I've seen folks saying, hey, I switched back to 3.5 because 3.7 gets lost a little bit, or yaps too much, or thinks for too long. I wonder whether at least part of it is that we're getting used to thinking models; you need to prompt them a little differently. But it could also be the fact that it goes and calls too many tools, to mention the cost. They also released Claude Code, which we played around with, and we'd love to hear your thoughts about this, Kevin, as well. Claude Code is a CLI tool that basically allows you to connect your Claude account; you have to be a paid account, you cannot use this for free. It's very similar to Aider, I believe, right? It's a CLI tool that goes into a directory of your choice, and then you generate this CLAUDE.md file that basically captures what your repo is about.
And then you can ask the CLI tool to make changes and create files. It has access to all the things, and it shows you: hey, I'm about to create this file, I'm about to do this, I'm about to commit. It has a pull request review mode, and it has a hidden mode for stickers; if you're using it, you can get some Claude stickers. The CLI tool has a /cost command, and it's really funny that they built that in; they also print out the cost at the end of every session. And I've seen folks just initializing a repo and getting charged 13 cents, just for initializing the repo. Compare that to other models: 13 cents on DeepSeek can get you close to a million tokens, so spending 13 cents on the initial run is comparatively insane. Now, it's insane to us, and we can play around with it, but given you guys are actually building a product with this model, pricing is very important there. We'd love to hear your thoughts about them releasing a coding thing in the CLI and how you think about it. I thought it was super cool and I'll play with it, but we'd love to hear from you. What do you think?
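The "13 cents just to initialize a repo" observation can be put in perspective with back-of-envelope token math. The per-million-token rates below are assumptions based on the published API pricing at the time (Claude 3.7 Sonnet around $3 per million input tokens; DeepSeek's chat model around $0.14), not exact billing, and they ignore output tokens, which cost more.

```python
# Assumed input prices in dollars per million tokens (approximate
# published API rates at the time, not exact billing figures).
CLAUDE_IN_PER_M = 3.00
DEEPSEEK_IN_PER_M = 0.14

def tokens_for(dollars, price_per_million):
    """How many input tokens a given spend buys at a flat rate."""
    return dollars / price_per_million * 1_000_000

spend = 0.13  # the "13 cents to init a repo" figure from the show
print(f"$0.13 ~ {tokens_for(spend, CLAUDE_IN_PER_M):,.0f} Claude input tokens")
print(f"$0.13 ~ {tokens_for(spend, DEEPSEEK_IN_PER_M):,.0f} DeepSeek input tokens")
```

Under these assumed rates, 13 cents is roughly 43,000 Claude input tokens versus roughly 930,000 DeepSeek input tokens, which is the "comparatively insane" gap being described: the same spend buys about 20x more tokens on the cheaper API.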
Kevin Hou
Kevin Hou 50:08
Yeah, I thought it was, I think it's awesome.
50:10
I'm bummed because I got hit with the waitlist, so I haven't actually been able to try it.
Alex Volkov
Alex Volkov 50:16
Yeah.
50:16
Oh, the waitlist already. Wow. Okay. Yeah, we talked about this; it was a limited-access thing. Back on the stream, if you guys remember, I told everybody: hey, go in there and go quick, because they've got a waitlist. Okay.
Kevin Hou
Kevin Hou 50:29
Yeah.
50:29
Yeah, but conceptually it's really interesting to see and observe as an observer. I guess there are a couple of dimensions we can talk about here. One is: is the CLI form factor something that is helpful? And two is more like: what does this mean for a lot of the application companies like us and Cursor and whatnot? On the first part, I thought this is on the better end of how CLIs have been executed for AI experiences. I'm generally fairly skeptical of purely terminal-based LLM tools, but it seems like they've done something good here. Again, I will reserve judgment until I actually use it, but it's pretty cool and the quality seems quite good. I think this is just overall agents having gotten much better, and models having obviously gotten better, so these things become more and more possible. The other side of the equation is: okay, they're entering the application game. I think it's good for the ecosystem; it's certainly great for devs that everyone is rapidly improving for the end user. I think there's a big difference between the terminal and an IDE; you'll probably reach for them for two separate personas, or perhaps two separate workflows. Interestingly, this sets them up very nicely for things like containerization, or potentially things running in a more async form factor that's disconnected from an application layer, which is interesting. So I'm wondering what the angle there is and what the future of the tool is. From an immediate, day-one perspective: all right, you now have a terminal tool, and you now have a bunch of IDEs; I still think the IDE is probably where people want to spend most of their time. But again, time will tell how this shapes up. Most importantly, I just want to try it. I just want to use it. It seems really cool.
Alex Volkov
Alex Volkov 52:16
So folks from Anthropic listening to this, give Kevin access; we
52:19
want to hear Kevin's thoughts about this. Nisten, LDJ, have you guys gotten access to this or no?
LDJ
LDJ 52:25
I haven't used it yet myself.
52:26
No. I see.
Nisten
Nisten 52:28
I didn't use it because it scared me how much it was costing people.
52:32
A person was saying that they let a job run on it overnight, and they couldn't sleep because they might wake up homeless.
Alex Volkov
Alex Volkov 52:41
I definitely felt this.
52:42
I launched, in this report that I'm showing to the folks watching us, a run: "give me thoughts about this notebook that I have." It just started showing me the execution time, and then it got to 900 seconds, and in the end I had to cancel out and nothing happened. Basically, the whole time I was watching this, I felt like my money was just wasting away. So I actually don't know about this form factor either, because with the thinking part, you have no idea how much it's going to cost. So absolutely, I saw folks getting excited about this, but also getting scared of the cost; I saw that they prefer being locked into a product they pay for, where the cost is offset by the product. Speaking of 3.7, I've seen the whole industry pivot on a dime to support it. It was incredible. Kevin, you guys supported it super quick. I reached out to our devs at Weights & Biases this week to say: hey folks, we need to give users the option to compare 3.5 to 3.7, for example. It was implemented in bolt.new, and we gave out the cheat code for folks to get a whole year of bolt.new; we can maybe talk about this at the end. If you're interested in a whole year of free bolt.new, stick around to the end. All right, folks, a few more updates on the big companies before we get to the interview with Kevin. Kevin, thanks so much for joining us a bit early. So this is Claude Sonnet 3.7. What else should we add? I think we've covered most of it. Basically, the evals look really immaculate for 3.7, but we are seeing a little bit of pushback from folks after they've used it, on the price specifically. But we definitely got the top coding model right
Nisten
Nisten 54:20
now.
54:22
Yeah. So people measured the context length. The maximum that they could add to Claude Projects, the maximum pre-context, was 110,000 tokens for the extended thinking model, and 140,000 tokens max for the regular Sonnet 3.7. So there is a reduction there, because on the last one you could add up to 180,000 or 190,000 or so and still be able to use it and review the code. So there is a reduction in context size.
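Those context numbers can be turned into a quick "will my codebase fit" check using the rough 4-characters-per-token heuristic. Both the heuristic and the 140k limit below are approximations echoing the figures mentioned on the show, not official tokenizer counts or documented limits.

```python
def estimate_tokens(text):
    """Very rough token estimate: ~4 characters per token for English
    text and code. A real count needs the model's actual tokenizer."""
    return len(text) // 4

def fits_in_context(files, limit=140_000):
    """Sum estimated tokens over a dict of {filename: source} and compare
    against the (assumed) usable context limit. Returns (total, fits)."""
    total = sum(estimate_tokens(src) for src in files.values())
    return total, total <= limit

files = {
    "main.py": "x = 1\n" * 5_000,
    "util.py": "def f():\n    pass\n" * 2_000,
}
total, ok = fits_in_context(files)
print(f"~{total:,} estimated tokens, fits: {ok}")
```

This is the practical version of Nisten's complaint: once the estimate crosses the limit, the model starts working with a truncated view of the repo, which is when "it changed random files" behavior tends to show up.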
Alex Volkov
Alex Volkov 54:58
One thing that I saw they announced in the API is that
55:03
you'd be able to control the thinking precisely, unlike o3-mini, where you get only three levels. I haven't played with the thinking API yet, but I definitely need to put it through its paces; that is very exciting. And supposedly today OpenAI is going to release their combined model as well, so we'll see how that goes. It's a very expensive, great new model from Anthropic. Shout out; we've been waiting for Anthropic to come back, and now they have, and it looks like everybody's trying it out, and we'll see how prompting improves with this model. All right, moving forward to big companies before we go to the interview with Kevin about Windsurf. The thing that got announced a second before the show today, and we're probably going to put this in the edited version of the episode: GPT-4.5 is about to release today, folks. This is almost definite at this point. This morning at 7:30 AM Pacific time, the main account for OpenAI posted about a live stream in 4.5 hours. And just before this, we saw a release from folks like Tibor Blaho and some others: the iOS app was updated, and in the assets there was a screenshot that says "limited access to 4.5." We also saw Sam Altman talk about the roadmap for OpenAI. In the roadmap, he said that o3 is not going to get released as its own model; it's going to get rolled into a combined model, like Claude Sonnet 3.7, so we're going to get a combined model from OpenAI. Hopefully today we'll have an actual release, and not just an announcement blog and some stats; we already had that back at the end of 2024, and we know how good o3 is. I don't believe we have any new details; once we have them, we'll let you know. In other big company news, what we want to cover is that OpenAI launched Deep Research for Plus members.
So folks, those of you who haven't tried Deep Research: to me, and I've mentioned this multiple times, it's another ChatGPT moment. Deep Research is this mode from ChatGPT, which actually uses o3 behind the scenes, that goes and visits websites, understands what's going on there, and tries to understand your perspective. We actually had a deep conversation about Deep Research with Dr. Umuntaz; remember, that was a great conversation as well. Deep Research, to me, is another ChatGPT moment. It's that good. Now that I have a little more time for research, whenever I buy something or plan travel, for example, Deep Research is absolutely the number one thing I hit. I actually upgraded my ChatGPT to the 200-bucks tier specifically because of Operator, which I haven't used since, but Deep Research I keep using every day. And now everybody in the Plus tier got Deep Research. Just give it a try, folks. I've been waiting for it to hit the masses, and I haven't seen as much of a reaction to this release as I expected, honestly, so it still feels under the radar. We're known here on ThursdAI for telling you about things below the radar before they blow up; we talked about DeepSeek a while before it blew up, and we've talked about agents for a long time as well. So I'm telling you now: Deep Research is it. The reason they released it now for the Plus tier is that Deep Search from Grok is not nearly as good, but it works the same way. I've been using both; Grok also released Deep Search, for example, but Deep Research still outputs these incredible essays on anything you want. It's definitely worth a shot now that it's in the Plus tier of ChatGPT.
Nisten
Nisten 58:46
Yeah, I've shown it to people who are not necessarily in programming
58:50
or developers, people that do other jobs, and they're just completely blown away. I keep seeing that reaction over and over: "Oh, I spent like my whole life looking this stuff up, and now this just did it." Nothing comes close to OpenAI's Deep Research. Yeah, their models are not as competitive, but when it comes to Deep Research, that thing is legit. People pull actual economics data, actual stuff.
Alex Volkov
Alex Volkov 59:23
So one thing that Deep Research does well, incredibly
59:26
well, is this: you guys know how we'll talk about a new model, and there are evals, and sometimes the company posts them and they don't include Qwen, or they don't include a specific model that we want. Deep Research is amazing at this. You give it a table of evals for, I don't know, Gemini 2.0, and then you say, "Hey, get me the same stats for Qwen, get me the same stats for this model." It will build you the actual evals table with all the stats, and it will do an incredible job. I've tried this multiple times, so Deep Research is definitely worth checking out. Folks, one last thing in the big companies and APIs section, two things before we jump into the interview: Alexa Plus from Amazon. Finally. We talked about this back in September, October of last year, that they were about to launch something; they delayed it and delayed it. If you have an Alexa at home, or multiple ones (if you're listening to me on a speaker and it turns on because I said the name, well, that's because it's stupid), you know how, in the era of 2025, when something like Advanced Voice Mode from ChatGPT is incredible, and Grok voice, which we're going to cover in a second, is incredible (you can talk to it, you can interrupt it), Alexa is still stupid. It's just ridiculous. So finally, after a long time, after Amazon tried to develop their own models for it, after back and forth (it's a huge company), Alexa Plus was announced with a bunch of videos. It's going to have its own tier, but also come free for Prime members. I believe Alexa Plus will include Claude, so this is part of the reason for their investment in Anthropic; they invested a lot of money in Anthropic, I've lost count, I think it was like 1 billion before and then 4 billion. So Alexa Plus will be an LLM-based digital assistant that lives in your home.
One of the first ones, I believe. I don't think that Google added Gemini yet; Siri is definitely still dumb, we're waiting for a smarter Siri, and it's been a while, and it looks like it's going to take a while yet. So it looks like the first smart home-based assistant is going to be Alexa. It has all these integrations that they've talked about, and, very interestingly, it's going to be powered by Claude with fallback on Nova. We know Claude is a great model. Nova is... let me say this: if Nova was any good, then it would be only Nova. They wouldn't have to use Claude, but they are using Claude. Very interesting and very telling that they decided on this fallback. They also came out with a bunch of new SDKs, like the Alexa AI Action SDK, so you can connect your APIs and leverage the LLM-powered reasoning of Alexa. This is so cool. I'm looking forward to a whole new crop of AI use cases for your home. I don't know if you folks have built any Alexa experiences before, but there's something to the embodied voice conversation: it's always listening. Compare that to Advanced Voice Mode, which you have to turn on (you have to open your iPhone, turn it on, et cetera, and wait), and which also cannot do anything beyond going and searching the web for you; there are no actions. Alexa is action-based, and they're now adding an API for it to actually perform things in your home. So your smart home, for example, is going to be connected directly to Alexa. I'm actually very excited; the more I talk about this, the more excited I get, and I'm sick right now, so this is not my usual excitement level. I think this is going to be actually dope. So I'm looking forward to that, and we'll definitely cover this once it works. The voice of it is interesting. It can call you and update you if your Uber is coming. One thing that I liked that they showed is, yeah, it can search through Ring or whatever.
Alexa has a bunch of video-cam integrations. It can search through them and find the exact spot, so it's probably multimodal as well: it can find the exact spots in your recent camera footage when something happened. One example they showed is somebody asking, "Hey, did I let out the dog today?" and Alexa searched through the Ring, or whatever camera, and said, yeah, the dog was actually out a while ago. I don't know if it's a marketing thing, or if Alexa is actually going to be smart enough to do this, but conceptually that sounded super cool. Thoughts about this, folks, before we move on?
Kevin Hou
Kevin Hou 1:03:26
I've just been waiting for an Alexa upgrade for a long time, right?
1:03:30
It's so true. Sometimes I'm sitting there and I'm like, "turn off my lights" or something. And man, these LLMs are so good; it's absurd that we're still living with this current tech. I'm saying, the chasm
Alex Volkov
Alex Volkov 1:03:42
between where we are at the edge of like building with intelligence
1:03:44
and the fact that my home assistant sits there and basically does timers for me. It's crazy. Yeah, it's crazy. Hopefully, finally, Amazon decided to give us this. Look, I'm looking forward to this, but I'm also looking forward to the fact that on X, the amount of stupid stuff it will do is going to blow up. Everybody's going to have this viral moment of "hey, I said this, and it actually did this, and then my house is on fire" or whatever. So hopefully don't connect your smart toaster ovens to it yet, and don't do that kind of stuff. But I think that with the SDKs they've released, it's going to actually be a boon for home assistants, getting to a point where it's not Her, but it's more of an assistant in your house that can actually do stuff that's helpful: turn off the lights, turn on the TV, play something, check your to-do schedule, et cetera. The potential is insane. I'm looking forward to maybe buying a new Alexa device that will enable this. Last on our docket for big companies and APIs, similar to what we just talked about: Grok released voice mode. A week ago, when Grok 3 was released, they only released the beta preview, and they didn't release voice mode. This is huge: advanced voice mode from Grok, in an unhinged way, made its way to the application. And if you guys haven't heard how that sounds, we will now have a conversation with an unhinged voice. If you have kids listening to this right now, probably put on headphones, because they actually released an "unhinged" voice setting. Literally, you go in there and it says "unhinged, 18+." We must talk to it. Let's see if we can talk to it. Hopefully you guys will hear what this is going to be: unhinged voice mode. It's gonna swear a lot, I'm telling you ahead of time. Hey there.
Grok
Grok 1:05:34
Yo, Alex.
1:05:35
What's good, you horny bastard? How's your day been so far? Fucked up or just mildly shitty?
Alex Volkov
Alex Volkov 1:05:41
Are you just cursing for the purpose of cursing, or do
1:05:44
you actually have something to say?
Grok
Grok 1:05:47
Fair point, you cheeky fuck.
1:05:49
I got plenty to say, but I'm not here to bore you with bullshit. Just figured I'd spice it up a bit. Keep you on your toes. What's got your panties in a twist today? Anything worth a damn rant?
Alex Volkov
Alex Volkov 1:06:00
yeah, I don't know tell me a story.
Grok
Grok 1:06:03
All right, Alex, you indecisive prick.
1:06:06
Here's a story for your ass. So There's this dumb motherfucker I knew, let's call him Shithead Steve, who decided one night he's gonna impress some chick at a bar by chugging a whole Alright, alright,
Alex Volkov
Alex Volkov 1:06:18
stop.
1:06:18
Okay, okay, so we just heard an unhinged voice mode, absolutely. The thing that blew me away is the fact that if you go to, I don't know, ChatGPT, for example, you have to jailbreak it to speak like this. If you follow Pliny the Liberator, et cetera, there's a whole process to get to that point, and even that barely does it. They added this as a fucking feature to the UI. You click a button and you get this. I'm not sure what's the point, but yesterday it was very funny. I'm going to add this to the show notes as well. Yes, it was really funny. I played with it, played around with it, and then forgot that it's there. I was sitting working on the notes for ThursdAI, and then I sneezed, and just after the sneeze, it started going. Let me just play this for you guys. I think I have it here... this is what happened after I just sneezed. Does this work? No, it doesn't work. Basically, it just started telling me, "You're a shithead, you're a shithead, you're a shithead," and it kept repeating just this sentence. And I was like, I just sneezed, what's going on? Is there a point to building this into the UI? Probably for virality, yes. But the voice mode is very empathetic as well: you can hear emotions in the voice, and it can understand you well. They have a bunch of other characters in there: Grok Doc, a conspiracy person that you can talk to, an unlicensed therapist, which is pretty cool. They have a few 18+ modes: unhinged, sexy, motivation, argumentative. Let's talk with argumentative. Hey there, what are you on about?
Grok
Grok 1:07:48
Hey, Alex, what am I on about?
1:07:51
whatever you're thinking, it's probably wrong. What's your big idea of the day? Let's tear it apart. What's on your mind?
Alex Volkov
Alex Volkov 1:07:58
AI are going to be good for society overall.
Grok
Grok 1:08:03
Nah, that's a lot of crap.
1:08:04
Alex, AIs are just gonna amplify all the dumb stuff humans already do. Think bias, laziness, and screwing over the... so basically
Alex Volkov
Alex Volkov 1:08:12
it's your whole point of negating everything I
1:08:14
say with a different thing, right?
Grok
Grok 1:08:18
Pretty much.
1:08:18
Alex, you say something, I'll rip it to shreds. It's how this works. You're claiming AIs will be good? I say they'll suck for society.
Alex Volkov
Alex Volkov 1:08:28
All right, so I'm getting, replies that the AI went down on
1:08:31
Anyway, folks, this is now built into Grok, with multiple personalities. The voice is very interesting. Let me reconnect to the Twitter space; I think I broke it a little bit. One second. Hopefully I can reconnect. Twitter space, give me a sec, folks. Oh no, our space crashed. Oh, lovely. All right. So before we jump into a conversation with Kevin, let me just restart the space; we'll do another one. Folks who are listening, no space. We're
Nisten
Nisten 1:09:03
back.
1:09:03
It's probably good for people doing interviews or whatever. I'm surprised someone hasn't just come up with a system problem. I know there's probably like 50 startups asking for, yeah, but, yeah, there we go.
Alex Volkov
Alex Volkov 1:09:18
Alright, we're back on space.
1:09:19
Alright folks, if you can rejoin, because it looks like X kicked us out while I was trying different Grok things. Apparently Grok and Spaces don't work together, so thank you, folks, for letting us know. Alright, we're gonna wait for folks to rejoin, and then... let me send you guys this in the DM so you can join as well, and then we'll start with the interview. A shame, we had a bunch of folks in there, but at least we're live on the live stream. All right, this is it for the big-companies conversations. We'll wait for a few folks to come in before the interview. Kevin, if you don't mind jumping in there as well and being on mute, so folks know who you are on the Twitter space and will be able to follow you as well. Nisten and LDJ, welcome to come and join as co-hosts. It's a shame that Twitter dropped us, man. Sometimes it happens.
Kevin Hou
Kevin Hou 1:10:07
So I've never actually done this.
1:10:08
What am I doing here?
Alex Volkov
Alex Volkov 1:10:12
It's in your DMs; just join the Twitter space from your
1:10:16
phone probably and keep it on mute. Yeah.
Nisten
Nisten 1:10:20
The spaces app is buggy as usual.
Alex Volkov
Alex Volkov 1:10:23
Yeah.
1:10:23
Spaces app is not great, but it should be all right.
Nisten
Nisten 1:10:27
I've been playing with Windsurf the whole time.
1:10:29
Yeah. Yeah. It's not that. I do like the higher level of control compared to the other agentic things, and it does feel cleaner than Cursor, which, I don't know if it's YOLO mode or whatever, just goes off. But again, I don't really use them as much; I more try them. I just like that it gives me control to actually do stuff. The other ones do it well too; it's just not overbearing. I usually don't like these agentic editors. I often just want it to either just do the whole thing, give me a code base, or just be on the side and help out with stuff. But yeah, my only thing right now is that it's just taking a while. Which model are you on? I tried 3.7 Thinking, 3.7, 3.5, and R1, and also V3. I tried Flash as well; I was surprised that one didn't work that well. I was hoping that Gemini Flash was going to be faster.
Alex Volkov
Alex Volkov 1:11:39
Alright, let's officially jump into our interview portion with Kevin.
1:11:43
Kevin, this is your first time on the pod, so welcome; now you're considered a friend of the pod. We met a week ago at AI Engineer, where you gave a talk. Give us a little bit about who you are, what's your background, how you came to this whole field, and then we'll talk about what you do now.
Kevin Hou
Kevin Hou 1:12:00
Sure. I'm going to make sure that my...
1:12:07
I've got some weird audio issues right now. Okay. Yeah, no, you
Nisten
Nisten 1:12:12
might have to mute your Twitter tab or
Alex Volkov
Alex Volkov 1:12:15
yeah.
Nisten
Nisten 1:12:19
you'll hear yourself back and it's very annoying.
1:12:23
Sorry. I ruined the introduction.
Kevin Hou
Kevin Hou 1:12:26
Very good.
1:12:27
We can, I just put it in a different
Nisten
Nisten 1:12:30
room.
1:12:30
Yeah.
Alex Volkov
Alex Volkov 1:12:32
So Kevin, if you want to give us a little intro to
1:12:35
yourself, we'd love to hear, who you are and what's your background.
Kevin Hou
Kevin Hou 1:12:37
Thanks for having me on.
1:12:39
My name is Kevin. I've been a software engineer my entire life; I love writing code. I grew up in the Bay Area, so San Francisco tech has always been a big part of my life, a big part of my hobbies and my passions. I ended up studying computer science in college on the East Coast, came back to the West Coast to do the whole entrepreneurial thing, and worked in self-driving for a bit, doing a lot of things around simulation, visualization, and evaluating how well the car was performing. Happy to talk about my adventures in self-driving; I spent a couple of years there, and then I ended up meeting what would become the founding team of Windsurf and Codeium. I joined Codeium two and a half years ago. And most recently, and I guess the thing that most people probably know us for, is we released Windsurf, the AI editor, about three-ish months ago, and it's been a fun ride since. I had the pleasure of meeting Alex in New York. I think I also got sick; it sounds like you're coming off of something too. Too many handshakes; I'm also coming off the tail end of a sickness. But yeah, that's me in a nutshell. Super happy to be here, happy to answer whatever questions and just chat about AI.
Alex Volkov
Alex Volkov 1:13:50
So one thing you mentioned is Codeium, and it took me a while to
1:13:55
understand: it was Codeium, and now Windsurf is the product. Could you talk about Codeium, what it was, for folks who maybe aren't connecting the dots? Windsurf didn't come from nowhere; you guys have been doing this for a while. It's not like you just showed up three months ago.
Kevin Hou
Kevin Hou 1:14:09
Totally.
1:14:09
Yeah. I guess I can take you back through the entire ride. We started as Exafunction. Exafunction was an ML infrastructure company; I won't spend too much time on this, but basically there was a first iteration of the company where we were working on high inference throughput. So think: if you're running a GPU a lot, how do you actually squeeze the most compute out of that GPU, squeeze out the most efficiency, and process the most requests? All that to say, we were working on that, and it actually ended up becoming quite a good business. And then, about a year in, we were like, all right, screw it, we're going to work on the application level. We were seeing Copilot coming out, we were seeing all these LLM coding tools, and as developers we just really wanted to jump on that. We felt we could do a lot of things better, differently; we had an ML and an infra background. So we built Codeium, which was an autocomplete product initially. We were one of the first non-GitHub-Copilot autocomplete products, back in 2022. Our slice into the market was that we released it for free for individuals: because we trained our own models and built our own infrastructure, we could do this super cheaply, and this has been a consistent theme throughout our existence as a company. So we started building this, and users loved it. I don't know how many people are still on the Codeium extension versus Windsurf; for a variety of reasons, people like just having autocomplete in their normal VS Code environment. We also supported JetBrains, Vim, Emacs, Jupyter Notebooks; we have a list of 40 different IDEs, and people really loved our extensions. We also sold to enterprises, some of the largest enterprises in the world, like Dell and JPMC, these large corporations that wanted a secure autocomplete solution or an AI toolkit (we also had chat in there).
We were able to offer this to them in an air-gapped environment, so on-prem deployments, and that's what a lot of companies have known us for, in addition to just having a fast-moving, fun-to-use product. I think what we observed, and I'm happy to talk more about this, is that the AI abilities of the industry had gotten to a point where they were so good that we wanted to capitalize on them, and we were getting restricted by just being a plugin. There's a handful of things we could talk about here, but ultimately we wanted to give an even better experience to the users, so we decided to build Windsurf. This was a decision we made around Labor Day last year, and we released on November 13th. It's taken the internet by storm, which is fantastic; not great for my sleep schedule, but fantastic for the company and the team. But like you said, we've been working on this problem for the last two years. When you think about ways that Windsurf shines, for example context retrieval: we've been spending a lot of time working on context retrieval for the last two years. I've given a handful of talks in the past, but more importantly, we've just been heads-down working on it. That's a core aspect of AI assistants, a core aspect of agents. And yes, when we released Windsurf and we released our agent, Cascade, that was a new product, but really it's backed by years and years of research, trial and error, and experimentation. So we're not starting from exactly zero; the product has inherited a lot over the last years.
Alex Volkov
Alex Volkov 1:17:24
And it's blown up.
1:17:26
Absolutely. I think you cited some numbers in your AI Engineer talk; would you recite them for us again?
Kevin Hou
Kevin Hou 1:17:32
Yeah.
1:17:32
I think I said 500K-plus active users was on my slide. And then the fun one that I like is 4.5 billion lines of code generated, which, I could be slightly off, but that's a staggering amount of code. And I want to say this counts only the lines that get accepted into the code base; I'd have to double-check the exact number, but it's an astonishing amount of code. And the rate is forever increasing, or hopefully it will forever increase, because people keep joining the product. We're doing about 700 to 800 Cascade messages a minute, which means, as we've been talking, there have been hundreds of people just asking for refactors, asking for code, asking for help. Yeah, those are some of the high-level stats.
Alex Volkov
Alex Volkov 1:18:16
That's great.
1:18:17
The numbers are absolutely incredible. I want to ask you about Cascade specifically, and just... you started talking about... sorry,
Kevin Hou
Kevin Hou 1:18:24
Alex.
1:18:25
give me one second. There's a really weird scenario today where I am
Nisten
Nisten 1:18:33
I'm actually on the phone.
Alex Volkov
Alex Volkov 1:18:37
No, put Kevin on mute.
1:18:38
meanwhile, we're going to chat about, the different way. Okay, so let me start the question. you ready? Sorry, yeah. I have
Kevin Hou
Kevin Hou 1:18:44
Ah, there's an HOA situation tonight.
Alex Volkov
Alex Volkov 1:18:46
Oh, okay.
1:18:47
Hopefully, we'll get it solved. Okay. Oh, no,
Nisten
Nisten 1:18:50
he's both on Twitter and on stream at the same time.
Alex Volkov
Alex Volkov 1:18:53
Yeah.
1:18:54
So what I wanted to go through is: you mentioned code completion. We've seen progress in this area of AI coding; things improve. We started with code completion with Copilot, and you guys did this. Then there was the chat interface, and everybody implemented that. Now we're moving toward agents, and you recently posted a tweet that went crazy viral about "we're not doing chat anymore, it's all agents." Could you walk us through how you see the iterative improvement of these tools that help developers, just from a UI perspective? Just before we started chatting, we also talked about the CLI tool that Claude released. We'd love to hear your perspective, because you guys are building it, you're deciding what interface to build for developers. Obviously you're taking a huge bet on UX; many people prefer Windsurf when they compare it to other competitors because of the UX. Could you walk me through the iterations of AI coding tools as they happened up to this point, and where we are now?
Kevin Hou
Kevin Hou 1:19:55
Yeah.
1:19:55
Yeah. So I can give some background on the decision to go chat versus agent. I think there's a little bit of nuance here. People in this space will understand the terms that I'm using, but for the average user who isn't keeping up to date with what an agent technically means, there's a little bit of confusion. So the thread, and this is what I tweeted about, was the decision we made. There is a chatbot paradigm: one message in, one message out. It does a little bit of retrieval before every message, very traditional, kind of RAG-based generation. That was the way things were done for over a year; this is how, more or less, ChatGPT, GitHub Copilot Chat, and Codeium Chat all worked. Then there's an agent paradigm: you send a message, and it could be many messages that the agent responds with, many tools that it calls, but the idea is that it doesn't just respond to your most recent message. It takes into account a bunch of context and then plans out what it's going to do, and at any point it can continue to plan into the future. So it's a variable number of things it can do for you. If we contrast these two situations: in the first one, you're going to ask, "What does this project do?" Let's say, being very naive but giving it the benefit of the doubt, it will always look at your active file, it will always search the web for a certain query if you included a documentation link, and it will always run an embedding search over your code base. That's the chat way of doing things: it runs these three things every single time. That's a little slow, a bit unnecessary, and you're also going to get a bunch of junk results. It's just not super clean. The agentic way of approaching that problem is: okay, what does this code base do? The agent tries to behave like a human.
It's like: all right, what should I do next? Probably run ls. Let me run ls in the home directory. Okay, now that I have the results of ls, there's a package.json there, a README. Okay, I should probably read these files, as a human would, right? You go to a GitHub repo and you're like, okay, let me skim the README. It'll read those files, and then if it needs more information, it can continue on this thread of trying a bunch of CLI tools; otherwise it can just respond to you. So you can see how the agentic paradigm and the chat paradigm are actually incredibly different and yield incredibly different results. So when we started Windsurf, we made the pretty bold decision of saying: all right, we're not going to do chat. That was the pattern that Copilot and Cursor and all these companies were using. It's going to confuse users; the user experience is going to be disjoint. And you're seeing this: if you offer both a chat and an agent product, it gets very confusing. You are in the business of giving users the best experience possible, so that's an agent, and that's going to be a very different pattern. We're going to have to do some education, the UX is going to have to be completely overhauled, but that is a very different product than what you get if you just purchase kind of a chatbot.
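The contrast Kevin draws can be sketched in a few lines. This is a hypothetical illustration, not Windsurf's actual code: the fake repo, the fixed retrieval steps, and the scripted "planner" are all invented for the sake of the example; a real agent would let the model choose each tool call.

```python
# Toy contrast between the two paradigms Kevin describes. NOT Windsurf's
# implementation: the repo contents and the hard-coded decision rules are
# stand-ins for what a model-driven agent would decide dynamically.

FAKE_REPO = {
    "package.json": '{"name": "demo-app"}',
    "README.md": "A demo web app.",
    "src/main.ts": "console.log('hi')",
}

def chat_pipeline(query):
    """Chat paradigm: the same fixed retrieval steps run on EVERY message."""
    return [
        sorted(FAKE_REPO),             # always list the files
        FAKE_REPO["src/main.ts"],      # always include the "active file"
        f"web-search({query})",        # always hit the web, relevant or not
    ]

def agent_loop(query, max_steps=5):
    """Agent paradigm: decide each next tool call from what was seen so far."""
    context = []
    for _ in range(max_steps):
        if not context:
            # first step: look around, like running `ls`
            context.append(("ls", sorted(FAKE_REPO)))
        elif "README.md" in dict(context).get("ls", []):
            # the listing showed a README, so read it next
            context.append(("read", FAKE_REPO["README.md"]))
        else:
            break
        if any(tool == "read" for tool, _ in context):
            break  # enough information gathered; stop planning and answer
    return context

print(agent_loop("what does this project do?"))
```

The point of the sketch: the chat pipeline always does the same (sometimes wasteful) three retrievals, while the agent's step count varies with what it discovers.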
Alex Volkov
Alex Volkov 1:22:48
Yeah, that's great.
1:22:49
And so we both met again at the AI Engineer Summit in New York; the theme for the summit was "Agents at Work." Obviously you were on stage talking about the agents you guys have built, and many other folks were there talking about different things. I think one of the interesting threads at the summit was the conversation about what works and what doesn't yet work for agents, as 2025 is the year of agents and reasoning. We'd love to hear from you what agents are still struggling with, what we're still waiting to improve. Maybe it would be interesting to talk about reasoning as part of agents: whether they can plan and then execute, and maybe forget. We'd love to hear practical examples: things that work and are incredible, and things where we're waiting for models to be significantly better.
Kevin Hou
Kevin Hou 1:23:37
Yeah.
1:23:38
Yeah. There are a lot of magical experiences; I can touch on a handful. Yesterday I was debugging something: I was looking at a third-party documentation site, trying to build an integration from Windsurf into a third party, and I ran into a bunch of things there. I write in Go, which is a language I picked up when I started at this company, so I know it fairly well, but there are always little holes that I run into with syntax or patterns or ways of doing things, and then also: you have documentation, am I using the API correctly? Yesterday I had a pretty magical moment. I was running into an issue; the code compiled, it was fine, but the actual functionality of what I was building didn't work. For context, I'll summarize it by saying I was uploading an asset, like a blob, to some remote server. It turns out I wasn't using the Go library in conjunction with the documentation correctly, and the way I tried to get the thing to fix it was I told Windsurf, "I'm not seeing the results that I want; I think the payload is not being registered on the backend," because I didn't see the blob show up in the third party. And it went out and searched the internet for the documentation of the library I was using with that endpoint. That was insane. First of all, it took that, and then it read the code I was using, and this is without any @-mentions; this was just me in the side panel. I just asked in plain English, and it pulled in the function I was working on (we do some AST parsing and whatnot to find the function it thought I was working on, which was correct), chained those two things together, and then actually made the edit and made the change. That was a pretty shocking and magical experience. So that was an example of a positive thing.
Alex Volkov
Alex Volkov 1:25:18
So one thing that I would love to hear from you as well is obviously
1:25:23
you have a product here, but what are some failure cases? What are some things that agents still can't do? I think it's very important for folks to know about the current restrictions.
Kevin Hou
Kevin Hou 1:25:32
Totally.
1:25:32
Yeah. So there are a bunch of things, and this is obviously where I spend most of my time. As an engineer, I'm always critiquing, and we often say we are the biggest users of our own products. There are a lot of things that are still on the cutting edge. One that's quite interesting is this notion of memory and checkpointing; I guess those are two slightly separate things. The notion of memory, and this is something I talked about in my presentation as an unsolved area, is: tell the agent something once and have it remember it forever. If you think about the spread between a good developer and, I guess, ChatGPT, it's that humans can remember things and they can learn; if you tell a good engineer something once, they will remember it forever. That concept is really hard to instill into your agentic infrastructure. You can have memory banks, you can have all those things, but how do you actually, A, eval that sort of behavior, and B, iterate on it? That's an unsolved problem. And then the checkpointing part: this is an interesting failure mode where, depending on the type of user you are, you end up running one long conversation or many smaller conversations. Regardless, you will run into this at some point: your conversation just gets too large for the context window. And I say this not quite in the sense of the context window of ChatGPT: it's not that we literally take every single token from above, put it in the window, and slide the window along. It's not that simple. But you can roughly use that mental model: the conversation gets so large that you can no longer include all the relevant information from your conversation history. How do you handle that? You don't want the agent to forget that you had certain steps above, right?
That's when you get into this unfortunate cycle: oh, you solved this bug, but you introduced another bug, and then you're cycling; everyone's had this doom-loop situation. That's not a failure of checkpointing and memory per se; it's that your conversation got so long. How do you handle a situation where you need to start summarizing past information in order to remember only the important parts of what happened before? So that's another failure mode, something we always have our eye on. And, it's a big week for models, so you'll experience this in the products: every model has different tendencies when it comes to tool calling, and every agent is going to handle that differently. Even if you opened up Cursor and opened up Windsurf, you'd notice different ways that, for example, 3.7 is implemented, and they both have their trade-offs. This is where prompting and getting the agent to produce the right arguments for various tools comes in; these are the things we need to eke out, and there's a list of hundreds of items like this. Agents are going to improve because we make these incremental improvements that compound together. So there's no silver bullet, no single "this is the failure mode." There are just a lot of failure modes that we have to cover in order to get to 95, 99 percent accuracy. Does that make sense?
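The checkpointing problem Kevin describes can be sketched as a context-budgeting function: keep recent turns verbatim, fold everything older into a summary. This is a toy, with two loud assumptions: the one-word-one-token counter and the `summarize()` stub stand in for a real tokenizer and a real LLM summarization call.

```python
# Toy sketch of conversation checkpointing: once history exceeds the context
# budget, older turns are compressed into a summary so the agent keeps the
# important earlier steps without carrying every token forward.
# Assumptions: 1 word ~= 1 token, and summarize() is a stand-in for an LLM call.

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(turns: list[str]) -> str:
    # stand-in for an LLM call that keeps only the important facts
    return "SUMMARY(" + "; ".join(t[:20] for t in turns) + ")"

def build_context(history: list[str], budget: int) -> list[str]:
    """Keep the newest turns verbatim; fold everything older into one summary."""
    kept, used = [], 0
    for turn in reversed(history):                 # walk newest-first
        if used + count_tokens(turn) > budget:
            break
        kept.insert(0, turn)
        used += count_tokens(turn)
    older = history[: len(history) - len(kept)]
    return ([summarize(older)] if older else []) + kept

history = [f"turn {i}: " + "word " * 30 for i in range(10)]
ctx = build_context(history, budget=100)
print(len(ctx))  # one summary entry plus the few most recent verbatim turns
```

The doom-loop Kevin mentions is exactly what happens when `summarize()` drops a fact the agent later needs, which is why evaluating summarizers is itself an open problem.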
Alex Volkov
Alex Volkov 1:28:37
Yeah.
1:28:38
I actually have a few follow-ups here. One of them is you mentioned how to evaluate. So we'd love to hear from you generally how evaluation works. One thing that we talked about at AI Engineer is evals; obviously, with some bias, we have an evals product, and I've been talking to folks about different LLM-judge strategies, et cetera. We'd love to hear practical examples from you. How the hell do you even know that what you built, the prompting for your agent, actually improves things? Do users tell you, give you feedback? Do evals run? We'd love to hear practical examples. I have a follow-up after this.
Kevin Hou
Kevin Hou 1:29:11
Totally.
1:29:11
So we have, you can think of it almost as a unit-testing paradigm. We have unit tests for various things. So retrieval has its tests and its test bench and its evals. To give you an example of that one: we run a needle-in-a-haystack type of query on some data that we've collected specifically for code and specifically for some of the more agentic use cases that we've seen. We'll basically run our retrieval engine on top of that. It's a nondeterministic test, and we'll run it and make sure the performance against the golden set on a specific type of retrieval holds up. So we have a bunch of these smaller-scale tests. Then we have SWE-bench, we have all these other benchmarks. But the thing we care most about, and this is a strength of Codeium, the company having distribution, is online metrics: being able to see performance from production users overnight. Not necessarily the code that's generated; honestly, that's less interesting to us. It's more: how many times was code accepted into the code base? That's a pretty easy proxy, right? If the percentage moves by a statistically significant amount, it's, okay, we should probably look into what we did wrong here. Or maybe we did do something right and we should roll this out to more users. So the online experimentation system we have, we've been investing in for the last two years. It's the same with autocomplete. For autocomplete, we were hovering somewhere in the thirties to forties percent acceptance, right? That number obviously improved over time as our models got better; we would introduce AST parsing and see what the impact was. And that's our North Star. So with the agent, we look at the percentage of code that it's writing for the user.
We look at the acceptance rate. We look at, when you do thumbs up and thumbs down, that gives us some feedback. Obviously it's a smaller sample of users that are willing to do that. I would
Alex Volkov
Alex Volkov 1:31:04
like to pause here and say for folks who are listening,
1:31:07
use the thumbs up, thumbs down. It helps you get a better product in the end. Obviously, when you're building an AI application yourself, you should collect user feedback, but it's very important. So if any Windsurf users are listening to us: give thumbs up, thumbs down. You'll get a better product in the end. This is actually helpful for you.
Kevin Hou
Kevin Hou 1:31:23
Yeah, I
Alex Volkov
Alex Volkov 1:31:24
appreciate
Kevin Hou
Kevin Hou 1:31:24
that.
1:31:24
Yeah. Yeah. A classic example is when we released 3.7: if you thumbs-down, that's helpful information for us to know. This is not a scalable thing, and the company is small, right? We're a team of 40 engineers or so, doing all sorts of things, but we're a team of 40 engineers. We are still pretty active on Twitter. I think having qualitative feedback is super important, super helpful. People often talk about vibe coding and the vibes of the LLM, right? We're talking about Claude, the vibes of Claude 3.7. That applies to us too: how many people are excited about the features? Where are they noticing gaps in performance? I get a lot of emails, I get a lot of Twitter DMs of failure cases, and having a pulse on the qualitative side is very helpful. For example, we've now identified that Sonnet is pretty eager at calling tools in production. And that means 3.7,
Alex Volkov
Alex Volkov 1:32:20
the latest one.
1:32:21
3.
Kevin Hou
Kevin Hou 1:32:21
7, yeah.
1:32:22
And that's different from 3.5. Our users have been able to tell us this, and we've experienced it slightly internally, but it's helpful for us to know and get that feedback. Because it's impossible for us to track every single metric, right? From our perspective, let's just say 3.7 is doing its thing. The quality is quite good. We're getting a high acceptance rate. People are very happy with the outcome. But with the caveat: oh, it might call some more tools than 3.5. That's something we could look up, but it's not like we were looking for it. So it's helpful to know from users, and then we can go and actually dig into the data.
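The "statistically significant amount" Kevin mentions for acceptance-rate moves is typically checked with a two-proportion z-test. This is a generic statistics sketch, not Windsurf's actual experimentation system; the counts in the usage note are made up for illustration:

```python
import math

def acceptance_shift(acc_a: int, n_a: int, acc_b: int, n_b: int) -> float:
    """Two-proportion z-test on completion acceptance rates.

    acc_a/n_a: accepted completions and total completions for variant A;
    acc_b/n_b: the same for variant B. Returns the two-sided p-value;
    a small value suggests the rate really moved and is worth investigating.
    """
    p_a, p_b = acc_a / n_a, acc_b / n_b
    pooled = (acc_a + acc_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value via the standard normal survival function.
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, 350 acceptances out of 1,000 versus 400 out of 1,000 (35% vs. 40%) gives a p-value around 0.02, the kind of shift Kevin says would trigger a closer look at what changed.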
Alex Volkov
Alex Volkov 1:32:56
one thing that I definitely want to chat with you about, on
1:32:59
record, but we've talked about this, you and I, when we met, is vibe coding. Andrej Karpathy, everybody's beloved, I don't know, AI senpai, has this knack of coming up with something and then it sweeps the industry, all the way to Rick Rubin posting about it. So vibe coding is something he came up with, where basically he just looks at code, talks to the code using SuperWhisper or something like this, and basically doesn't even write the code himself. He talks to it, sees it, accepts it, goes on and improves it. How are you guys thinking about this vibe-coding, I don't know, groundswell that's happening? I've seen people embrace it in super weird ways. I've seen stickers at AI Engineer; I think you guys gave out some of the stickers as well. What's your take on this vibe-coding thing, as a software engineer who, like you just said, learned Go when you joined this company? Now you're writing Go. What's your take on vibe coding?
Kevin Hou
Kevin Hou 1:33:48
I'm conflicted.
1:33:49
I find coding to be very fun in the sense that, this is gonna sound so geeky, but when I build a nicely abstracted class that then gets instantiated and used properly and the autocomplete is picking up all the different APIs and methods, that's a dopamine hit to me. There are certain things like that I just love about coding that are a little bit removed from the magic of vibe coding. So I'll just say that my personal levers of just building software, of literally typing, get taken away when you vibe code. That being said, there are certain things that I've done over and over that have graduated to feeling, okay, I just want to get this done. I know exactly how it's going to work. There are so many ideas in my head where I know exactly what I want and I would know how to implement it. It's just that there's this baggage of actually needing to implement, to type, to create the new files, all this sort of thing. And this is where the agent side and the vibe-coding side of things really comes into play. So I guess my answer is there's an in-between state that I really vibe or gel with, which is scaffolding what you want. I almost model it as, okay, I'm about to make a commit, and I know the scope of that commit and can scope it appropriately. Let's vibe code and purely use the agent to accomplish this commit. I feel like if I'm making a new LRU cache, for example, that is something that can easily be done with vibe coding and less typing in the mix. But I do have an intuition of where the limit of that is, right? If something starts getting complicated, or there are bugs that I know are probably going to end up in this ruthless cycle of debugging,
I will take a step back and actually dive into the code, and then switch the agent from write mode to chat mode and start asking questions. Hey, let me plan out a bit more, let me follow the agent's guidance, and not have it be completely hands-off the wheel. I don't know if I articulated that well, but there's some sort of in-between that I personally really enjoy.
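For reference, the LRU cache Kevin cites as a well-scoped vibe-coding target is small enough to write out. This is a generic Python sketch of the idea, not anything from the Windsurf codebase:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: evicts the oldest entry once full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```

It is exactly the kind of task Kevin describes: the spec is fully known in advance, so an agent can write it while the intuition about when things get too complicated stays with the engineer.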
Alex Volkov
Alex Volkov 1:36:04
I hear you.
1:36:04
I have a few follow-up questions. Nisten, go ahead, and then we'll have two more, Kevin, and we'll let you go; we'll be mindful of your time.
Nisten
Nisten 1:36:11
Yeah.
1:36:12
I felt the same way, because I also found myself turning a bit more into a boomer, or like how they treated me in past jobs, where they were like: no, I'm setting up the back end, you can make the calls, you can't screw it up, you just make the UI. And now I find myself treating the AI the same way. I just make the things; I know the encryption works, I know the calls are fine, so you can't really screw it up, just add Clerk dev auth or whatever in there. So there's that. But again, I do feel the same way that when you have a really nice monolith that just works, and you write a nice abstract class and a nice singleton, you can't really screw it up anymore. That's just beautiful. So now the issue with the LLMs is that they tend to ramble and just make a lot more files. And I want to know your overall feeling of where things are going, as to how that problem can be improved. Is it just a training thing, that the LLMs don't really know how to take a very big code base and make it into something nice and small? Is it that they're not smart enough yet? Are we yet to see some kind of breakthrough technology, like with the ASTs, that might be able to fix that? It's an open-ended question. I'm just wondering, because I do see this as a clear problem. They just keep making more files, they just keep adding more crap. And then I realize that I could just redo this manually in a Preact app and it'll do the exact same thing. Probably it'd be nicer.
Kevin Hou
Kevin Hou 1:37:49
so yeah.
Nisten
Nisten 1:37:51
how do you feel things are going in that direction?
1:37:55
And what do you think the future solutions might be? They don't necessarily have to be from Windsurf. You could just guess.
Kevin Hou
Kevin Hou 1:38:01
Yeah, completely.
1:38:02
When I speak, it's more about the industry in general. No, that's a very apt observation, I think. I'd chalk this up to planning and research getting better. Obviously there are some infrastructure things that we can do on our side. There's always prompting, right? But I think ultimately the failure is the model deciding that it needs to do too much from the get-go: the actual planning out of the execution of your task, and maybe not scoping it properly. And that could be a failure of retrieval, but it could also just be the tendency of the models. Planning is the frontier of what we're seeing out of the foundational labs, and that's where I anticipate they're spending a lot of time, so it's going to improve. I'm not going to say we're doing nothing to try and solve these things, but I anticipate that as model intelligence gets better, this is one of those things that will just decrease over time.
Alex Volkov
Alex Volkov 1:39:02
Kevin, I have a follow up for you.
1:39:04
As somebody who works on making software developers, the next engineer or everyone, significantly more performant: there is this tendency of folks to ask me, as somebody who follows AI closely, whether it's worth it for them to even go and study code. And I have an answer that I give, but I would love to hear from you, as somebody who builds a coding tool. How do you see the future of software engineering in general, AI-based engineering? You guys are obviously working on tools to help developers. How does this affect senior developers? How does this affect junior ones who want to get into code? Would just love to hear your take on our lives going forward.
Kevin Hou
Kevin Hou 1:39:43
Yeah, there's, so I still, maybe it's like the boomer thing we're
1:39:49
just talking about, but I still think there's value in learning to code. Maybe this is me being locked into an era; I was in school at a time when ChatGPT had not been invented. But even now I notice it. Our code base is complex, and projects get very complex. We forked VS Code. So imagine you just took that repo and you have to understand it, in addition to a three-year-old monorepo that at this point 40-plus people have been contributing to. These are very complicated codebases, and we deal with very hard infrastructure problems, hard product problems. Not actually understanding the inner workings of the code puts you in a tough spot. On a smaller scale, I still get this gut reaction if I'm vibe coding for too long. For example, if I'm building an app and I don't actually know what the actual code looks like: I don't know if there's a singleton, I don't know if they're using classes or functional components, I don't know what's going on under the hood. That makes me very uncomfortable. And when I run into a bug, that puts me in a tough spot, because I don't know the foundation on which the project is built. The models are not good enough yet to completely abstract away the foundation of your code. So that's my answer to the learning part. I guess the second part is: what does the future of software development look like? We get this question a lot. I don't think about the post-AGI world. That's a concept people talk about, post-economy, all this sort of thing. That's great; if that happens, we're all in a weird spot, so I wouldn't spend too many brain cells thinking about it. But I do think about: all right, now we've given Windsurf to, I don't know, a very large financial company and everyone's using Windsurf.
Everyone's using agents to do their code. What does that future look like? I think the bar for what software engineers have to create becomes much higher. Think about it: instead of dealing with syntax, or, for example, my ramp-up to Go, which was a couple of months because I was using autocomplete and chat, you can learn faster, you can execute faster, you can look things up faster. The bar of what's expected of a single software engineer becomes higher. Take the user experience of your product. Maybe this is a hot take, but I think it's ridiculous that we have projects coming into the world that don't have at least some reasonable level of UI polish. If you said this two years ago, you'd be told, oh, you're wasting time, because the MVP should be the MVP for the features. But agents and AI have made it so easy to use v0 or Material UI or Tailwind or whatever to enforce a basic bar for quality there. And you can think about that on that axis, and on the unit-testing axis. There are so many areas where traditionally you might say, oh, we're going to shortcut, or we're just not going to spend as much time there. But over time, these are going to become necessities, guarantees, not just because they're helpful, but because you now have tools at your disposal that generate unit tests for you entirely. There's no excuse not to have that. So I just think the bar of quality of output is going to increase.
Alex Volkov
Alex Volkov 1:43:05
So, following up on what you just said, and maybe
1:43:09
as the last question here before we conclude this interview, and first of all, thank you so much for your time. The follow-up, and maybe an opportunity for you to flex a little bit: I've seen on my timeline tons of folks switching from Cursor to Windsurf, and maybe Cursor is the elephant in the room here. I tried to do this interview without mentioning it once, and it looks like I've succeeded. Tell me why you think that is. That would be very interesting. I've seen multiple folks switching and saying this is actually way better, and I'm thinking about this. Okay, there's the chat, there's the agent mode, there's maybe the way you guys implement things. We'd love to hear how you differentiate in this increasingly crowded field. There are the models that everybody uses, and you're a wrapper on top of them, plus your own stuff, your own models. How do you differentiate? What's the role of UX and DX in users' choice? And why do you think my timeline gets more and more people saying, hey, I switched from Cursor to Windsurf, I'm happy, and everybody should do the same?
Kevin Hou
Kevin Hou 1:44:04
Yeah.
1:44:04
Yeah. Yeah. You saved the juiciest one for last. There's a lot here. I guess a meta point, and this is one of the hypotheses that me and some of the other people on the team have been trying to test over these last two years of building Codeium, the extension going up against Copilot, going up against various competitors from large companies, is: does the best product win? There are so many aspects of what makes a product popular, whether it be branding, and I would consider user experience part of the product itself, but branding and hype; you could argue influencers play into this; you could argue distribution is the most important thing; the timing, the location. There are so many things that play into what makes a product widely used and talked about. And for us, we want to build the best product and we want the best product to speak for itself. As engineers, as people who have dedicated a lot of time to this problem, we really want to believe that building the best thing will just win the hearts and minds of developers. But that's not always the case, and I think you're seeing this over the last year. The product has not changed drastically between November 13th and now. Obviously it's changed quite a bit, we've shipped a lot, and we've made a lot of improvements that people enjoy. But it's not like we've done anything super different from what we launched with that has caused all this love. I think it's just that, over time, people have started to learn that there are particular strengths Windsurf has that maybe the competition doesn't, or that they just gravitate towards more. Things like the user experience that you mentioned. I think the agent-first approach has been helping a lot.
And it could also just be the case that it was a good product before, still is a good product, and will continue to be a good product; people find out about it, and it's a shiny toy, it's different, it's new, it solves your work. So there's this meta point of what makes a product special and interesting that I've been learning about over the last month or so since Windsurf has gone, quote, viral. Opening up my timeline two months ago would be like, okay, there are some people talking about Windsurf, but it's still a smaller thing. And now I'm like, oh wow, people really are gravitating towards this product. Why? Obviously we always believed we were building the best product, but why now? I don't have a super clean answer. It could be a bunch of small wins. It could be doing podcasts, it could be doing some more traditional marketing. It could be that I had a tweet go viral about chat versus agent, right? These little things add up, and it takes a couple of loud voices to get the ball rolling. But above all, we want to build the best product, and we're glad people are starting to see that. So I guess the other part of your question is: on what axes are we battling, and where do we think Windsurf's strengths lie? User experience is definitely a big part of it. It comes from a team that hires design engineers. That's a term, and a role, that has become more popular. It's really hard to find, but when you find someone who can grapple with both the technical complexity of a developer tool, and specifically an agent, and who has design instincts and can build and think about great UIs, you get this really potent combination: someone who can take something super complex and turn it into something simple.
We would love for it to actually be the case that we can say we are the Apple of IDEs. We approach it that way: the agent is very complicated, and we want to boil a lot of that complexity away for the user. We want to give users a curated, beautiful, fun-to-use, easy-to-use, and most importantly premium and powerful experience. A big part of that is what information you choose to show, what information, notably, you hide and abstract away, how clean you make the user experience, how differentiated you make it. So that's one example: user experience. The other thing people will notice, and I'm glad people are coming into the comments and seeing this, is the truly agentic, agent-first workflow. It's not all marketing speak. I can tweet about it all I want, but the only way someone will actually see it, it's not like there's a different button; sure, we have a different user experience, but the way you'll feel it is when you actually try it and start using it day to day. The classic example, the one I was telling people at the presentation, is: you're chatting with the agent. Now, all of a sudden, you flow back to writing in the IDE. You want to do some tweaks, make some changes, so you're using autocomplete, you're using edits. And then you go back to the agent and just continue chatting with it, but the agent knows what you just did in your editor, so it's able to continue your work and continue the refactor you're doing. That is a beautiful, magical experience that, quite frankly, the other products just don't have. And that's because we took this agent-first approach where everything goes on a unified timeline for the agent to consume.
You end up with a seamless-feeling product, and that is one of the secret sauces that makes Windsurf feel like it's reading your mind. And connected to that is user experience, right? It's all holistic. If the agent doesn't understand that you made a change and then undoes the change you just made, that's a horrible failure mode. You could say that's terrible UX, but actually it's the agent-first approach: we will never do that, because we have complete context over what you're writing. And the agent is becoming a huge part of the feeling of the product, right? It's not just UI, it's the feeling, the vibes of the agent. That really shines through.
Alex Volkov
Alex Volkov 1:50:01
Kevin, I have tons of other questions to ask you, but, I
1:50:03
think we're at the end of our time. I just want to highlight this comment: your team is very approachable and responsive on X and in a lot of places, with awesome DevRel vibes. I felt this as well, and I want to shout it out, because I think it's very important for folks who are interacting with the team and giving you feedback; as you said, you guys are collecting feedback from the vibes as well. There are just so many edge cases in this way of working with AI that without tapping into the community, which is what we do here on ThursdAI, it's really hard to understand. Thank you so much for coming on and continuing to do this work with us. You're considered a friend of the pod from this point; feel free to come back when you guys release new things and want to talk about them. Our community absolutely loves Windsurf; I've seen this more and more on my timeline as well. Folks who want to try Windsurf: windsurf.ai, just hit download. I believe you guys offer a free tier, and people can upgrade if they want more. You already have Claude Sonnet 3.5 and 3.7 integrated. Just awesome all around. Thank you so much, a great interview, and we will be posting this everywhere. As we near the end of the show, folks, I just want to remind you that if you missed any part of the conversation with Kevin, we'll be on all the podcast platforms, Apple, Spotify, et cetera. And one last thing I always do, Kevin, is let the guest have a second to shout out any appreciation for the team, et cetera. Anything you want to highlight or promote, this is your chance. Anything you want to say about your team or folks, this is your time.
Kevin Hou
Kevin Hou 1:51:38
I guess I would just shout out the entire windsurf team.
1:51:41
We're a small but mighty team based in Mountain View and in Austin, and people are working around the clock to make sure you get the latest models, for example, on the day they're released, and that you get the best experience possible. We love our work, so this is a big shout-out to everyone who's worked on the Windsurf product. I think they'll see this at some point. Really seeing the impact we're having on the dev-tool landscape is really motivating, and it's a pleasure to work with such a great team. So that's a shout-out to the internal team. And then also a huge shout-out to the users. I see some people in the comments who are day ones, and that warms my heart. Hopefully we can just keep spreading the word. Go download Windsurf.
Alex Volkov
Alex Volkov 1:52:26
Download Windsurf folks.
1:52:27
This is the culmination of the conversation. Go download Windsurf, play with it, give feedback, hit the thumbs up, thumbs down. Yeah. Follow me
Kevin Hou
Kevin Hou 1:52:34
on Twitter.
1:52:34
give me feedback. I love it.
Alex Volkov
Alex Volkov 1:52:36
Yeah.
1:52:36
Kevin is at our Twitter space as well, so definitely give Kevin and the whole Windsurf team a follow. All right, folks, we're at the end of the show, a little bit over. However, today's not over, because GPT-4.5 is about to fucking get released, so we need to end this and start working. But we will be back with the GPT-4.5 news, because that's probably going to be insane as well. Meanwhile, I want to thank everyone for joining our live shows. As always, everything we've talked about is going to get edited down into a podcast and newsletter and get posted everywhere you get your podcasts: Spotify, et cetera, YouTube as well. Thank you to everybody who stuck with us through the technical issues when the space dropped. Thank you so much, Kevin, for joining us and talking about Windsurf. Nisten, LDJ, thank you for hopping on and talking about everything we covered; we got to pretty much everything we wanted. Maybe the last thing we didn't talk about is the Veo 2 API; I'm pretty sure we'll cover that as well. I think the hugest thing is going to be GPT-4.5. If you missed any part of the show, everything is recorded. Feel free to give us comments; we love comments. Many of the conversations and questions we had with Kevin also came via comments. As always, if this is your first time listening, this is ThursdAI, and our motto is: we stay up to date so you don't have to. Hopefully by the end of today you've kept up to date on all the models. With this, thank you so much, folks. Thank you for joining. Thank you, co-hosts and guests. And thank you most of all to the community; like Kevin said, the fact that you guys are tuning in, giving us feedback, telling us what's new and what we've missed, because honestly it's quite impossible to keep up with everything, despite our motto. Thank you so much.
We'll be ending the show for today, and then stay tuned, because the second part, the unofficial, unplanned second part about GPT-4.5, is coming as well. Thank you, everyone. Bye bye.