Episode Summary

This week's ThursdAI went deep on agent skills, the open standard that's turning general-purpose AI agents into domain experts with nothing more than markdown files and a directory structure. Eleanor Berger from Agentic Ventures joined for a masterclass on skills, while Alex demoed adding skill support to the Chorus app in just 3.5 hours using a Ralph loop. The show also covered Claude Cowork (a week-and-a-half sprint, 100% written by Claude Code), GPT 5.2 Codex hitting the API where Cursor used it to build a full browser from scratch with 330,000 commits, and Google rolling out Gemini personalized intelligence across Gmail, YouTube, and Search.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Eleanor Berger
Founder · Agentic Ventures
@intellectronica
Ryan Carson
AI educator & founder
@ryancarson
Wolfram Ravenwolf
Weekly co-host, AI model evaluator
@WolframRvnwlf
Nisten Tahiraj
AI operator & builder
@nisten
LDJ
Nous Research
@ldjconfirmed
Yam Peleg
AI builder & founder
@Yampeleg

By The Numbers

Claude-coded Cowork
100%
Claude Cowork was 100% written by Claude Code in a week-and-a-half sprint at Anthropic
Commits
330K
Cursor built a full browser from scratch using GPT 5.2 Codex with ~330,000 commits
Lines of code
3M+
Cursor's browser experiment: millions of lines of Rust, built by hundreds of concurrent agents
M3 Medical LLM
235B
Baichuan's M3 open-source medical model, fine-tuned from Qwen3, claims to beat GPT 5.2 on HealthBench
LongCat Flash
560B/27B
Meituan's LongCat Flash Thinking: 560B total params, 27B active, MIT licensed
OpenAI × Cerebras
$10B
OpenAI partnership with Cerebras for 750 megawatts of high-speed compute, starting 2028

🔥 Breaking During The Show

Flux Two Klein: Black Forest Labs' Fast Image Model
Wolfram broke the news mid-show: Black Forest Labs dropped Flux Two Klein, a fast image generation model in 4B (Apache 2.0) and 9B (open-weights) variants, designed for near-real-time editing and style iteration.

📰 TL;DR

Alex opens the show with a call to non-developers to dive into AI agents, introduces the panel, and runs through a packed week: open-source medical LLMs, Claude Cowork launch, GPT 5.2 Codex in the API, Gemini personal intelligence, drama between Anthropic and Open Code, and a deep dive into agent skills with guest Eleanor Berger.

  • Agent skills deep dive announced as the main topic
  • Claude Cowork launched for non-technical users
  • GPT 5.2 Codex finally released via API
  • Gemini personal intelligence across Google services
Alex Volkov
"Coding agents or coding harnesses for AI that we talk about GPT, Claude et cetera, they are generalized agents. If an agent can write code for you to do tasks, it's generally a generalized agent that can do everything else."

🔓 Open Source AI Models

The panel covers open-source releases: Baichuan's M3, a 235B parameter medical LLM fine-tuned from Qwen3 that claims to beat GPT 5.2 on HealthBench, plus Anthropic and OpenAI both pushing into healthcare with HIPAA-ready products. Nisten highlights that M3 can run on an M3 Ultra at usable speeds.

  • M3: 235B medical LLM, Apache 2.0, beats GPT 5.2 on HealthBench
  • 22B active parameters, runnable on an M3 Ultra
  • Anthropic launches Claude for Healthcare with HIPAA compliance
Nisten Tahiraj
"It's 22 B active parameters and 235 B. So you can actually run this on, like if a doc wants to run it, they can actually run it at a usable speed."

🔓 MedGemma

Google releases MedGemma 1.5 for medical use cases while Nisten and Wolfram clarify it's a completely different model class (4B for imaging) that pairs well with the much larger M3. Also covered: OpenAI acquiring Torch Health and Anthropic's Claude achieving 92% on Med Agent Bench with Opus 4.5.

  • MedGemma 1.5: small enough for offline medical imaging
  • Opus 4.5 hits 92% on Med Agent Bench
  • OpenAI acquires Torch Health for GPT Health
Nisten Tahiraj
"These do not replace each other. You should use these together. This is a very good pair."

💰 Drama Corner & Partnerships

Spicy industry news: Thinking Machines co-founders return to OpenAI, and Soumith Chintala becomes Thinking Machines' new CTO. Anthropic blocks Open Code from using the Max subscription as a wrapper and blocks xAI from using Claude Code. Apple announces Gemini will power Siri. OpenAI inks a $10B deal with Cerebras for 2028.

  • Anthropic blocks Open Code and xAI from Claude services
  • Apple partners with Google: Gemini to power Siri
  • OpenAI × Cerebras: $10B for 750MW compute (starting 2028)
  • Thinking Machines co-founders return to OpenAI
Ryan Carson
"I think it's crazy that Apple doesn't have a large language model. It's just unbelievable. They aren't one of the model labs. Like I think all of us are just kind of throwing our hands up and saying, how did this happen?"
Nisten Tahiraj
"There's going to be a lot of spouses and family members of xAI engineers now with the Claude Max subscriptions."

🛠️ Claude Cowork

Anthropic launches Claude Cowork โ€” Claude Code for non-developers, built in a week-and-a-half sprint with 100% of the code written by Claude Code itself. Alex demos it live, adding Flux Klein support to an image extension project without seeing a single line of code. The panel discusses the security implications and the dangerously-skip-permissions debate.

  • 100% coded by Claude Code in a 1.5-week sprint
  • Research preview, Mac-only, requires Max subscription
  • Chrome connector enables browser automation
  • Live demo: added Flux model support without viewing code
Ryan Carson
"People are realizing that the model is good enough to complete specified tasks well without micromanaging, like very well. So this is just an extension, a UI on top of that."
Nisten Tahiraj
"Can you code? No. Can you use Vim? No. I know what I like and I don't like. I am decisive in what I prompt."

🏢 GPT 5.2 Codex

OpenAI finally releases GPT 5.2 Codex via API after months of exclusivity in the Codex app. Cursor used it to build a complete browser from scratch in Rust with 330,000 commits and hundreds of concurrent agents. LDJ and Ryan debate context compaction; Ryan drops the hot take that compaction doesn't work and atomic Ralph-style tasks are the real solution.

  • GPT 5.2 Codex now in Cursor, GitHub Copilot, and VS Code
  • Cursor built a browser from scratch: ~3M lines of Rust
  • Native context compaction support for long sessions
  • Ryan's hot take: auto compaction doesn't work
Ryan Carson
"I have a hot take on this. I do not think auto compaction works. I think if you use these tools right, you'll find that you can't compact out what you actually need out of a thread."
LDJ
"For a lot of medium to somewhat hard tasks that Opus can do, I would say 5.2 Codex can often do them as well, but it tends to take longer, and that seems to be a big downside of it right now."

🏢 Gemini Personal Intelligence

Google ships personalized AI in Gemini, reasoning across Gmail, YouTube, Photos, and Search with explicit opt-in. Alex tests it: it figured out he drives a Tesla Model Y from emails and noticed his recent Honda Odyssey search. The panel discusses Google's massive data moat, and LDJ predicts MCPs for everything.

  • Gemini reasons across Gmail, YouTube, Photos, Search
  • Explicit opt-in for US Pro and Ultra users
  • Google's data moat vs OpenAI and Anthropic
  • LDJ: MCPs for everything, cross-platform personal AI
Wolfram Ravenwolf
"That's also the reach Google has. At Google, they just make it available. You probably get a popup if you want to enable it and immediately you have millions of users."
LDJ
"On the question of where it's all going, I think you'll have essentially MCPs for everything."

🤖 Agent Skills Deep Dive

Eleanor Berger from Agentic Ventures joins to kick off the skills deep dive. She explains that skills are an admission that we now have general-purpose agents: they do everything except know what you want. Skills are the missing piece: simple markdown files in a directory that give agents domain expertise via progressive disclosure.

  • Skills = admission we have general-purpose agents
  • Simple markdown + directory structure, universally adopted
  • Progressive disclosure: agents load skills on demand
  • Every major coding agent now supports the standard
Eleanor Berger
"Skills are an admission that we now have general purpose agents. They do everything you need except that they don't know what you want to do. And that's what you have skills for."

🤖 Skills Adoption & Platform Support

Alex walks through the current adoption landscape: Claude is the only chat interface supporting skills, but virtually every coding IDE (Cursor, Windsurf, Anti-Gravity) and CLI (Claude Code, AMP, Open Code, Codex) now supports the standard. Eleanor gives a shout-out to AMP as one of the first adopters.

  • Cursor, Anti-Gravity, and Gemini CLI added support this week
  • AMP was one of the first adopters
  • Skills work cross-platform: same skills, any agent
Alex Volkov
"The reason why it's useful is that these LLMs are really, really good generally. At some point though, you need to steer them and give them domain expertise. Domain expertise is where it's at."

🤖 What is a Skill? Structure Explained

Eleanor walks through the anatomy of a skill: a directory with a skill.md file containing YAML front matter (name + description of when to use it). The magic is that each skill takes only 50-100 tokens of metadata, so you can have hundreds without polluting context. Alex compares it to Neo in The Matrix: the model decides when to load domain knowledge.

  • A skill is a directory with skill.md + optional scripts/references
  • 50โ€“100 tokens per skill metadata โ€” hundreds fit in context
  • Progressive disclosure: agent loads full skill only when needed
  • Skill creator skill: self-reflecting AI that builds skills
Eleanor Berger
"You could have hundreds of them because they take very little, maybe like 50 to hundred tokens per skill, just the metadata. And the agent will figure it out. They'll know when the time comes to grab that skill."
Alex Volkov
"This is like Neo in The Matrix when they plug him in and he's like, I know kung fu. This is skills in a nutshell. The model decides to load what information when."

🤖 Scripts, References & Assets

Eleanor explains the three optional directories in a skill: scripts (Python/TypeScript code for API calls or computations), references (additional markdown for progressive loading), and assets (templates, images, static files). Ryan highlights that experts like Vercel are now releasing skill packs for frameworks like Next.js and React.

  • Scripts: runnable code for APIs, calculations, tools
  • References: additional markdown loaded on demand
  • Vercel releasing official Next.js/React skill packs
Ryan Carson
"Experts in the field like Vercel, who obviously know probably the most in the world about React and Next.js, they're starting to release sets of skills now. You point your agent at this and it will install the skills for you."

🤖 Creating Skills with AI

Eleanor reveals the key insight: you don't have to manually create skills; agents are really good at building them. She argues this solves continual learning: teach by doing, then tell the agent to package what you just did as a reusable skill. Alex explains that Claude's chat interface supports skills directly for Max subscribers.

  • Agents can create skills from your workflows
  • "Continual learning? It's solved. The problem is solved."
  • Teach by doing: work with the agent, then package as skill
  • Claude web/Mac chat supports skills for Max subscribers
Eleanor Berger
"Agents are really good at creating agent skills. You bring the knowledge, you just say, here's what I know about this workflow, or this library. And the agent will very handily create the skill for you."
Eleanor Berger
"How will we solve continual learning? It's solved. The problem is solved."

🤖 Practical Examples & Use Cases

Eleanor shares her skills portfolio: flashcard apps turned into skills, image generation via Nano Banana, MCP replacements, and driving multiple models from Claude. Wolfram describes his to-do list manager skill and screenshot-based workflows. Eleanor drops the key insight: skills are the joker card of customization, replacing commands, hooks, MCPs, and even small apps.

  • Eleanor replaced a full app with a 10-minute skill
  • Skills can replace MCP servers, hooks, and commands
  • Wolfram: to-do list manager built entirely as a skill
  • Skills are portable between different agents and models
Eleanor Berger
"They're like the joker card of customization because they replace everything. You don't need commands anymore, you don't necessarily need hooks, you don't need MCP servers necessarily. Skills are all you need."
Eleanor Berger
"Skills are portable between different agents and different models. And are forwards compatible. No model in the future will be worse than the models we have now at interpreting the instructions."

🛠️ Demo: Adding Skills to Chorus

Alex reveals his big project: he used a Ralph loop with Claude Code to add full skill support to Chorus, an open-source app that compares answers across multiple LLMs. In 3.5 hours, Claude built a settings panel, skill discovery from the filesystem, front-matter extraction, and cross-model skill injection, making skills work with GPT 5.2 Codex, Gemini, and every OpenRouter model.

  • 3.5 hours via Ralph loop to add full skill support
  • Skills now work across any LLM via Chorus + OpenRouter
  • Settings UI, filesystem discovery, and front-matter parsing
  • GPT 5.2 Codex using Claude-style skills for the first time
Alex Volkov
"I've added skill support to Chorus, and now you can use skills, the same skills that you have already installed, with Chorus on every LLM out there. GPT 5.2 Codex, the one that was released yesterday, you can now use it with your own skills in a chat interface."

🤖 Future of Skills: Marketplaces & Sharing

Ryan asks if we're heading toward a skill marketplace; he already spent $200 on skills from The Boring Marketer. Alex predicts a mix: companies turning docs into skills, free community-shared skill packs via Git, and paid specialist collections. Ryan closes by telling Alex to sell his podcast production skills.

  • Ryan spent $200 on a marketing skills pack and says it was worth it
  • Skills shareable via Git, local per project or global per user
  • Skill marketplaces coming alongside free community sharing
  • WeaveHacks 3 announced: Jan 31-Feb 1, Self-Improving Agents
Ryan Carson
"Alex, I'm gonna give you an idea that I think will make you a lot of money. I think you're very good at preparing for podcasts. I think you should make some skills and we will all buy them."
Alex Volkov
"I was able to use the skills that I learned on the show last week to develop this thing that I think if I were at a startup doing traditional software development, this would've taken a week and a half. This just happened in like three hours."
Alex Volkov 0:30
All right, welcome everyone to ThursdAI for January 15th.
0:34
My name is Alex Volkov. I'm an AI evangelist with Weights & Biases from CoreWeave, and I am super excited for today's show. If it's not clear yet, I'm super excited for today's show because not only do we have tons to cover today, we're going to do a deep dive into agent skills. We've reported on agent skills since Anthropic released them. Since then, they've been adopted as an open standard across multiple tools and multiple IDEs, and I think that, kind of like MCP, though they're completely different, agent skills are something that the world is still sleeping on. Many don't even want to try them, and until now, trying them without coding tools used to require a subscription to Claude Code or Claude. Today we will not only dive deep into them, we're having a guest: Eleanor Berger is gonna join us to talk about them. I'll also share how you can use agent skills without paying for Claude. But all of that later. I'm very excited about the show, if that's not clear. I'm also very excited to add my co-hosts for today: I have Nisten and Wolfram. I will just dive right in and ask you guys, what is the biggest highlight of the news for you this week? We'll start with maybe Wolfram and then Nisten.
Wolfram Ravenwolf 1:49
Yeah, those skills in Antigravity, that they built it
1:52
in, that, yeah, it's great. That's the best thing about open standards, when they proliferate, when they are used by others, and it really is a standard and can be used everywhere. So I'm using my Claude skills now in Antigravity and it works, so I'm super happy about that. That's dope.
Alex Volkov 2:08
Nisten, how are you?
Nisten Tahiraj 2:10
I'm on data janitor duty, so, what I liked the most was
2:15
seeing both Crush and Open Code just end up with an OpenAI deal to use GPT 5.2 Codex.
Alex Volkov 2:22
Yeah.
Nisten Tahiraj 2:23
Because 5.2 just released and they had day one support for both
2:28
the open source, coding harnesses. So that's a very interesting, development. But unfortunately, I tried it in open code, but I didn't do any work with 'em. So I want to give it a much more thorough test, for both.
Alex Volkov 2:41
I wanna say, for, a bunch of folks on the podcast listening, tuning in.
2:45
Not everybody who listens to us is an AI developer, and I think that is very important for us to know as well. A lot of the early adoption of the AI insanity wave has been by developers. A lot of the early strong benefits to the economy came through developers adopting AI. And so we often talk about things like Cursor and Claude Code and Crush and Open Code, et cetera, as though everybody who listens to the show knows exactly what we mean. However, some folks are not developers, and for them Anthropic released a tool this week that is specifically for them to experience the same kind of excitement. Everybody who's not a developer can still use a lot of these tools just by literally walking through it with your favorite AI agent and having it show you how to get started. The reason to use these tools is because I think it's clear this week, more than any other week that I've seen, that coding agents, or coding harnesses for AI, the GPT, Claude et cetera that we talk about, are generalized agents. If an agent can write code for you to do tasks, it's generally a generalized agent that can do everything else. It can do taxes, it can do cleaning of your desktop, it can do a bunch of stuff. And I think it's never been more clear than this week. And so I would encourage everybody who listens to the show who does not consider themselves a developer, for whom a command line is a "nah, I'm not gonna open this" type of thing, to open their minds a bit to two things. One, we are now in this beautiful, beautiful world where you can learn anything and do anything just by pasting screenshots into an agent and having it walk you through. Two, most of us developers don't write code anymore; we use natural language to achieve the task that we want, and this shift makes this world more easily accessible to non-developers. Most of what we do right now, or at least a lot of us, is babysitting agents. So I would encourage non-developers to not be shy and just dive in, because the benefits on the other side are wonderful. So this is my little spiel for this week, and I think with this we'll dive into the TL;DR, because there's a lot to talk about. There's a big show ahead of us. And as a reminder, Eleanor Berger will join us later to talk about AI agent skills.
5:28
All righty, this is the TL;DR of everything that happened this week, and this week has been a very busy one. So we'll start. This is ThursdAI, January 15th, today with you, Alex Volkov, AI evangelist with Weights & Biases. We have Nisten Tahiraj, we have Yam Peleg, and Wolfram Ravenwolf is joining us, and we'll see about other ones. And our guest today is Eleanor Berger; she's gonna join us a little bit later to talk about agent skills. We're gonna start with open source. Open-source LLMs: Baichuan M3, a medical LLM. Baichuan released M3, a 235 billion parameter medical fine-tune from Qwen3, and they claim it beats GPT 5.2 on healthcare benchmarks. This one is Apache 2.0, and it tops OpenAI's HealthBench with a 65.1 score. We also have a similarly named company that's a little bit different, it's called Meituan, I believe. Meituan released LongCat Flash Thinking, an agentic MoE. They have a fully MIT-licensed model for agentic tasks scoring 88.2 average on Tau2-Bench and 73 on BrowseComp. So also super, super cool in open source. Not a huge week for open source, but definitely something. DeepSeek released a paper that we're not gonna go into because it's super technical and I'm not sure how useful it is. DeepSeek often releases super, super cool technical things that we will cover, but they're not super useful, so I decided, this week has been a big week, so we're gonna skip this. In the big company LLMs, we will celebrate the release of Claude Code for the masses: Claude Cowork. Anthropic launched Claude Cowork, which puts Claude Code's agentic capabilities into the desktop app for non-coders, also on the web. You give Claude access to a folder and it can read, write, and create files and write code for you, and you don't even have to know what code is, which is great. We're gonna do a deep dive on the show for this. There's a Chrome plugin to use the browser; we're gonna mention this at length. So if you are interested, and if you never tried Claude Code because it's too complex, definitely listen to that segment of the show. Also, we have GPT 5.2 Codex, which is the coding version of GPT 5.2 from OpenAI, finally released via the API. Before this, it was only available in the Codex app itself. Codex is complex, it's an app, it's a model, never mind. So GPT 5.2 Codex was finally made available in the API. The main users are agent harnesses like Cursor, and Cursor specifically used it to do some incredible stuff, we're gonna talk about it. It's priced the same as GPT 5.2, 1.75 per million tokens. We're gonna get to talk about GPT 5.2 Codex because it is great for long-running context. All right, what else do we have? We're gonna mention that Cursor used GPT 5.2 Codex to create a whole browser from scratch with over 3 million lines of code, so this is an example of GPT 5.2. Google has shipped personal intelligence in Gemini, which means it reasons across your Gmail, YouTube, Photos, and Search with explicit opt-in controls, in the US and for Pro and Ultra users. And I've used it. It is really, really funny when it says, hey, based on your recent Google searches, it looks like you drive this model of car, but hey, you looked for this one, should I look for some other stuff for you? It's kind of like, you know, I don't use Google a lot for searches lately, so having it know what I searched kind of gave me pause, but it is kind of cool. It's dope. We're gonna mention this; a personal AI system is very important as well.
So we're gonna run through the drama corner super quick. We don't usually do this; this week I just had to make sure that you know, so you're not left behind on the drama that's happening. So there's a whole thing with Open Code, which is an agent harness in the open source. Anthropic kicked them off their API, not really their API, but their login system, blah, blah, blah. Apple announced that Gemini will power Siri and not ChatGPT, with Apple Intelligence still running on-device and in Private Cloud Compute. OpenAI announced a $10 billion Cerebras partnership for 750 megawatts of compute, but that's only starting in 2028. And Thinking Machines: multiple co-founders and the CTO returned to OpenAI, one after being fired for untoward conduct or something, and Soumith Chintala is now the CTO of Thinking Machines, and I'm pretty sure "the elves have left for Valinor" is a reference to that, if anybody knows what I'm talking about. You too, online. Get off, go touch some grass. I think that this is all of the gossipy kind of news for this week. Oh, there's also gossip about Grok 4.2 coming out very soon and powering the US government. All right, moving on. This week's buzz, a corner where I cover everything that happens in the world of Weights & Biases and CoreWeave. WeaveHacks 3, our main hackathon that we organize ourselves, with a bunch of sponsors to help us, is happening. This is the first announcement, you guys are the first to hear this, we only put up the page yesterday and I'm inviting judges. It's happening in San Francisco, January 31st and February 1st. You can sign up at luma.com/weavehacks3. Sign up right now because places are running out. The prizes are insane. I'm gonna be there to MC. All right, moving on. Vision and video: Veo 3.1 updated with 9:16 vertical videos and 4K-upscaled output for better consistency and improved background details, and you can use reference images. Pretty cool. We also have a super viral video moment. Somebody posted a Kling motion transfer clip; we mentioned Kling before, it allows you to take your motion and put another face on top of it. Somebody posted a clip with all the characters from Stranger Things and it kind of broke the internet. So we're gonna show you this video and talk to you about how to make this, also in voice and audio, because you kind of have to marry those two together. Kyutai, the folks who did Moshi before, released Pocket TTS, a very tiny hundred-million-parameter open-source TTS that runs on CPU and can run in your browser. It runs 6x real time on a MacBook Air, which is super dope. One last thing, I think, in AI art and diffusion: Z.ai released GLM-Image, a hybrid autoregressive-diffusion model. It's not that great, but it's great that it's open-sourced. And last but not least, a deep dive into agent skills. Agent skills are ways to give detailed custom knowledge to your LLMs via composable, repeatable snippets of text, maybe some code as well. We're gonna deep dive with Eleanor Berger into skill packs, how to use them, why you use them. And I built something super cool using the Ralph method from last week and skills from this week. I can't wait to share it with you because it's super, super awesome. So this is our show for today, a super quick TL;DR. I will bring back my co-hosts here and see if I missed anything big. And that's it. Let's go into open source.
12:28
Open source AI, let's get it started. All righty, let's get it started. I think we only have kind of the two-ish models. I wanted to talk about DeepSeek, which came back with the paper and the training thing, but I didn't really see the point, because we want to make the show useful for you with stuff that you can use. So, Baichuan releases M3, an open-source 235 billion parameter medical LLM that they claim surpasses GPT 5.2 on key health benchmarks like HealthBench total and HealthBench Hard, and the hallucination rate is lower than others. That's pretty cool. This comes on the heels of OpenAI releasing OpenAI for Healthcare, and Anthropic this week also mentioned that they're releasing a bunch of connectors for healthcare and HIPAA compliance, et cetera. And now we have this Chinese open-source AI that fine-tunes Qwen, right? Nisten, this is a Qwen.
Nisten Tahiraj 13:25
yeah.
13:25
It's 22B active parameters and 235B total. So you can actually run this, like if a doc wants to run it, they can actually run it at a usable speed. They would have to get an M3 Ultra machine, and then that would fit in there. But the model that they fine-tuned it from, the Qwen3 MoE, had very good medical performance, but also very good medical performance in multilingual mode. So I actually used that one quite a bit to do dataset generation and data processing for LLM conversations. If you're going to make a medical fine-tune or a medical model, this is an excellent choice to start off from. Now, I haven't seen how their fine-tuning ends up being in multilingual mode, because again, that's pretty important healthcare-wise.
Alex Volkov 14:17
Yeah.
14:18
I will say, just from the comments, we're also seeing that Google released MedGemma this week as well, and some folks are asking whether or not this beats MedGemma. I don't think that they are fighting in the same category. MedGemma 1.5, Google says, is small enough to run offline, improves performance on 3D imaging and medical ASR, and yeah, they have a speech-to-text model for medical dictation.
Nisten Tahiraj 14:41
Completely different type of model.
14:44
That's a 4B. This is a 235B. Yeah. But now that people mention it, it is an excellent pair, to put MedGemma in for the speech recognition, and then you can use this model for the actual test. I highly doubt, there's like no chance MedGemma is anywhere close to this one. Yeah.
Wolfram Ravenwolf 15:07
So there's also the 27B MedGemma, which is more like
15:11
a multimodal medical knowledge and reasoning model, not just for the image interpretation that the 4B is for. These are different leagues, the 27B compared to how big the other one is. It's huge.
Nisten Tahiraj 15:24
They optimized it for stuff like, you know, if you take like an
15:26
X-ray or CT scan, it's just like a whole bunch of crisscrossing pictures, of you. And this has to be able to ingest all those pictures at once and then make a judgment for all of them, at once. So these do not replace each other. you should use these together.
Alex Volkov 15:44
Yeah.
Nisten Tahiraj 15:45
this is a very good pair.
15:46
yep.
Alex Volkov 15:46
Should we speak a little bit about Anthropic's healthcare efforts
15:49
while we're on this topic of medicine? I think that it's fairly clear. I also saw some panel of US medical professionals that say, hey, you know, medical AI is coming whether you want it or not, it's coming very, very soon. But big labs did have a kind of announcement. So, one: OpenAI acquired Torch. Torch is a healthcare startup that unifies lab results, medications, and visit recordings, Torch Health. The Torch team is joining OpenAI to help build GPT Health; let's see what happens there. Two: we have Anthropic healthcare, Claude for Healthcare and Life Sciences. They added a bunch of connectors to different medical providers, a complementary set of tools and resources that allow healthcare providers, payers, and consumers to use Claude for medical purposes through HIPAA-ready products. HIPAA, of course, is the standard, at least in the US, for handling sensitive medical information; it's a very strict standard. And there are new capabilities for life sciences, connecting Claude to more scientific platforms to provide greater support in clinical trial management and regulatory operations. And they have examples with their own models here, from Sonnet all the way to Opus 4.5, and Opus 4.5 on this medical benchmark performance, which is MedCalc-Bench and MedAgentBench, is very much improved: 92% on MedAgentBench. So we know that Opus is great, we know that Claude is generally great, but now with the connectors and the HIPAA compliance, it looks like Claude is stepping into healthcare also. So not only in open source, we also have the big companies stepping in in a big, big way, which is great. Great to see.
Nisten Tahiraj 17:29
there was a Dr. Ralph last week, so they're trying
17:33
to make Dr. Ralph official now. Nice. I know doctors are using it. It is excellent for, you know, you just give Claude code a job and it's gonna do a whole bunch of research for you, But obviously that was not, applicable in a hospital setting. So, now it is, which is just great.
Alex Volkov 17:48
It's absolutely great.
17:49
We've been waiting for AI to take care of our health and improve it. We are still waiting, you know, so we'll see how that develops. Let's continue with open source super quick. We have one more thing to cover and that is Meituan, Meituan, I guess, however we pronounce it, LongCat Flash Thinking 2601. Wolfram, any comments on that one? Have you been able to take a look? Let's see.
Wolfram Ravenwolf 18:09
Yes.
18:10
So LongCat Flash Thinking 2601 is an MoE with 560 billion total parameters and 27 billion active parameters. So it's an MoE, extremely efficient, released under the MIT license. Yay. Always great to have really open-source, open-weights models. And it achieves 100 percent on AIME 25 and 86.8% on the IMO-AnswerBench benchmark, and outperforms even Claude Opus 4.5 Thinking and GPT 5.2 Thinking at specific tasks. So yeah, you have to try it with your own use cases to see how it really does, but it's trained for tool use, of course. I think all the models are agentic models now, that is very important, and it supports various custom adaptations, with SGLang and vLLM support. There's a live demo at longcat.chat where you can try out the deep thinking, and it's great to have another contender here. I'll definitely do some benchmarks with it and report more, but for now, this is what we have.
Alex Volkov 19:17
Okay.
19:18
I think this is it for open source. We do have breaking news, Wolfram, why don't you cover the breaking news while we're at it? I think it's more in the AI art area, but let's definitely cover it super quick so we can continue. Let's do breaking news. AI breaking news coming at you, only on ThursdAI.
19:48
All righty, Wolfram.
Wolfram Ravenwolf 19:50
So my, German colleagues from Black Forest Labs,
19:53
they released a new Flux model, Flux 2 Klein, which means small in German. So it's a very small, fast, and beautiful model. The main feature is that it is very fast, so it can be used for fast editing, changing styles, developing ideas. Alex, you showed that website where you can combine concepts where you use a model; for instance, that would be a use case where it would be fitting, because it's so fast it can generate almost in real time. And we will see, it's available as a local model, you can download it and use it. Apache 2.0 license. Excellent. And a 9B open-weights model.
Alex Volkov 20:29
4B and 9B. So the 4B is Apache 2.0,
20:33
and the 9B is open-weights, so not Apache 2.0, but you can still use it for stuff. Yeah. Very interesting. Flux, obviously from Black Forest Labs, folks who worked at Stability and helped make that thing happen, have been kind of behind after Nano Banana Pro a little bit, but it's great to see them open source, because this quality in open source... we have open-source editing models from, you know, Z-Image and Qwen-Image as well. So it's gonna be interesting to put them together. Wolfram, remind me later on the show, when we talk about Cowork, to actually add Klein to that demo that I have and see if it just happens. I really wanna test whether or not it just happens. So, breaking news there. While we're there, we can also cover the other AI art and diffusion news segment that we have. And this one is from Z.ai, who also released an image model, and this one's open source as well: Z.ai's GLM-Image. So, an open-source hybrid autoregressive-diffusion image generator. A lot of words. Basically it's an image model. They claim state-of-the-art text rendering accuracy, and I have here a picture of somebody comparing the two, and you can see that it's not the best. I'll say though, it's really funny, Ryan, the amount of expectations that we now have from these models, compared to the fact that a few years ago this couldn't render a Minion, that's a cartoon character, is absolutely insane. But yeah, they claim to have state-of-the-art text rendering, which is great, and we know it's very important. So they have a benchmark that tests text performance in image generations, and that one they apparently beat. This is kind of the big news here. Let me see and zoom in for you guys as well: 1K, 2K output from 1K tokens. I think that this is it on the AI image generation stuff. So I think that the more important news is the breaking news from Flux, which is great. And yeah, in the demo that I have we'll definitely try and test it out visually and see. Cool. I think it's time for us to go into Big Labs. I think there's a lot of stuff happening in big labs. Do we wanna start with the partnerships and drama corner? I really like... I mentioned this in the TL;DR. Oh, it's
Ryan Carson 22:48
fun.
Alex Volkov 22:49
Yes.
22:49
So it's very, you know, "if it bleeds, it leads" type news reporting. We are known not to do this on ThursdAI; for a while now we've been known for the deep technical dives into open source and, you know, we bring the neckbeards, the geeks, to dive into open source. But because there's so much... there's a lot of stuff happening in the world of AI that's not applicable right now, but it's very, very exciting. So yeah, get the popcorn. All right. So we'll start with, I think, the drama that's happening between Thinking Machines and OpenAI. Thinking Machines is a startup to train models and do a bunch of stuff, co-founded by Mira Murati and a bunch of other folks previously at OpenAI and Anthropic, I think as well. It looks like a co-founder of Thinking Machines and the CTO, and two other co-founders, left back to OpenAI. One of them was supposedly fired for unethical conduct, and then an hour later after this announcement, we saw that he's back at OpenAI, so we don't know what this unethical conduct is. I wanna bring back the names: Soumith, previously from PyTorch, is now the CTO of Thinking Machines, so shout out to Soumith, who stepped up there in a big way. We also have some drama with Anthropic and Open Code. If you wanna chime in here, please, please do.
Ryan Carson 24:12
I mean, this is in kind of my world, so it's kind of interesting to see.
24:15
Open Code is great, by the way. I wanna be clear that the more great agent harnesses we have, the better. But it appears that they were using the $200-a-month Max subscription as kind of a wrapper, and that was against the Ts and Cs, and obviously Anthropic didn't like it. I think it's been resolved now. You know, and I think this is one of those reasons why we kind of charge people for tokens, 'cause we don't want to get into that world at AMP. So, interesting to see it play out.
Alex Volkov 24:44
I wanna just like clarify to the world of non-developers what we're
24:49
talking about. Anthropic offers their models via APIs and you can use those models by just paying them for tokens, a clear one-to-one token payment via an API key. So you can use Claude Code, for example, for that. But they also offer a heavily subsidized Max plan, either a hundred dollars a month or $200 a month, for like 5x the tokens. If you are one of the crazy users that runs Ralph on everything, and just nights and days of token streaming, it's very much worth your time to invest in a Max plan. Open Code was using the Max plan subscription somehow, via connecting directly to Anthropic's kind of premium subscription, and then Anthropic blocked that. It doesn't mean that Anthropic blocked Open Code from using Opus; they could still use Opus via tokens. However, for Open Code, obviously, as a product offering it's much, much more expensive because they pay for tokens. So they blocked that. But also, I did see, Wolfram, you mentioned this somewhere, that Anthropic also blocked Open Code in system messages.
Wolfram Ravenwolf 25:53
If the system prompt starts with "you are Open Code", it was blocked.
25:57
There was a report about it on X. What I tried is to go into Antigravity, where you can also use a Claude model, changed my system prompt and said "you are Open Code". It still worked. So of course there are still prompts around this, probably something that Google does, so I can't confirm it myself. I've seen the news, I've shared the news, and if it is true, that is pretty wacky. If they just look for the string "open code" in your system prompt and then block you, that would suck. But I couldn't confirm it myself.
Alex Volkov 26:25
We should mention that Open Code is a direct competitor to
26:28
Claude Code, a $1 billion product-line business from Anthropic, a side project that somebody built in there and suddenly it's a $1 billion product. So it's a direct competitor to that, and, you know, it makes sense for Anthropic to do so. What is very interesting though is how much other labs jumped on the opportunity to highlight themselves. So now GitHub and OpenAI are both Open Code compatible, and you can use the Codex subscription. So whatever Anthropic said, hey, we're not gonna do this, OpenAI and GitHub both stepped in. So now Open Code can use your subscription for OpenAI Codex, for GPT 5.2, which we're gonna talk about in a second, and for the Copilot stuff. So that's, you know, very interesting.
Ryan Carson 27:13
I'll probably be the guy to say that this is where ad supported
27:16
models are kind of interesting for tokens. There's a bunch of folks doing this now, but at AMP we shipped this idea of, hey, you get more tokens for free because we use ads. I think paying for tokens via the API is just pretty important. I'd be surprised if most labs don't continue to support vanilla token use via the API, 'cause that's literally the product.
Alex Volkov 27:36
on this topic though, there's another piece of not really relevant,
27:40
but interesting news in this corner. Anthropic also blocked xAI from using their services, Claude Code, this week. Anthropic, I will say this very slowly, 'cause Yam is raising his eyebrows, Anthropic, the maker of Claude Opus and Claude Sonnet, has blocked xAI, their competitor, their competitor owned by Elon Musk, building their own model series in Grok, from using Claude Code internally. Now, they not only blocked Claude Code, they reached out to Cursor, an intermediary that sells access to Anthropic's APIs, and told them that they also cannot serve Opus to xAI, which I found very interesting.
Wolfram Ravenwolf 28:26
that keep, that keeps happening.
28:27
We had seen stuff like that with Windsurf, wasn't it? Windsurf, one of the IDEs, was not allowed to use one of the newer models. So these things keep happening and I don't like them at all.
Alex Volkov 28:37
again, for our question, it means nothing.
28:39
It's just really, really interesting to see the moves that are being played there, as Anthropic realizes that Claude Code, and maybe Cowork that we're gonna talk about next, is a big line of products. Nisten, go ahead.
Nisten Tahiraj 28:49
There's going to be a lot of spouses and family
28:51
members of xAI engineers now with Claude Max subscriptions, I would believe. But I want to say that, look, I understand from their point of view that they are subsidizing Claude Code and they don't want it to be that easy to use. That still doesn't stop you from using it. You can still use any harness to control something, to control another machine, and you can use any harness, all of them, to just run a claude -p command, and then you have your Claude Max subscription. The only thing is that it's not fully integrated in. But if you just tell your subagent, hey, just run claude -p, or just open a tmux session and use Claude Max, you can still use it. Yeah. So it's a bit of a dance here. I understand them just blocking API access from their competitors, I mean, it's a pretty gangster move, but that I understand. But the other one, that's just annoying, because you can very easily still use it. Just don't put "open code" in the prompt.
Alex Volkov 29:59
I wanna highlight this section, clip this for sure.
30:03
There's gonna be a lot of spouses of xAI employees with subscriptions. Nisten, that's on point. All right, we're moving on. We're still in this corner of hot takes and dramas, whatever. Apple has announced, in a joint announcement with Google, that Gemini is going to power Siri. "After careful evaluation", I'm reading the quote here, "Apple determined that Google's AI technology provides the most capable foundation for Apple foundation models and is excited about the innovative new experiences it will unlock for Apple users. Apple Intelligence will continue to run on Apple devices and Private Cloud Compute while maintaining Apple's industry-leading privacy standards." That's not from the quote, that's me adding my thoughts on this, because Siri is still dumb. It's 2026. We've seen the ads where they say, hey Siri, do the thing, and it spins out. You know why I'm so pissed off? I got so excited about this when we told you about Apple Intelligence, because I'm in their ecosystem, and I gave them one hell of a benefit of the doubt that they can achieve this. And I still use Apple Intelligence, but it's kind of forced on me right now. I'm not enjoying using this. Siri is still so, so, so stupid. And it's time that Apple steps up. It's a $3 trillion company. They have all the money in the world to train their models. They're playing footsie with OpenAI, now they're playing footsie with Gemini. Meanwhile, all their users are sitting and not getting into the world of AI. Siri is still really dumb. Bella Ramsey is the person in the ads they pulled from YouTube because they didn't do
Ryan Carson 31:38
that.
31:38
I think it's crazy that Apple doesn't have a large language model. It's just unbelievable. They aren't one of the model labs. Like, I think all of us are just kind of throwing our hands up and saying, how did this happen? Like, well, do it now. Fund a lab team. They have the cash, you know, train your own model. This is crazy.
Alex Volkov 31:59
You have the money.
32:00
Just go. There's just talent out there. You can offer $2 billion to Mistral, you're gonna buy them. I'm not sure this is a good move for Mistral, but it's ridiculous. It's ridiculous how far Apple is missing out on this thing. The AI/ML team at Apple, somebody nicknamed them "AIMLess".
Ryan Carson 32:16
so I had a, a real world moment where, you know, I was
32:19
outside of my Twitter bubble where we all talk about AI all day. I was in a cab on the way to the airport, and the gentleman said, you know, teach me about AI, 'cause I said I'm in AI. And I said, well, do you use ChatGPT? And he's like, yeah, I do. And I'm like, oh, okay, how do you do that? And he said, look, and he talked to Siri, and then it says, can I use ChatGPT? And so he thought he was using ChatGPT. And then at the end of the ride, I said, let me help you install actual ChatGPT. So I, like, onboarded him onto it. It's hilarious. And it was just interesting, so now we saw some of that connection. So I guess they're gonna think they're using Gemini, like, yeah, in the future.
Alex Volkov 32:57
It's very interesting to me that Siri is very dumb.
32:59
It's smart enough to know that it's dumb. So it's like, I cannot answer this, let me hand it off to ChatGPT. And then it does; this is the connection, the kind of behind-the-scenes connection. It's still very stupid. But now that apparently will go to Google and Gemini, it still will be called Apple Foundation Models, which is ridiculous naming because it's not. I'm done. You know what, let me turn positive for a second. I'm hoping that this will actually deliver some good experiences. Gemini are great models; I've been using Gemini, we're gonna talk about Gemini personalization next, great models. So definitely, I'm hoping that this is gonna be helpful. On the topic of partnerships, OpenAI announced a partnership with Cerebras. They have the huge wafer-scale chips that we talked about. The partnership is for 750 megawatts of high-speed compute. The point of Cerebras is that when you run stuff on their chips, the throughput is insane, in the 2,000 tokens per second range for smaller models, it's absolutely bonkers. This deal is apparently a $10 billion deal, and it's inked for 2028, so nothing usable right now, but just imagine ChatGPT streaming at the speed of Cerebras. I'd be very happy with that. It's not that GPT is slow, it's still pretty fast, it's just very interesting that this deal is happening right as Nvidia just bought Groq, with a Q, who also make compute-specific ASIC chips for inference. And now OpenAI is partnering with Cerebras. So of the fast-streaming companies there are three: there's Groq, Cerebras, and SambaNova, right? And I think we covered all three of them, I think we had representatives of all three of them on the podcast. And now Cerebras is partnering with OpenAI, and SambaNova is somewhere. A very interesting tidbit that I saw about this is that when Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, and the others co-founded OpenAI in the beginning, Elon Musk tried to get Sam Altman to buy Cerebras back then. And I think one of the reasons why the whole thing fell off is because he said, hey, we'll tie this to Tesla or whatever, and they didn't want the integration, blah, blah, blah. So it's very interesting. I think we covered all the spicy partnerships and news that are not really useful, but fun to talk about. I think we covered everything. Anything else from the Twitter drama world you wanna cover? This is me trying to compete with TBPN and their timeline reaction segment where they talk about just, like, X. All right, let's dive into big companies and APIs. There's a lot to talk about. Anthropic announced Claude Cowork, which is hard to describe. It's easy to describe by saying this is Claude Code for the masses, if you know what Claude Code is. But apparently there are people outside our bubble, like the folks that Ryan helps install ChatGPT for in taxis, and those people have no idea what Claude Code is. So how the hell do you explain to them what Claude Cowork is? I'm not really sure, but let me try anyway. So assume that you don't know what Claude Code is. Claude Cowork
Wolfram Ravenwolf 35:44
one thing, you also have to explain to them how much
35:47
it costs. If they want to use it, they need a paid subscription, which, yeah, that's right. It's only for Max subscribers right now because it's
Alex Volkov 35:55
a research preview.
35:55
But Claude Cowork is an agentic, locally running product from Anthropic that uses their best-in-class models, also does some coding for you without you thinking about it, and can do a bunch of stuff on your computer. Currently only on Mac, currently only for Max subscribers, which is the $100 to $200 tier, depending on how much you want. I think that's a good enough description of Claude Cowork. I did a video on Claude Cowork that exploded, and I think, Ryan, last week you went mega viral, this week it is I, because I asked Boris Cherny, the guy who created Claude Code and works on Claude Code, how much of Claude Cowork, the new product, was written with Claude Code. Boris Cherny answered: a hundred percent of it. And I also clipped, I forgot his name, Felix, I think, I also clipped a video from Shipper talking to the guy that created Claude Cowork, where he said we did this in a week-and-a-half sprint at Anthropic. So this was a week and a half. Yes, this is Felix Berg. He joined Shipper and he said, this is us sprinting for a week and a half. And this went also kind of viral, because it makes no sense to regular people. It makes sense to us in the Ralph world, but makes no sense to regular people. All right folks, this is Cowork. Cowork is a UI that sits in the Claude Mac app, I think on the web as well, but it's only useful in the Mac app because it connects to your file system. Claude Cowork is basically, for those of you who do know what Claude Code is, a wrapper on Claude Code for non-techie people. It can do a bunch of stuff. They claim it's an early research preview, and it's been, not vibe coded, but Claude Code coded. It is basically vibe coded for a week and a half. It has a progress tab on the right. It has artifacts; artifacts are kind of like HTML pages, et cetera, that it can make for you. It has context, which you can add, and it has connectors, connectors like Claude for Chrome. Claude for Chrome is the ability of Claude to control the Chrome browser and click on different things. Why is all of this important? Because it can achieve some tasks for you. The type of tasks that they suggest here in the suggestions are: create a file, organize files, prep for a meeting, make a prototype, et cetera. But the thing that I wanted to highlight is: code agents are generalized agents. And this is the realization that Anthropic had when they released this research prototype, because many people over the winter break had some time, they learned about Claude Code, and even non-developers started building things and having Claude Code with the Chrome extension achieve a bunch of stuff, do taxes for them, et cetera. So we're gonna do something live. I really wanna show Claude Cowork and how it works while also achieving something. So Wolfram, you said that I had a demo with the AI image generators, and I think I can find this demo and we can add Flux Klein to that, right? Let's do that. We're gonna choose a folder. Claude Cowork is mainly successful because of the security stuff that they have. So they are sandboxing the code that they run to that folder using some virtualization tricks, so they do sandboxing and virtualization, which is very important for security, because when you run code on your computer, it can do bad stuff. And we've seen many people for whom it deleted whole repositories.
So we're gonna try to find our infinite phone extension — the extension that you mentioned, Wolfram. This is the infinite phone extension. We'll open it up; it requests access to the folder. And then we ask it: can you find out which models my extension supports right now? First I'll just let it read the files and figure out what's going on — I'm using Fal. It basically learns on the fly what's going on in the folder, so you don't have to do any kind of context engineering; it just learns. It gives me an example: okay, these are the image-generation models it supports via Fal — the Turbo model — and any Fal model can be prefixed with fal. So now it already understands. Wolfram just sent me the link address. Okay: now please add this model from Fal so I can test it. That's it — you're basically talking to this thing, and it will supposedly do the work for you. The cool thing is you can see a little error here that says "unable to verify it's safe to fetch," blah blah blah, network restrictions — and it still understood which model needed to run, and done. Folks, I haven't seen one line of code. I think this is the highlight here. Claude Code still shows you diffs, and it's kind of developer-focused, right? I asked this thing to iterate on a coding project of mine, and unless I really wanted to — unless I click into it — I haven't seen one line of code. Nor do I care, nor do most people who use the various Claude Code-style tools lately; people just hit next, next, next, whatever. All you need is a task to achieve and a way to achieve it. That, I think, is the paradigm shift. That's the big thing in Cowork. We should definitely test whether it's working. Ryan, meanwhile, feel free to comment.
Ryan Carson
Ryan Carson 41:23
Yeah, I mean, this is the same reason why Ralph took off, right?
41:26
People are realizing that the model is good enough to complete well-specified tasks without micromanaging — like, very well. So this is just an extension, a UI on top of that, and I think you're gonna see a lot of agent harnesses move toward a UI that's much more like this and much less code-based. The best devs in the world that I know now don't write any code by hand.
Wolfram Ravenwolf
Wolfram Ravenwolf 41:51
I also noticed the change, if you have used the Claude code
41:56
extension in Visual Studio Code, for instance: it used to go to the terminal all the time, and they've replaced that now — by default it has a text pane, like Antigravity, for instance. So even there they've moved toward the end user, toward more user-friendliness. Yeah, that's coming.
Alex Volkov
Alex Volkov 42:16
I think that this is it.
42:18
Flux Two. Let's see — save settings. I think it worked. It didn't say Flux Two... oh wow, yeah, it does work. Let's see here. It's taking a while, but it does seem like the thing Claude Cowork did for us is working, and we're generating images with the new Flux — the small Flux, the 9B, the Klein one. It's still fairly slow, so I'm not quite sure what's going on, but it did work. For that we'll go back and actually look at the code, right? So we have: if image path, blah blah blah, fal image turbo client, 9B. Yes, this is how it looks. So Claude Cowork did the work for us, and it took next to no time. Here's the interesting thing: the example I just showed you is a developer's example. I had a project on my computer that did some stuff, and I asked it to edit some stuff. I knew exactly what to add and where to edit — even though I didn't look at the files, I was the product manager for this thing. Claude Cowork can do so much more for you. One example they show is a silly one that everybody immediately sends: sort the files on my desktop. It's a funny example because nobody's actually gonna use it, nobody cares — but everybody has a big messy desktop. You can just open the desktop here; let me choose the desktop.
Nisten Tahiraj
Nisten Tahiraj 43:42
Can you code?
43:43
No. Can you use Vim? No. I know what I like and what I don't like. I am decisive in what I prompt.
Alex Volkov
Alex Volkov 43:50
Yeah.
Nisten Tahiraj
Nisten Tahiraj 43:52
What do you get paid for as a programmer?
43:56
People are gonna be like: the confidence that I have in my taste, and my ability to express it, is what has proven helpful.
Alex Volkov
Alex Volkov 44:03
it's the, it's the era of the idea guy.
44:06
It's the era of the idea guy. The idea guy used to be paired with a coder; now it's the idea guy's time. So basically, here's a very simple example of Claude Cowork doing some stuff for you without coding. Obviously it will run commands, but you don't care. It found three versions of Rode Central on my desktop — all package installers plus the app, both the zip and the package already installed. These you can safely delete: installers, old files, MP3 files, blah blah blah, images, app bundles. So: Claude Cowork, for any task you have on your computer. I think the most important parts we haven't covered yet — there are two. We're gonna get to skills, and this can use skills as well. But the connectors, I think, are a very important part here. I have a few connectors, like n8n and Claude for Chrome. The Chrome connection is very important — this is very similar to the Atlas browser and the Comet browser that we've talked to you about a lot. It means Claude — Opus, et cetera — can run stuff in your browser, so it can help you connect things: go into my browser and do some research on Twitter, for example, and then do this and this. I think that's a huge connector. Claude can also do this via that connector on its own, so it's not a huge new thing by itself, but the combination of writing code, running code locally, plus being able to go to the browser — that's a big combination. Other connectors are Google Drive, Gmail, Google Calendar, and a bunch of other stuff. And obviously Anthropic came up with the MCP spec, so everything there is supported.
Wolfram Ravenwolf
Wolfram Ravenwolf 45:39
anything
Alex Volkov
Alex Volkov 45:39
else
Wolfram Ravenwolf
Wolfram Ravenwolf 45:39
To add one thing: the target audience is
45:44
people who are less technical than us — not coders, not people familiar with these things — and they have to watch out, because we are using Git for everything. When AI does something for us, we can revert, check in, check out stuff. If you're using it on your files, on your system, be careful what you're doing, because I guess a lot of people don't even have backups, and if they're using this tool without some safety nets — watch out what you do.
Alex Volkov
Alex Volkov 46:09
Yeah,
Wolfram Ravenwolf
Wolfram Ravenwolf 46:10
be careful —
Alex Volkov
Alex Volkov 46:11
A hundred percent. Running code on your system, even in a sandbox, is not
46:16
for everyone, to be blunt. You need to know what you're doing. Any comments, folks, or are we gonna move on from Claude Cowork? I haven't gotten super excited about this; many people did get excited about it. It is a research preview, so it's not fully robust yet. Again: written in a week-and-a-half sprint, a hundred percent of the code written with Claude Code. It's just fucking insane.
Wolfram Ravenwolf
Wolfram Ravenwolf 46:36
I think it's basically that.
Alex Volkov
Alex Volkov 46:38
Yeah.
46:39
I mean, I looked at the scope of this feature — it's quite a lot of extensive stuff you need to build. It creates a plan, for example. Now, Claude Code also creates a plan, but here you have to take that plan, put it in the UI, update it, check items off. There's a lot of visual interface work that needs to happen, and none of it is small. Ryan, if we were in a startup a decade ago — six months ago even, or even now, for startups that are not AI native — this is three to six months of work. They shipped it in a week and a half, easy. This is so much. There's just so much.
Ryan Carson
Ryan Carson 47:10
Yeah, I mean, the amount of work we can do now is just staggering.
47:13
it's an exciting time to be alive.
Alex Volkov
Alex Volkov 47:15
I wanna address this comment from Mark Erdman.
47:17
He says: quick poll, is there anyone who doesn't use --dangerously-skip-permissions? --dangerously-skip-permissions is a flag for Claude Code — not the product we just showed, Cowork, but Claude Code, the command-line app — that stops it from asking you every time it wants to run something, and just lets it run commands for you. And those commands can sometimes be rm -rf and really delete stuff, or a git rebase that deletes stuff. So a lot of people run with --dangerously-skip-permissions. I tend not to, unless I know exactly what I'm doing.
Ryan Carson
Ryan Carson 47:53
I have it on and, honestly, Amp has never done
47:57
anything super bad to me. Now, I'm sure that will happen at some point, but I have three local checkouts of my repo and I work across them instead of using git worktrees, and I've honestly never had anything too bad happen. What scares me is migrations on prod. I've had a couple of times where it was like, "I'll run that migration on prod," and I'm like, no, you won't. So there are a couple of things like that that still freak me out.
Alex Volkov
Alex Volkov 48:20
All righty.
48:20
Go ahead — and welcome, Yam, by the way.
LDJ
LDJ 48:22
Didn't you have Claude wipe your PC or something like that?
48:25
Right.
Yam Peleg
Yam Peleg 48:27
Actually, nothing too bad ever happened to me.
48:32
Look, it will happen at some point. I'm constantly running dangerously — yeah, dangerously do whatever you want. I have aliases in my bashrc for all of them to just do whatever they want. But I also have my computer backed up every couple of days, so I'm not that worried. If I didn't have that backup, I'm not sure it would be a good idea. So I think it's somewhere in the middle: if you have snapshots or git or something, you can let loose, because at the end of the day it saves time. You don't need to approve everything, and you can run stuff in parallel — there are real benefits. However — yeah, I've seen horrifying things, like Claude going, "I'll just make sure there are no tests that fail," and then deleting all of the tests, and so on.
Ryan Carson
Ryan Carson 49:26
I feel like everyone that talks about this doesn't
49:27
know how to use git or something. I mean, this is literally what version control is for, so I don't understand it. This is not rocket science, people — this is what version control is. Other than migrating your prod DB.
Alex Volkov
Alex Volkov 49:40
Yes, databases and migrations — that's definitely one.
49:42
Production is hard. All right folks, we're moving on — just quickly before we do, as a reminder: the show does not have any sponsors. The sponsor we have is Weights & Biases, my employer, and Wolfram's employer as of recently as well. And we have a corner that talks about Weights & Biases; it's called This Week's Buzz. I have some news for you — let's go to This Week's Buzz before we continue with the rest of the show.
50:20
Hey, welcome to This Week's Buzz. This is Alex Volkov from Weights & Biases, and the only news I have for you today is that our hackathon, WeaveHacks, which we run in the Weights & Biases office in San Francisco, is back. It's going to be January 31st and February 1st. If you want to sign up, go to the WeaveHacks 3 page on Luma, and when you sign up, mention that you heard about the hackathon on ThursdAI and we'll definitely get you in. I'm gonna be there — I'm emceeing the hackathon. We have a bunch of sponsors to announce and a great panel of judges as well; you'll definitely be reviewed by some of the top people in AI in San Francisco. I'm working really hard behind the scenes to get you the most amazing panel and the most amazing prizes. The theme for this hackathon is self-improving agents, and we're so stoked to see you there and to see what you come up with. There's so much to work with: there are Ralph loops, there are skills, there's Amp, there's Cursor — there's so much to do at the hackathon. Hackathons in 2026 are some of the more fun times you can have. If you want a break from your work, come down to San Francisco; if you wanted an excuse for your boss to fly you down to San Francisco for something, this is a great excuse as well. I'm looking forward to seeing you there. Let's continue with the show. All right, we're back from the little break. But yeah, I'm looking forward — all of you are invited as well, and if you want to be a judge, let me know. Right: the next thing on our list for this week, in big companies and APIs — a huge thing — is that GPT 5.2 Codex was finally released by OpenAI. They announced and talked about this, and we covered GPT 5.2 Codex on the show back in December. Back then, it was really surprising that they did not release it to anyone — you could only use 5.2 Codex within the Codex app. And I think ChatGPT — no, you can't use Codex in ChatGPT; it's only within the Codex app. But when they announced it, the pitch was: this is the long-running thing. I remember talking about this because I interviewed a person from OpenAI, who shall remain nameless because he's not supposed to tell me what he told me, and he basically said this was the longest-running model he'd ever seen — he ran it on one prompt, on and off, for around a week, with an exit criterion. So this is long-running agents on steroids. And, you know, not a ton of people we know use Codex, so not a ton of people got exposed to this.
Ryan Carson
Ryan Carson 52:45
I've heard it's very methodical in a good way.
52:47
I actually want to try it for that, but I haven't done it yet.
Nisten Tahiraj
Nisten Tahiraj 52:50
It just takes too long.
52:51
That's the only thing I'm hearing. When people try to use it as their main model, they're like: yeah, it just takes too long.
Alex Volkov
Alex Volkov 52:57
So it's now in Cursor and GitHub Copilot and VS
53:01
Code and Factory Droid, and Vercel's AI Gateway as well. I gotta hand it to these companies for implementing a model like this — it seems like a string-replace somewhere in the backend, but it's not really, because every harness has its own quirks. And we have a lot of folks in the comments saying it's great for code reviews, for example. This is the state-of-the-art model on SWE-bench Verified, which is saying a lot, because we saw this release and haven't all been able to try it yet. We're all sitting here, we all have experience with different tools; for one of us the container didn't start, another one wasn't able to use it in their repo. The accessibility of a model shapes a lot of how developers adopt it. LDJ, go ahead.
LDJ
LDJ 53:43
similar to what Nisten was saying, for a lot of medium to
53:46
somewhat hard tasks that Opus can do, I would say 5.2 Codex can often do them as well — but it tends to take longer, and that seems to be its big downside right now compared to 4.5 Opus, which seems to use its time much more efficiently and gets similar accuracy on at least medium-difficulty tasks, but faster. When it comes to things that are really hard, though — especially something like what Nisten was mentioning, like bugs — it does seem to be especially good at certain niches of code, such as finding certain bugs, and people have been switching between Claude and 5.2 Codex for certain planning things. I don't know if you were going to bring it up, Alex, but the Cursor team ended up writing a browser from scratch — I think that's worth going over.
Alex Volkov
Alex Volkov 54:38
Yeah.
54:39
This is the next segment on the show. Before we get there, the two things I want to highlight about this model — we talked about them when it came out: it's state-of-the-art on SWE-bench Verified and SWE-bench Pro, it supports environment capabilities, et cetera, and it's supposedly very good at design as well, though that remains to be seen. But it has native context compaction support. Compaction is a feature in coding harnesses and coding tools for when the end of the context window is approaching — and that end is approaching very fast lately, because everybody is streaming a lot of code super quick. Basically, it's a way to tell the model: hey, look back at your history and compact it. By compact, I mean delete all of the irrelevant things — the old code diffs, file diffs, et cetera — keep only a summary of the conversation history, and then start a new chat, so that we can basically continue talking. We all know that the longer you talk with a model in one chat, the stupider it often gets, and the reason is that it starts to lose the beginning of the conversation. You're talking to it like a person, and you expect a person to remember — I think I have the context of every conversation I've had with Ryan, at least generally; we know where we are, we know the kids, et cetera, so we have shared context. When you talk to a model for a long time, it loses some of that context and gets stupider. Context compaction is the way to deal with this: you ask the model itself to look at the conversation history, summarize it, put that into a new chat, and continue from there. Native context compaction is support for this built into GPT 5.2 Codex, and it means token efficiency for long sessions is significantly improved. And to highlight why this matters and how it plays out, LDJ brought up a very interesting point. Cursor — the AI-native IDE that's been blowing up, with something like a $20 billion valuation — has been using GPT 5.2 Codex, and they ran it for almost a week straight and built a browser seemingly from scratch: not based on Chromium, which most browsers are based on, but seemingly from scratch, which is an insane achievement. I think it's somewhere around one to three million lines of code. When we talk about the world vibe coding, this is the extreme, extreme end of it, and it's important to convey just how extreme. One of the guys — Wilson, I think — posted about building this; it's just an insane amount of code that was written. They built the entire rendering stack from scratch: text shaping, a layout engine, the CSS cascade, and an HTML parser. Building browsers is hard — very hard. I would even say very, very, very hard, because there's a reason most browsers today are based on Chromium and layout engines like WebKit, et cetera. And apparently they wrote this in Rust: a complete web browser including custom HTML parsing, cascade, layout, text shaping, painting, and a custom JavaScript VM.
This was a monumental stress test for agentic coding, with nearly 330,000 commits pushed during the experiment. They ran hundreds of concurrent agents using a planner/worker architecture, with GPT 5.2 outperforming other models like 4.5 on long-running tasks thanks to superior instruction following and sustained focus. I'm pretty much speechless.
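For reference, the "native context compaction" Alex describes above boils down to a simple idea: summarize the old turns, keep the recent ones, and continue in a fresh thread. Here is a minimal, naive sketch of that idea — not OpenAI's or any harness's actual implementation; `call_model`, the token budget, and the keep-recent count are assumptions for illustration:

```python
# Naive context compaction: when the transcript nears the window limit,
# replace the oldest turns with a model-written summary and keep going.
# `call_model` is a hypothetical helper standing in for any chat API.

def estimate_tokens(messages: list[dict]) -> int:
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compact(messages: list[dict], call_model, budget: int = 150_000, keep_recent: int = 10) -> list[dict]:
    if estimate_tokens(messages) < budget:
        return messages  # still fits, nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = call_model([{
        "role": "user",
        "content": "Summarize this conversation history, keeping decisions, "
                   "open tasks and file paths; drop stale diffs and dead ends:\n"
                   + "\n".join(f'{m["role"]}: {m["content"]}' for m in old),
    }])
    # Start a "new" thread seeded with the summary plus the most recent turns.
    return [{"role": "system", "content": f"Summary of earlier work:\n{summary}"}] + recent
```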
Nisten Tahiraj
Nisten Tahiraj 58:06
we gotta do a Codex Ralph versus an Opus Ralph.
58:10
I think it'll be fine.
Alex Volkov
Alex Volkov 58:11
so here's the question.
58:12
I don't know if Codex needs Ralph. The whole point of Ralph is automation and being hands-off: investing as much as possible in pre-planning, then letting it run and not having the context run away from you.
Ryan Carson
Ryan Carson 58:27
I have a hot take on this.
58:28
I do not think auto-compaction works. I think if you use these tools right, you'll find that you can't compact out of a thread what you actually need. We don't do any black magic with compaction at Amp, because I don't think it works: you run out of context, you start a new thread. This is the reality of large language models — they do not have medium- to long-term memory. It's not built into the architecture, and that's not solved. I think you have to be honest and realistic about the fact that you really can't compact threads. This is why Ralph — the Ralph model — works so well: every thread has fresh context. And I think what people need to admit right now, where we are with the architecture, is that what you really have to do is have tasks that are atomic and small, with clear acceptance criteria. You could have a million of those, and then you could run a million threads on them, and that would work. Nisten, what do you think?
Nisten Tahiraj
Nisten Tahiraj 59:24
There is a little bit of an abstract form of in-context
59:27
learning going on as you do inference, as you fill the context window. And part of the reason — we don't know exactly — why it gets dumber is that if you're just repeating a lot of the same information back and forth, it's kind of like having repeated training data. Again, this happens at an abstract level, and that's why it tends to get dumber by the end. But you do notice that if you're very careful with how you build up your context — if you build it up as a sort of curriculum, where you teach it: hey, this is my system and this is how we run it, this is how we build this and this is how we build that, and then at the end I want you to put it together — that can work very well. But there's also the whole business of rotary embeddings, how the model uses certain tokens to attend to other tokens, and all the tricks used to extend context, because the native context is a lot smaller. So all of these factors play into the model being way smarter in the first 30–60,000 tokens.
Ryan Carson
Ryan Carson 1:00:42
And because
Nisten Tahiraj
Nisten Tahiraj 1:00:43
a lot of the training data,
Ryan Carson
Ryan Carson 1:00:44
I mean, but this in-context learning — what we're finding
1:00:46
is hilariously dumb, which is that you can use a file called progress.txt to basically carry in-context learning between threads. I remember swyx posting about this: ha ha, it turns out memory is just the file system. Yeah, that's kind of where we're at right now, until we get a better architecture with actual in-context learning.
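The pattern Ryan and Alex keep describing — fresh context on every iteration, with a progress file on disk as the only memory between threads — can be sketched in a few lines. This is a hypothetical illustration, not Amp's or Anthropic's code: `run_agent` stands in for whatever CLI or API call starts a brand-new agent session, and the file names are just conventions like the progress.txt Ryan mentions:

```python
# A minimal Ralph-style loop: every iteration starts a fresh agent thread,
# and the only "memory" carried forward is a progress file on disk.
from pathlib import Path

PROGRESS = Path("progress.txt")   # assumed name, a la the progress.txt above
TASK = Path("task.md")            # the pre-planned spec with clear acceptance criteria

def ralph_loop(run_agent, max_iterations: int = 50) -> None:
    PROGRESS.touch()
    for i in range(max_iterations):
        prompt = (
            f"Task spec:\n{TASK.read_text()}\n\n"
            f"Progress so far:\n{PROGRESS.read_text()}\n\n"
            "Do the next small piece of work. When finished, append what you did "
            "to progress.txt. If the acceptance criteria are fully met, write DONE."
        )
        run_agent(prompt)                      # fresh context every single time
        if "DONE" in PROGRESS.read_text():
            print(f"Exit criteria met after {i + 1} iterations")
            return
    print("Hit the iteration cap without meeting the exit criteria")
```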
Alex Volkov
Alex Volkov 1:01:06
So, you know, compacting aside, it's clear that we have at
1:01:10
least an attempt at a competitor for Opus, which we all obviously love. And it's very interesting that so far it's been hard to use outside of Codex — we have a bunch of folks in the comments right now blowing up and saying Codex is the shit, folks, you should try it, and we'll definitely give it a try — but outside of Codex there wasn't much usage, because nobody could use GPT 5.2 Codex outside the Codex product itself. It's also interesting that stuff like Ralph — the loop itself and the trust in the file system — is all about having fresh context for everything while still keeping the work on task. Running for a week straight, producing around three million lines of Rust and 330,000 commits with hundreds of concurrent agents and billions of tokens processed, is very impressive regardless of what we think. It's just mind-blowingly impressive that it's possible, and that we live in a world where this thing can fucking write a browser. Still mind-blowing to me. All right — if you've been testing it, leave us a note in the comments, and if you're listening on the podcast, I'd love to know your opinion on GPT 5.2 versus Opus. It definitely looks like a competitor. It also looks like Grok 4.20 is gonna come out and not be a competitor — I think Elon kind of admitted this. So we've covered the new OpenAI stuff. The last thing I want to cover before we go into the skills deep dive, folks, is that Gemini is stepping into the age of personalized AI. Welcome. Not a lot of companies — maybe outside of Meta — have as much information about you as Google, if you're in the Google ecosystem: email and text messages, and search history, which is another place where an AI can know a heck of a lot about you, because you tend to search for everything under the sun. That's obviously very valuable to other companies, but now it looks like Gemini will have access to all of it, if you allow it, to give you a significantly more personalized experience. I think that's incredible. You can enable AI to reason across Gmail and YouTube — you can enable AI tools across Gmail, YouTube, Photos, and Search. If you use Google Photos, that's another huge unlock. One example Josh Woodward from Google showed was: hey, I need tires — which tires should I get? Gemini personalized intelligence was connected to his Google Photos and found out that he has a Honda Odyssey — it figured out which car he has from his registration emails and the pictures — and basically said: you have an Odyssey, so you need these tires. ChatGPT cannot do this — well, it can with a connector, but it's not as effective at it. And obviously Google's moat is the access. So I think this is crazy — a huge data moat Google has over competitors like OpenAI and Anthropic. As for me, the test I ran with Gemini personalized intelligence — I've had access for a while; for clarity, I sometimes get early access to Gemini stuff — I asked it what kind of car I drive. And to make a little joke at Josh Woodward's expense, I went and searched for Honda Odyssey, took a picture, and reposted his tweet. So my personalized agent in Gemini went into my emails — I don't use Google Photos, so it went into my emails — and said: I'm pretty sure you drive a Tesla Model Y, but you recently searched for a Honda Odyssey, so maybe you're thinking about switching. It was really fun.
It's uncanny, folks, to see an AI agent say: hey, you recently searched for this thing — maybe you're interested in more.
Ryan Carson
Ryan Carson 1:04:41
So I think we're in a weird spot where a lot of us have different,
1:04:44
Gmail addresses, work addresses — like, I probably have six Google Workspace addresses. So I so wish I could somehow connect all of them. Where do you all think this is gonna go? Because I'm definitely in the Google ecosystem way more than in any sort of ChatGPT ecosystem, right?
Wolfram Ravenwolf
Wolfram Ravenwolf 1:05:05
That's also the reach Google has because
1:05:07
Anthropic's offering now, Cowork, is great, but you have to get the plan, you have to know about it, and so on. At Google, they just make it available — you probably get a popup asking if you want to enable it the next time you go into one of the apps — and people use it, and immediately you have millions of users. And that is a moat: Google has the reach, they have the data, and they have all the users. That's pretty unique. Even OpenAI can't compete with that reach, although they have the first-mover advantage, so they have a lot of people. And Claude — we know it, but most people I talk to in real life don't even know about it. They always look at me: huh? What?
Alex Volkov
Alex Volkov 1:05:47
I think that, given the 17 years or so I've been on Gmail,
1:05:52
I'm so entrenched in that ecosystem — also in the Apple one, but nothing's coming out of Apple. None of the supposed "hey, let me read your text messages and then securely go through the photos on your iPhone" — none of that has arrived so far. And as long as it doesn't, Gmail is kind of the secondary ecosystem I live in. I'm running searches on Google — as far as I'm concerned, I'm still searching Google — and the moat is huge, absolutely huge. And I'll tell you the feeling I got from the moment it said, "hey, I saw you search for a Honda Odyssey recently, are you thinking about switching?" Google used to sell those search signals to all of the ad companies, and that's why Google Ads is still the biggest business in the world. So I think it's very, very important that AI will now have access to those signals as well. You sometimes search without even thinking about it — I don't think most of us register that we search for stuff we care about without even thinking about it. And if it's stored somewhere, it's a signal. That signal is good for ads, personalized experiences, and targeting, so it's probably great for AI models too, to be fully helpful for you. Right. LDJ, go ahead.
LDJ
LDJ 1:06:55
Yeah.
1:06:55
On the question of where it's all going, I think you'll essentially have MCPs for everything. I don't know — long term it might not exactly be called MCP, and it'll morph into us adopting some other thing that's better in certain ways, but overall I think it'll look similar. And for your use case, Ryan, where you have multiple different workspaces, I imagine Google might make an MCP or something similar available, where other AIs can connect to it and access that information. Then you could have these generative interfaces, or custom interfaces, within ChatGPT — basically a bespoke interface for you to seamlessly interact with all those different Google Workspaces. I think that's what it'll look like. Yep. I would love that.
Alex Volkov
Alex Volkov 1:07:41
Well, definitely interconnected.
1:07:43
I'm not sure if it's MCP or some other proprietary thing. Anyway, we have some real-world use cases pulled from Twitter, from folks who have personalized intelligence: trip planning from email confirmations — you can say, hey, look at my email confirmations; dog food recommendations from your photos; retrieving wifi passwords from saved images — that one is very interesting; and decor ideas based on your photo library's style. What I think the next step is, if you ask me: 2026 is still going to be the year of proactive agents, and proactive means agents doing things for you without you asking — working on your behalf, behind the scenes, while you sleep, et cetera. That's what we want with Ralph loops; that's what we want with different things. When my agent knows as much about me as Google does, that gets even better. If something scans my inboxes, my photos, my flights, whatever, and prepares things for me — that's the whole promise of ChatGPT Pulse, for example. I think that's where we're going, and when my agent has all the context it needs, it's just going to do its job that much better. All right folks, I think it's time for us to dive into a very interesting topic: agent skills. Super quickly before that, because we'll probably end the show afterwards: we haven't covered it, but there are very interesting developments in voice and audio. Video doesn't have a lot — there's only a Veo 3.1 update. Pocket TTS from Kyutai is a hundred-million-parameter open-source text-to-speech model with voice cloning that runs in the browser. I think that's super cool, and I just wanted to mention that Kyutai is doing incredible things. A hundred million parameters is just nothing — it's a very tiny model; you could probably run this on the Reachy. Speaking of which, Reachy didn't even say hi to us today; we'll connect and say hi to Reachy in a second, I think. We're now moving to the deep-dive section of the show. I want to introduce Eleanor Berger to join us. Hey Eleanor, welcome to the show.
Eleanor Berger
Eleanor Berger 1:09:34
Hey everyone.
Alex Volkov
Alex Volkov 1:09:35
Hey, thanks for joining.
1:09:37
It's great to see you here. I invited you because when I researched skills — we talked about them when they were released, and we talked about them when they became an open standard — I saw you dive into them super quick. You have a course where you teach folks to get up to speed on skills, so I figured I'd invite you for a conversation about what agent skills are; it's definitely going to be very interesting. Please introduce yourself to everyone, by the way, since this is your first time on the podcast, and then we'll dive in.
Eleanor Berger
Eleanor Berger 1:10:03
Sure.
1:10:04
So I'm Eleanor Berger. I'm the founder of Agentic Ventures, where I try to get everyone onto the agent platform, and a big part of that is skills. I became sort of a self-appointed evangelist for skills, which is a bit funny, but I am super excited. I saw skills when they were first released as kind of a Claude thing and realized this was going to be big, and I've been a little bit obsessed since then — I've put a lot of effort into learning them and have created and shared a lot of skills. Now I have a series of tutorials on my YouTube channel and the website, where I try to help people develop the mindset — rather than just a recipe or a technique — to understand how to think about skills as a way to customize things. And I teach a workshop now that's a bit more hands-on, really helping people get into using skills.
Alex Volkov
Alex Volkov 1:11:06
That's great.
1:11:06
What is so exciting to you about skills? Let me just ask you straight up: what is the main draw of agent skills, before we get to what they are and how to use them, et cetera?
Eleanor Berger
Eleanor Berger 1:11:16
There are really two things.
1:11:18
One thing is just that there's a standard. It's been a real frustration for everyone working with agents — mostly the coding agents, but now they're becoming a general-purpose thing — that every agent had its own way of setting things up, some of it quite baroque, with rule files that match on patterns and get installed in all kinds of different places. Finally we have a standard everyone agrees on. It's now very widely adopted — I think by pretty much all of the big agents. This last week, Cursor, Antigravity, and Gemini CLI, which were the last ones not to have it, added it, at least in their canary channels. But the more important thing — and this is more conceptual than technical — is that skills are an admission that we now have general-purpose agents: agents that know how to go on and on and call tools and all of that, and that have an execution environment, a computer they can work with — maybe a laptop, maybe a VM or a container in the cloud. And that's all you need. They do everything you need, except that they don't know what you want to do — your knowledge, your workflows, all of that. That's what you have skills for. And they're very simple: they're just markdown files in a directory. When it finally clicks that we have this powerful general-purpose intelligence system and this very simple mechanism to customize it — that's the big unlock. So I'm super excited about that.
Alex Volkov
Alex Volkov 1:13:00
So the big unlock, I was super excited to talk about today's show
1:13:03
specifically because agent skills clicked for me, I think, in the past week or so. I've been following you for a while — I think you started popping up on my radar when you started tracking the support for this — and I saw that support for this open standard is becoming more and more widely available, so I collected some of it. For chat agents — the places where you chat with an LLM — I think Claude is the only one that supports it; OpenAI doesn't support skills yet, Gemini doesn't support skills yet, et cetera. In coding IDEs — integrated development environments — it's supported pretty much across the board: Cursor supports it starting from this week, which is the big news we started from and why I wanted to cover agent skills; Windsurf has supported this for a while, even though they call it something else — fine, it's still skills; and Antigravity added support this week as well. Who else am I missing?
Eleanor Berger
Eleanor Berger 1:13:49
Well, Amp — I think one of the first ones to
1:13:51
adopt it, so shout out to them. Yeah.
Alex Volkov
Alex Volkov 1:13:53
We'll add Ryan here super quick.
1:13:55
And for coding agents in the CLI, there's broad support: Claude Code, obviously one of the major ones, but also Amp, OpenCode, Codex — pretty much everybody supports agent skills. And the reason it clicked for me, the reason it's useful, is that, like you said, these LLMs — and we're getting to really impressive levels now, like Opus 4.5 and GPT 5.2 Codex — are really, really good generally. At some point, though, you need to steer them and give them domain expertise. Domain expertise is where it's at: as a developer, when you get onboarded to a company you learn a bunch of domain expertise, because you're expected to have it when you execute tasks for that company. Domain expertise doesn't come from nowhere, and it's a big problem to give it to generalized models like Opus, et cetera. Skills are a very easy way to do that in a reproducible, composable way. So let's start with reproducible and composable and then get to some examples. Eleanor, can you walk us through what a skill is and what it looks like — a very simple, basic example?
Eleanor Berger
Eleanor Berger 1:15:04
What it looks like is kind of trivial, right?
1:15:05
It is a directory, and it has a SKILL.md, which is a markdown file with a little bit of frontmatter — a little bit of YAML at the top — where you just say: this is the skill, and this is when you should use it. That's enough for the agent to recognize, when the time comes, "oh, I should go grab that skill and see what it does." That's wonderful, because it means you're not stuffing your context with tons and tons of rules; you're just saying: look, you have this library of things here that you may want to use at some point. And you could have hundreds of them, because they take very little — maybe 50 to 100 tokens per skill, just the metadata. So you can have a huge library of them — some general, some specific to the project or whatever — and the agent, especially with the latest generation of models like Claude 4.5, GPT-5, and Gemini 3, will figure it out. It will know when the time comes to grab that skill and start looking inside it: read the markdown file, then maybe look at additional files — reference files it should read selectively, progressively, not everything at once, but picking the stuff it needs. Maybe there's a script it should run, and it'll run the script, and so on and so forth.
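Mechanically, what Eleanor describes is easy to picture: a harness scans a skills directory, reads only the frontmatter (a name plus a "use this when…" description) into context, and loads the full SKILL.md only when the model decides it needs it. A rough sketch of that indexing step, assuming PyYAML and an illustrative `~/skills` layout — the paths here are examples, not any particular harness's actual code:

```python
# Build the lightweight skill index that goes into the agent's context:
# only the frontmatter (name + when-to-use description), ~50-100 tokens per skill.
from pathlib import Path
import yaml  # pip install pyyaml

SKILLS_DIR = Path.home() / "skills"   # illustrative location; real harnesses differ

def read_frontmatter(skill_md: Path) -> dict:
    text = skill_md.read_text(encoding="utf-8")
    if text.startswith("---"):
        _, frontmatter, _body = text.split("---", 2)
        return yaml.safe_load(frontmatter) or {}
    return {}

def skill_index() -> str:
    """One line per skill; the full SKILL.md is only read if the model asks for it."""
    lines = []
    for skill_md in sorted(SKILLS_DIR.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        lines.append(f"- {meta.get('name', skill_md.parent.name)}: {meta.get('description', '')}")
    return "Available skills (read the skill's SKILL.md before using it):\n" + "\n".join(lines)

def load_skill(name: str) -> str:
    """Progressive disclosure: pull in the full instructions only on demand."""
    return (SKILLS_DIR / name / "SKILL.md").read_text(encoding="utf-8")
```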
Alex Volkov
Alex Volkov 1:16:26
So I'm gonna show the skills that I currently have, you're
1:16:30
welcome to talk over this, because you have some too. So I have a few skills, and I've copied the Anthropic one here: the skill-creator skill, which I think is wonderfully self-referential — if you go to Claude and ask, hey, help me build a skill, it has a skill-creator skill that actually knows how to do this. Here's the SKILL.md file. It's a markdown file, and it has — you mentioned this — the frontmatter: a little block at the top with the name of the skill and the description. The description says the skill should be used when blah blah blah — that's a very standard thing at the top of a skill, saying when it should be used. And I think that's a very important piece, because the context cost of all of your skills is basically two lines each: the name of the skill and a description of when to use it. That's it. This matters because, with tens or hundreds of skills, if you provided all of them with their full code, the way we used to, your context window would run out. It's a very clever way to tell the model: hey, you have this capability; tap into it if you want. It's what's called dynamic disclosure or progressive disclosure, depending on where you read about it — the model knows it can tap into this intelligence, this domain expertise, about a specific way of doing things. The way I think about it is like Neo in The Matrix when they plug him in and he goes, "I know kung fu." That's skills in a nutshell: the model decides what information to load and when. Why is this important? Because agents are generalized, but with skills they can be targeted at a very specific task when they need to be — and I think that's the big unlock. In addition, Eleanor, you mentioned there are additional directories: one of them is scripts, one is references, and another is assets, I think — though I haven't seen assets in a while. Could you talk us through the additional stuff a skill can have besides the SKILL.md?
Eleanor Berger
Eleanor Berger 1:18:38
I mean, so this is a convention.
1:18:39
It's just directories with files, but at least we have an agreement that they're there and where to find them. Scripts would be scripts — a Python script or a TypeScript one are the most common, but it could be anything that can run on a computer. That's great, because maybe you want something formalized as code rather than as a bunch of text in a markdown file, or maybe you want to do a calculation that makes more sense in code, or call an API, or whatever. So that's very convenient. References would be more markdown files — again, this is the idea of progressive loading: instead of having everything in one big markdown file and slopping it all into the context, you can say, okay, there are different aspects here — maybe different pages of a library's documentation or something like that — and load them if and when you need them. I'll describe them in the SKILL.md, and if you think you need one of them, go and load it. Assets would just be files — for example, maybe you're templating something and want the template available, or you have an image you always need to paste into something; you put it in assets. So it's just a convention for packaging all of this together. We all agree on it, all the agents know how to use it, and that makes it portable.
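As a concrete example of the scripts/ convention Eleanor describes: the kind of small, pre-tested utility you might bundle so the agent runs it instead of rewriting the logic every time. This is a made-up validator in the spirit of the quick-validate script Alex mentions later, not Anthropic's actual script:

```python
#!/usr/bin/env python3
# scripts/quick_validate.py -- hypothetical example of a script bundled inside a skill.
# Checks that a SKILL.md exists and its frontmatter declares a name and a description.
import sys
from pathlib import Path

def validate(skill_dir: Path) -> list[str]:
    problems = []
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():
        return [f"{skill_dir}: missing SKILL.md"]
    text = skill_md.read_text(encoding="utf-8")
    parts = text.split("---")
    if not text.startswith("---") or len(parts) < 3:
        return [f"{skill_md}: missing YAML frontmatter block"]
    frontmatter = parts[1]
    for field in ("name:", "description:"):
        if field not in frontmatter:
            problems.append(f"{skill_md}: frontmatter is missing '{field.rstrip(':')}'")
    return problems

if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    issues = validate(target)
    print("\n".join(issues) if issues else f"{target}: looks valid")
    sys.exit(1 if issues else 0)
```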
Ryan Carson
Ryan Carson 1:20:10
On that note, I wanted to point out what we're seeing is experts
1:20:13
in the field, like Vercel, who obviously know probably the most in the world about React and Next.js — they're starting to release sets of skills now, and this is what we all want as devs, right? I write mostly Next.js TypeScript, and I'd prefer to use off-the-shelf skills from Vercel. So, boom: you point your agent at this, which you're seeing on screen, and it installs the skills for you, and it's like you've instantly leveled up your agent on the primary topic you care about. So this is exciting.
Eleanor Berger
Eleanor Berger 1:20:44
I think that's great.
1:20:45
And maybe one more thing, Alex, because you've shown the skill-creator skill. I think that's really cool, because when people hear us describing this directory structure and these files, they might think: oh, now I have to create a directory and go edit all these files. But agents are really good at creating agent skills. You bring the knowledge — you just say, here's what I know about this workflow, or this library, or what I'm trying to do — and the agent will very handily create the skill for you. So you can teach the agent stuff, and it lands. In fact, there's been a lot of talk recently about continual learning — how will we solve continual learning? It's solved. The problem is solved.
Alex Volkov
Alex Volkov 1:21:29
I think the highlight for me, when it clicked, is that
1:21:33
there's not a lot of knowledge assumed on behalf of the user. I, as a user, don't really need to know how to build skills. This only clicked for me when I convinced my manager — shout out to Adrian — to let me use Claude Max for a bit, because Claude, the basic web and Mac interface, supports skills. If you go into settings, then into capabilities, I think, and scroll down, you have Skills — repeatable, customizable instructions Claude can follow in any chat — and then you have example skills: the skill builder, internal comms, et cetera. The cool thing is that because Anthropic has this skill enabled, it's the composability thing we're talking about: all you have to do as a user is say, hey, create a skill for me that does blah blah blah. It doesn't matter what, at this point. Say you work at a company with salespeople and you want to create a skill based on your sales metrics: you can dump in 17 PDF files and say, hey, create a skill for me that writes a sales script, or that listens to a call — whatever. It's a very easy way in, and this is where I want the click that happened for me to happen for somebody listening who's still thinking, "I'm not sure why I need this." Generalized agents are generalists. They perform significantly better when domain expertise is loaded into them. LLMs are incredible at in-context learning; however, for generalized agents you cannot shove everything in at once — the context runs out. Skills are a way to progressively load that expertise in, so the agent is that much better for you. So if there's any area where your LLM hasn't been good — maybe it knows React because it was trained on it, but it's not great at the latest React, or maybe it doesn't know best practices because it learned React from the whole internet — you can take the skills Vercel just released (the top ninjas in the world of React; obviously they know it best), load up your LLM with that skill set, and then point it at your React app and say: hey, now, with this new skill set, review what I have and tell me what's wrong and how to improve. You can do this with sales, with marketing, with taxes — with literally everything. You bring the knowledge; skills make sure the general agent can actually execute on it. I think this is the big unlock — the big connection for me. Eleanor, anything to add that I missed?
Eleanor Berger
Eleanor Berger 1:24:03
Yeah, maybe one more thing.
1:24:04
I think this will be more interesting to people who've been working a lot with coding agents over the last year or more: skills are like the joker card of customization, because they replace everything. You really don't need anything else. We used to have this very rich toolbox of commands and hooks and a million different things. You don't need commands anymore, because skills are basically the equivalent of commands. You don't necessarily need hooks that much anymore, because the agents are so good at following directions — you can just specify a workflow, "when this, then that," and they'll follow it. You don't necessarily need MCP servers, because you can just tell it in a skill: look, this is how you connect to this API, or this is how you run something locally on my command line — and that's an adequate replacement, and lighter. You don't need apps. A lot of the things I really like to do as skills now are things where previously I would have built a little app for myself — but why bother, if I can just take some of the functionality, dump it in a skill, and tell it the interface will be chat, or that it should ask me some questions? Now I have my little tool encapsulated in a skill. Skills really are all you need — and it's not corny at all: skills are all you
Alex Volkov
Alex Volkov 1:25:25
need. I love this, and it's why I brought you on, Eleanor,
1:25:27
because you've been highly focused on skills. The thing I'll add — it also clicked for me around code. We've talked on the show before about coding agents being generalized agents, right? You have Claude Code, and many people use Claude Code to do their browsing, their taxes, et cetera. The reason it's generalized is that it can run code to execute basically anything it needs. Here's what clicked for me: when you include code in a skill — so the skill-creator has three scripts in it, things like a package-skill script and a quick-validate script — those are pieces of code that we know work, because they were run and tested before being packaged into the skill. When a skill ships code dedicated to one thing — one API call or one specific task — the agent doesn't have to write that code from scratch every time. The generalized agent could technically write it, because it can look things up on the internet, but it doesn't have to; it can just reuse the tool. This reduces variance, because the agent isn't writing the code fresh each time; it reduces debugging; it reduces a bunch of stuff. When you package your skill with the specific tools it needs — API tools, et cetera — they work repeatably, versus agents trying to come up with different approaches from scratch. I think that's a big unlock as well. I'm going to bring in Wolfram here, because we've both talked about skills — I think you also took a look at them, and you got excited about your skills working in Antigravity this week. What's your take on this whole thing? And where's the point where people listening might say, "ah, this is too complex for me, I don't want to get into this"? Because they do need to — then it will click, and life gets significantly better.
Wolfram Ravenwolf
Wolfram Ravenwolf 1:27:03
For me, initially it was the other way around.
1:27:05
So I read about skills when Anthropic announced them, and I thought: oh, so what? I have my skills in my system prompt already. That's how I got started — I had stuff like, "I want my AI to take a screenshot of my desktop so it can see what I'm working on," and I put it in my system prompt. And what happened is exactly what you'd expect: the context blew up. I had so many — not tools, but descriptions of how to use tools and stuff — in there. And now we have a standard, an easy standard. It's not complicated: you have the skill builder, you can just follow the standard. And the cool thing is that the harnesses are being set up to use skills — they auto-ingest them. You don't have to prompt the agent or tell it in your system prompt, "hey, there are skills." I'm using the Claude skills with Antigravity and it works beautifully. And yes, it's like Eleanor said about apps: you could just write your scripts and little apps and things like that. One thing I implemented with a skill is a to-do list manager. I give it a screenshot — or it takes a screenshot for me, with another skill — and then it manages my to-dos and puts them in the proper format. All of that information about how to do it, I put in a skill, and the AI has the description of how to add to-dos and how to manage them. It knows by itself: if I give it a screenshot, it looks at it and says, "ah, I can add this to your to-do list." So I could have written a to-do list app, but now I just give my general-purpose agent the skills, and suddenly it can do all these things. So, yeah — that's it.
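The kind of "little app replaced by a skill" Wolfram describes can be as small as this — a hedged, hypothetical sketch of a helper script a to-do skill might bundle, with the SKILL.md telling the agent when to call it (the file name and format are made up for illustration):

```python
# add_todo.py -- hypothetical helper a to-do skill could bundle in scripts/.
# The SKILL.md would tell the agent: "to record a task, run this script with the text."
import sys
from datetime import date
from pathlib import Path

TODO_FILE = Path.home() / "todos.md"   # illustrative location

def add_todo(text: str) -> None:
    # Append one checklist item per line, prefixed with today's date.
    TODO_FILE.touch()
    entry = f"- [ ] {date.today().isoformat()} {text.strip()}\n"
    with TODO_FILE.open("a", encoding="utf-8") as f:
        f.write(entry)

if __name__ == "__main__":
    add_todo(" ".join(sys.argv[1:]) or "(empty task)")
    print(f"Added to {TODO_FILE}")
```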
Alex Volkov
Alex Volkov 1:28:34
Thanks, Wolfram. Eleanor, I want to get to maybe one last question from you
1:28:36
in terms of examples — maybe examples will make it land for folks who aren't connecting with it yet. You've built a few skills for yourself as well. I'll just repeat the things that connected for me. One: this is simple — simple to build, simple to use for adding domain knowledge to your agents. Two: it's dynamically discoverable, so the agent itself can load the "I know kung fu" type stuff when it needs it, and you don't pollute the context. Three: skills are composable — like Wolfram said, you can have multiple skills each doing very small things and then build a whole workflow: this skill, then this skill, then this skill. Just from the descriptions, the LLMs can pick them up and use them, and you can build very serious domain-expertise workflows where it needs to know this and that and this and that. They're also open source and downloadable: because it's only directories and files, you can go to Vercel, download theirs, and now your agent is the best React agent in the world, because they exposed their expertise. There are skills for UI and more — just go look; there are a bunch of directories. And the code always works: the code included in a skill isn't code an LLM just wrote on the fly that you have to debug; it's code somebody already tested, so it's repeatable and working. All right, let's finish on a few strong notes, maybe —
Eleanor Berger
Eleanor Berger 1:29:50
I'll add one more benefit, which, again, it's kind
1:29:52
of trivial and yet super exciting: skills are portable between different agents and different models. Yes —
Alex Volkov
Alex Volkov 1:29:59
Assuming
Eleanor Berger
Eleanor Berger 1:29:59
that these are agentic models, you know, the
1:30:01
ones that really can execute these things — and they're forwards compatible. No model in the future will be worse than the models we have now at interpreting and understanding the instructions. We don't have to think about what exact API version it is — did they change it to Responses? No — because it's just text files, and that's actually really powerful.
Alex Volkov
Alex Volkov 1:30:22
That's very powerful.
1:30:23
And, Eleanor, you said the magic word for me to cue up what I wanted to show. But first I want to give you a couple of minutes to highlight skills you've built, for folks to maybe go check out, and to tell us why they were important for you to build. Then I'll show some of the exciting stuff I built this week as well, using Ralph.
Eleanor Berger
Eleanor Berger 1:30:38
Yeah.
1:30:38
Cool. So yeah, first of all, go to agentic-ventures.com. I also share a lot on X — I usually share gists of new skills a few times a day, whenever I have a cool idea. Some cool things I've done: the thing that really convinced me, a few months ago when Anthropic released this, was that I had this app I had actually coded — this was before agentic coding — for doing flashcards with AI, because I'm a bit of a flashcard person. And just to amuse myself, I thought: what would happen if I give it the app and tell it to make it a skill? So I took all the files, all the code from the app, dumped it into Claude, the chatbot, and told it: hey, can you make a skill out of this? And that was it. It replaced the app — I never use the app anymore, because I have a skill, and it took about 10 minutes to create. That convinced me; that's when I got skill-pilled. Since then I've made some tools: I have a skill for Nano Banana and the GPT image model so I can generate images. I have skills that are replacements for MCP servers — I used to use MarkItDown a lot, and I realized I don't need an MCP, I can just run the tool from a skill. I have one that connects to Upstash, which I use as a key-value store for many of the little tools and processes I've built. Things like that. I also drive all the models: I have GitHub Copilot CLI, and I can run Codex and Gemini, and I can drive them from Claude — which is nice when I need a code review. And lots of skills are specific to a project: for my recent course, I figured out a nice way to generate slides with Nano Banana, and there's a skill in the repo for that that I use. The other thing to check out is the little video tutorials I have. The important thing there isn't any specific skill you can download, but getting the concept that you can teach by doing. One way to create a skill that I really like is, instead of telling it "build me a skill that does this," just doing the thing: hey, let's look at this CSV file and help me create a report — and let's change it a little, I actually want a pie chart instead of a bar chart. And after all of that, I tell it: okay, now build a skill that does what we've just done here, because I want to do it with additional files. And that works great.
Alex Volkov 1:32:59
So I think there's a lesson here for sure. Thank you so much for joining us and shining a light on skills and the things that you got built super quick. There's a lesson here, and I've seen many people say the best way to get started building skills is, if you use an AI a lot, a specific AI that has access to your chat history: review all my conversations with you, here's the skill spec (you just send it the skills website itself), and build me skills for the stuff that I do repeatedly with you. The AI will figure out itself what you do repeatedly: I have to create the infographic stuff, I have to summarize news in a specific way. All of these things can now turn into skills. I haven't yet, but I will. Eleanor, thank you so much.

I want to bring everybody else in to talk about Ralph and skills and everything. In preparation for ThursdAI, I started to write up "what are skills," just for me, just to make sure that I'm not missing any part of it. I read Eleanor's stuff and a bunch of other people's stuff, like, okay, I need to ramp up on this. I wish I had the ability to use skills myself, I would just upload everything, but no, humans don't work this way, so I actually need to read stuff. So I started writing; the best way for me to learn is writing stuff. And at some point I stopped and asked, how do I even use skills? My main resistance point to using skills was, first of all, I didn't understand what they were, I wanted control, whatever; and second of all, I wanted them in the chat interface with Claude. I didn't want to go to Claude Code, the CLI, et cetera; I wanted them in a chat interface. And the only chat interface that supports skills, as far as I know, is Claude on the web, which you need to pay for. I hadn't paid for it until this last week, and now I am a Max subscriber, at least for this month. And then I was like, can I use skills with other chat things? No.

And then, remember, last week our friend Ryan Carson here, who drinks coffee, taught me about Ralph. Ralph is a way to have your agents code anything that you want. And then I also remembered that I have this app called Chorus. Chorus is an app that lets you talk with any LLM that you want and compare the answers between three models. It works with OpenRouter; right now I have, for example, Codex and Gemini and Claude Opus, and it uses the API behind the scenes to talk to them. So it's not exactly the Claude app, but it is very, very close. And then I remembered that Chorus just recently went open source, because the guys who built it are focusing on something else entirely. And I remembered that I have a Claude Code Max subscription, and I know Ralph, and basically I have everything I need.

So this is the big reveal: I've added skill support to Chorus, and now you can use skills, the same skills that you have already installed, with Chorus, on every LLM out there. GPT 5.2 Codex, the one that was released yesterday, you can now use with your own skills in a chat interface. You could obviously use them in Codex and all the agents that added support. And when I say I added support: this is me defining what I want and building a Ralph loop for Claude Code to actually go and build everything. And when I say everything, it took three and a half hours. You can see the new settings panel here for skills, including the icon that it chose completely on its own. You can see the skills themselves, with a refresh button, being loaded from my file system, the directory for skills.
You can see the extraction of the markdown front matter, which describes what each skill is. And now the skills are also enabled, so you can ask, what skills do you have? These models, obviously Opus, are skill-pilled like we are. But here's an example: GPT 5.2 Codex, a model that was released just yesterday, knowing that it has a daily-prep skill, a skill-creator skill, a thumbnail-creator skill, and a ThursdAI blog-writer skill. All of these skills are now cross-supported for all models, and you can see the comparisons: the models answer differently, at different speeds. You can install this right now if you want to use skills and you don't have a Claude subscription, or you want to run or test out skills with the skill creator. I will add the link to the show notes, but basically it's on my GitHub. It's not yet in the official app; it hasn't been accepted yet. But three and a half hours: four or five different UI components, extraction of skills, understanding how to inject them into the context, all of this. Now we have the ability to run skills with any of these models. This is the way that I learned skills, by using skills.
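For readers who want a feel for what "skill support" involves mechanically, here is a minimal sketch in TypeScript. It is not the actual Chorus code: it scans a skills directory for SKILL.md files, reads the name and description out of the front matter, and builds a system-prompt preamble so any model, not just Claude, can discover them. The directory path, helper names, and prompt wording are all assumptions for illustration.

```typescript
// Minimal sketch (not the actual Chorus implementation): discover SKILL.md
// files under a skills directory, pull out the front-matter name/description,
// and build a system-prompt preamble listing what is available.
import { readdirSync, readFileSync, existsSync } from "node:fs";
import { join } from "node:path";

interface SkillMeta {
  name: string;
  description: string;
  dir: string; // where the full SKILL.md and helper files live
}

// Very small front-matter reader: expects "---\nkey: value\n...\n---" at the top.
function parseFrontMatter(markdown: string): Record<string, string> {
  const match = markdown.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return {};
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Scan <skillsRoot>/<skill-name>/SKILL.md for every installed skill.
function discoverSkills(skillsRoot: string): SkillMeta[] {
  if (!existsSync(skillsRoot)) return [];
  return readdirSync(skillsRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .flatMap((entry) => {
      const dir = join(skillsRoot, entry.name);
      const skillFile = join(dir, "SKILL.md");
      if (!existsSync(skillFile)) return [];
      const meta = parseFrontMatter(readFileSync(skillFile, "utf8"));
      return [{ name: meta.name ?? entry.name, description: meta.description ?? "", dir }];
    });
}

// Only names + descriptions go into every request; the agent reads the full
// SKILL.md body (and any scripts) on demand, so the context is not polluted.
function skillsPreamble(skills: SkillMeta[]): string {
  if (skills.length === 0) return "";
  const lines = skills.map((s) => `- ${s.name}: ${s.description} (files in ${s.dir})`);
  return `You have these skills available:\n${lines.join("\n")}\nRead a skill's SKILL.md before using it.`;
}
```

The context-saving trick is in the last function: only the one-line descriptions travel with every request, and the full skill body is pulled in only when the model decides a skill applies.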
Ryan Carson 1:37:10
So, a question on this. Now that you are a skill pro, and you should probably sell these, are we gonna move to a skill marketplace? Because I actually spent 200 bucks on a set of skills from The Boring Marketer, and it was worth every single penny. So I'm kind of curious how skills are gonna shake out, Eleanor.
Alex Volkov 1:37:30
Yep. I'll finish on this. I think a lot of companies already understand why skills are very important. They're easy to manage, you can turn your docs into skills, et cetera, and you can build AI agents that are domain experts internally. You can share skills via git; that's also something we should mention, that skills can be global to the user or local to the directory, so every project that you have can have its own skills, how to build it, how to test it, et cetera. I think the marketplaces will come, but also many people will just do this for free, like people do now with n8n workflows, where people understand that, you know, domain expertise alone is not enough.

Folks, this is the reason why I was super excited about this show. I was able to use skills, let's call them agent skills, since we also have actual skills: the skills that I learned on the show last week, to develop this thing that, if I were at a startup doing traditional software development, would've taken a week and a half. Planning it out, doing the PRD, writing the plan for the settings panel, talking to developers, talking to a designer, a whole thing. This just happened in like three hours, by using the skills that we bring you here at the edge of agentic coding. It's incredible. The time we get to live in is incredible. I'm very, very happy that we are dedicated to knowing everything that happens and bringing you everything that happens, every week. Skills are a big thing that I've wanted to cover for quite a while. As we were MCP-pilled last year, now we're skill-pilled as well. And there's a big appreciation in my heart for everybody who comes and shares their knowledge and gets people excited. Hopefully some of you who were not skill-pilled before this week's show now are.
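As a concrete illustration of "global to the user or local to the directory": in Claude Code, personal skills conventionally live under your home directory, while project skills sit inside the repo and travel with git. The skill names below are hypothetical, and other harnesses may look in their own locations.

```
# hypothetical skill names; paths follow the Claude Code convention
~/.claude/skills/daily-prep/SKILL.md            # personal: available in every project
my-repo/.claude/skills/release-notes/SKILL.md   # project-local: versioned and shared via git
```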
Ryan Carson 1:38:58
Alex, I'm gonna give you an idea that I think will make you a lot of money. I think you're very good at preparing for podcasts, I think you're very good at producing podcasts, and I feel like you should make some skills and we will all buy them. Alright, please do that. Thank you.
Alex Volkov 1:39:12
I have a bunch of skills I need to prepare for the show, but the one skill that I do have is finishing ThursdAI with a big end-of-show credit. So folks, a very exciting week in AI. We just finished talking about agent skills, but before that we talked about GPT 5.2 Codex, and we talked about Claude Cowork, a new offering for non-techy people to run code on their computer. There's a lot more in open source that we've covered, like GLM, and the skills from labs like Vercel that can uplift your agents to new heights with the expertise they have. If you missed any part of the show, or if you tuned in live, everything is recorded, edited, and boiled down to a newsletter and a podcast that will be released on ThursdAI. I really appreciate everybody here: Ryan Carson, LDJ, Wolfram Ravenwolf, Yam Peleg, and Eleanor Berger, who joined us for a deep dive into agent skills.

As always, ThursdAI is available everywhere you get your podcasts, and it's free. If you want to support the podcast, the best way to do so is, well, you can support us on Substack for six bucks a month if you want to, but also just share it with your friends and give us a five-star rating everywhere you listen to podcasts. It's been great. We'll see you here next week with a very, very exciting thing: I think next week I'm going to tell you about Clawd Bot. I'm getting very close to telling you about this incredible, mind-blowing thing that none of you know about yet; Clawd Bot is that thing, and it is really, really something. So look forward to next week, it's gonna be exciting as well. Thank you so much. Please, if you're in San Francisco, come join our hackathon; we'd love to see you there. And with that, have a great week and enjoy the rest of the recorded show. Bye-bye everyone. Bye-bye.