Episode Summary
This week's ThursdAI went deep on agent skills – the open standard that's turning general-purpose AI agents into domain experts with nothing more than markdown files and a directory structure. Eleanor Berger from Agentic Ventures joined for a masterclass on skills, while Alex demoed adding skill support to the Chorus app in just 3.5 hours using a Ralph loop. The show also covered Claude Cowork (a week-and-a-half sprint, 100% written by Claude Code), GPT 5.2 Codex hitting the API where Cursor used it to build a full browser from scratch with 330,000 commits, and Google rolling out Gemini personalized intelligence across Gmail, YouTube, and Search.
In This Episode
- 📰 TL;DR
- Open Source AI Models
- MedGemma
- 📰 Drama Corner & Partnerships
- 🛠️ Claude Cowork
- 🟢 GPT 5.2 Codex
- 🟢 Gemini Personal Intelligence
- 🤖 Agent Skills Deep Dive
- 🤖 Skills Adoption & Platform Support
- 🤖 What is a Skill? Structure Explained
- 🤖 Scripts, References & Assets
- 🤖 Creating Skills with AI
- 🤖 Practical Examples & Use Cases
- 🛠️ Demo: Adding Skills to Chorus
- 🤖 Future of Skills: Marketplaces & Sharing
- Hosts & Guests
- By The Numbers
- 🔥 Breaking During The Show
📰 TL;DR
Alex opens the show with a call to non-developers to dive into AI agents, introduces the panel, and runs through a packed week: open-source medical LLMs, Claude Cowork launch, GPT 5.2 Codex in the API, Gemini personal intelligence, drama between Anthropic and Open Code, and a deep dive into agent skills with guest Eleanor Berger.
- Agent skills deep dive announced as the main topic
- Claude Cowork launched for non-technical users
- GPT 5.2 Codex finally released via API
- Gemini personal intelligence across Google services
Open Source AI Models
The panel covers open-source releases: Byte's M3, a 235B-parameter medical LLM fine-tuned from Qwen3 that claims to beat GPT 5.2 on HealthBench, plus Anthropic and OpenAI both pushing into healthcare with HIPAA-ready products. Nisten highlights that M3 can run on an M3 Ultra at usable speeds.
- M3: 235B medical LLM, Apache 2.0, beats GPT 5.2 on HealthBench
- 22B active parameters → runnable on M3 Ultra
- Anthropic launches Claude for Healthcare with HIPAA compliance
MedGemma
Google releases MedGemma 1.5 for medical use cases, while Nisten and Wolfram clarify it's a completely different model class (4B for imaging) that pairs well with the much larger M3. Also covered: OpenAI acquiring Torch Health and Anthropic's Claude achieving 92% on Med Agent Bench with Opus 4.5.
- MedGemma 1.5: small enough for offline medical imaging
- Opus 4.5 hits 92% on Med Agent Bench
- OpenAI acquires Torch Health for GPT Health
📰 Drama Corner & Partnerships
Spicy industry news: Thinking Machines co-founders return to OpenAI, Soumith Chintala becomes their CTO. Anthropic blocks Open Code from using Max subscription as a wrapper and blocks xAI from using Claude Code. Apple announces Gemini will power Siri. OpenAI inks a $10B deal with Cerebras for 2028.
- Anthropic blocks Open Code and xAI from Claude services
- Apple partners with Google – Gemini to power Siri
- OpenAI × Cerebras: $10B for 750MW compute (2028)
- Thinking Machines co-founders return to OpenAI
🛠️ Claude Cowork
Anthropic launches Claude Cowork โ Claude Code for non-developers, built in a week-and-a-half sprint with 100% of the code written by Claude Code itself. Alex demos it live, adding Flux Klein support to an image extension project without seeing a single line of code. The panel discusses the security implications and the dangerously-skip-permissions debate.
- 100% coded by Claude Code in a 1.5-week sprint
- Research preview, Mac-only, requires Max subscription
- Chrome connector enables browser automation
- Live demo: added Flux model support without viewing code
🟢 GPT 5.2 Codex
OpenAI finally releases GPT 5.2 Codex via API after months of exclusivity in the Codex app. Cursor used it to build a complete browser from scratch in Rust with 330,000 commits and hundreds of concurrent agents. LDJ and Ryan debate context compaction – Ryan drops the hot take that compaction doesn't work and atomic Ralph-style tasks are the real solution.
- GPT 5.2 Codex now in Cursor, GitHub Copilot, and VS Code
- Cursor built a browser from scratch: ~3M lines of Rust
- Native context compaction support for long sessions
- Ryan's hot take: auto compaction doesn't work
🟢 Gemini Personal Intelligence
Google ships personalized AI in Gemini, reasoning across Gmail, YouTube, Photos, and Search with explicit opt-in. Alex tests it – it figured out he drives a Tesla Model Y from emails and noticed his recent Honda Odyssey search. The panel discusses Google's massive data moat and LDJ predicts MCPs for everything.
- Gemini reasons across Gmail, YouTube, Photos, Search
- Explicit opt-in for US Pro and Ultra users
- Google's data moat vs OpenAI and Anthropic
- LDJ: MCPs for everything, cross-platform personal AI
🤖 Agent Skills Deep Dive
Eleanor Berger from Agentic Ventures joins to kick off the skills deep dive. She explains that skills are an admission that we now have general-purpose agents – they do everything except know what you want. Skills are the missing piece: simple markdown files in a directory that give agents domain expertise via progressive disclosure.
- Skills = admission we have general-purpose agents
- Simple markdown + directory structure, universally adopted
- Progressive disclosure: agents load skills on demand
- Every major coding agent now supports the standard
🤖 Skills Adoption & Platform Support
Alex walks through the current adoption landscape: Claude is the only chat interface supporting skills, but virtually every coding IDE (Cursor, Windsurf, Anti-Gravity) and CLI (Claude Code, AMP, Open Code, Codex) now supports the standard. Eleanor gives a shout-out to AMP as one of the first adopters.
- Cursor, Anti-Gravity, and Gemini CLI added support this week
- AMP was one of the first adopters
- Skills work cross-platform: same skills, any agent
🤖 What is a Skill? Structure Explained
Eleanor walks through the anatomy of a skill: a directory with a skill.md file containing YAML front matter (name + description of when to use it). The magic is that each skill takes only 50–100 tokens of metadata, so you can have hundreds without polluting context. Alex compares it to Neo in The Matrix: the model decides when to load domain knowledge.
- A skill is a directory with skill.md + optional scripts/references
- 50–100 tokens per skill metadata → hundreds fit in context
- Progressive disclosure: agent loads full skill only when needed
- Skill creator skill: self-reflecting AI that builds skills
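To make the anatomy concrete, here is a minimal hypothetical skill.md; the name, description, and steps are invented for illustration, not taken from the show:

```markdown
---
name: pdf-form-filler
description: Fill out PDF forms. Use when the user asks to complete, fill, or populate fields in a PDF document.
---

# PDF Form Filler

1. Inspect the form with scripts/list_fields.py to enumerate its fields.
2. Ask the user for any values you cannot infer from context.
3. Save the filled copy next to the original with a "-filled" suffix.
```

Only the front matter is loaded upfront; the body below it is read when the agent decides the skill applies.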
🤖 Scripts, References & Assets
Eleanor explains the three optional directories in a skill: scripts (Python/TypeScript code for API calls or computations), references (additional markdown for progressive loading), and assets (templates, images, static files). Ryan highlights that experts like Vercel are now releasing skill packs for frameworks like Next.js and React.
- Scripts: runnable code for APIs, calculations, tools
- References: additional markdown loaded on demand
- Vercel releasing official Next.js/React skill packs
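Putting the three optional directories together, a full skill might be laid out like this (a hypothetical example; only skill.md is required):

```
pdf-form-filler/
├── skill.md              # front matter + core instructions
├── scripts/
│   └── list_fields.py    # runnable code the agent can execute
├── references/
│   └── field-types.md    # extra markdown, loaded on demand
└── assets/
    └── template.pdf      # static file shipped with the skill
```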
🤖 Creating Skills with AI
Eleanor reveals the key insight: you don't have to manually create skills – agents are really good at building them. She argues this solves continual learning: teach by doing, then tell the agent to package what you just did as a reusable skill. Alex explains that Claude's chat interface supports skills directly for Max subscribers.
- Agents can create skills from your workflows
- "Continual learning? It's solved. The problem is solved."
- Teach by doing: work with the agent, then package as skill
- Claude web/Mac chat supports skills for Max subscribers
🤖 Practical Examples & Use Cases
Eleanor shares her skills portfolio: flashcard apps turned into skills, image generation via Nano Banana, MCP replacements, and driving multiple models from Claude. Wolfram describes his to-do list manager skill and screenshot-based workflows. Eleanor drops the key insight: skills are the joker card of customization – they replace commands, hooks, MCPs, and even small apps.
- Eleanor replaced a full app with a 10-minute skill
- Skills can replace MCP servers, hooks, and commands
- Wolfram: to-do list manager built entirely as a skill
- Skills are portable between different agents and models
🛠️ Demo: Adding Skills to Chorus
Alex reveals his big project: he used a Ralph loop with Claude Code to add full skill support to Chorus, an open-source app that compares answers across multiple LLMs. In 3.5 hours, Claude built a settings panel, skill discovery from the filesystem, front-matter extraction, and cross-model skill injection – making skills work with GPT 5.2 Codex, Gemini, and every OpenRouter model.
- 3.5 hours via Ralph loop to add full skill support
- Skills now work across any LLM via Chorus + OpenRouter
- Settings UI, filesystem discovery, and front-matter parsing
- GPT 5.2 Codex using Claude-style skills for the first time
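The discovery and front-matter steps described above can be sketched in a few lines. This is a minimal Python sketch under assumed conventions (flat `key: value` front matter between `---` fences, one skill.md per directory), not Chorus's actual implementation:

```python
import re
from pathlib import Path


def parse_front_matter(text: str) -> dict:
    """Extract the YAML-style front matter between the opening '---' fences.

    A tiny hand-rolled parser for flat `key: value` pairs, so the sketch
    needs no third-party YAML dependency.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return {}
    meta = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def discover_skills(root: Path) -> list[dict]:
    """Find every <skill>/skill.md under `root` and keep only its name and
    description, the small metadata that goes into the system prompt.
    The full body stays on disk and is read only when the model asks."""
    skills = []
    for skill_md in sorted(root.glob("*/skill.md")):
        meta = parse_front_matter(skill_md.read_text())
        if "name" in meta and "description" in meta:
            skills.append({
                "name": meta["name"],
                "description": meta["description"],
                "path": skill_md,  # loaded in full on demand
            })
    return skills
```

The same two functions work regardless of which model sits behind the chat, which is what makes the cross-model injection in the demo possible: the metadata list is just text appended to any system prompt.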
🤖 Future of Skills: Marketplaces & Sharing
Ryan asks if we're heading toward a skill marketplace – he already spent $200 on skills from The Boring Marketer. Alex predicts a mix: companies turning docs into skills, free community-shared skill packs via Git, and paid specialist collections. Ryan closes by telling Alex to sell his podcast production skills.
- Ryan spent $200 on marketing skills pack – worth it
- Skills shareable via Git, local per project or global per user
- Skill marketplaces coming alongside free community sharing
- WeaveHacks 3 announced: Jan 31–Feb 1, Self-Improving Agents
Hosts & Guests
Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
Co-hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed @ryancarson
Vaibhav Srivastav (VB) - DX at OpenAI (@reach_vb)
Open Source LLMs
Z.ai GLM-OCR: 0.9B parameter model achieves #1 ranking on OmniDocBench V1.5 for document understanding (X, HF, Announcement)
Alibaba Qwen3-Coder-Next, an 80B MoE coding agent model with just 3B active params that scores 70%+ on SWE-Bench Verified (X, Blog, HF)
Intern-S1-Pro: a 1-trillion-parameter open-source MoE achieving SOTA scientific reasoning across chemistry, biology, materials, and earth sciences (X, HF, Arxiv, Announcement)
StepFun Step 3.5 Flash: 196B sparse MoE model with only 11B active parameters, achieving frontier reasoning at 100-350 tok/s (X, HF)
Agentic AI segment
Big CO LLMs + APIs
OpenAI launches Codex App: A dedicated command center for managing multiple AI coding agents in parallel (X, Announcement)
OpenAI launches Frontier, an enterprise platform to build, deploy, and manage AI agents as "AI coworkers" (X, Blog)
Anthropic launches Claude Opus 4.6 with state-of-the-art agentic coding, 1M token context, and agent teams for parallel autonomous work (X, Blog)
OpenAI releases GPT-5.3-Codex with record-breaking coding benchmarks and mid-task steerability (X)
This week's Buzz - Weights & Biases update
Links to the gallery of our hackathon winners (Gallery)
Vision & Video
xAI launches Grok Imagine 1.0 with 10-second 720p video generation, native audio, and API that tops Artificial Analysis benchmarks (X, Announcement, Benchmark)
Kling 3.0 launches as all-in-one AI video creation engine with native multimodal generation, multi-shot sequences, and built-in audio (X, Announcement)
Voice & Audio
Mistral AI launches Voxtral Transcribe 2 with state-of-the-art speech-to-text, sub-200ms latency, and open weights under Apache 2.0 (X, Blog, Announcement, Demo)
ACE-Step 1.5: Open-source AI music generator runs full songs in under 10 seconds on consumer GPUs with MIT license (X, GitHub, HF, Blog)
OpenBMB releases MiniCPM-o 4.5 - the first open-source full-duplex omni-modal LLM that can see, listen, and speak simultaneously (X, HF, Blog)
AI Art & Diffusion & 3D