Episode Summary

Back in the studio after a short break, ThursdAI returns with a builder-heavy episode spanning Codex, open models, agent benchmarks, voice systems, and a very loud funding/news cycle. Roger Jin and Bhavesh Kumar join from Nous Research, Kwindla Hultman Kramer covers the voice side, and the panel keeps circling back to one principle: tool reliability matters as much as raw model power.

Hosts & Guests

Alex Volkov
Host · W&B / CoreWeave
@altryne
Roger Jin
Researcher · Nous Research
@rogershijin
Bhavesh Kumar
AI Researcher · Nous Research
@bha_ku21
Kwindla Hultman Kramer
Co-Founder & CEO · Daily.co
@kwindla
Yam Peleg
AI builder & founder
@Yampeleg
LDJ
Nous Research
@ldjconfirmed
Nisten Tahiraj
AI operator & builder
@nisten

🔓 Codex Returns and the Open-Source Board

The episode opens with the energy of a show that has been away just long enough for the backlog to pile up. Codex, GPT-5 chatter, and fresh open-model releases all show up immediately, but the panel keeps the conversation focused on what feels durable versus what feels noisy.

  • The return episode opens with real momentum
  • Codex is treated as a practical workflow story, not just a headline

🧪 Nous Research and Husky Bench

Roger Jin and Bhavesh Kumar give the show one of its strongest technical segments by grounding the benchmark discussion in actual evaluation goals. The Husky conversation works because it is about measuring agent behavior in richer environments, not just squeezing out another leaderboard point.

  • The Nous segment is focused on practical evaluation, not hype
  • Husky Bench gives the episode a strong agents-and-evals spine

🔊 Pipecat, Real-Time Voice, and Kwindla

Kwindla Hultman Kramer pushes the show into the real-time communication layer. The conversation connects voice agents, infrastructure, and developer ergonomics in a way that complements the benchmark-heavy earlier sections and makes the episode especially useful for builders.

  • Voice agents are treated as systems problems, not demo toys
  • Kwindla links low-latency interaction to real developer decisions

💰 Embeddings, Speech, Fundraising, and Frontier News

The big-company portion of the episode runs through Embedding Gemma, speech models, fundraises, and a wider sense that capital and capability are accelerating together. Anthropic's round and OpenAI's fundraising chatter give the news section real weight.

  • The funding story becomes inseparable from the product story
  • Speech and embedding launches keep the frontier section grounded in shipping tools
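For builders wondering what an embedding launch like Embedding Gemma is actually used for: in a RAG pipeline, the model turns the query and each stored document into vectors, and a retrieval step ranks documents by how similar their vectors are to the query's. The snippet below is a minimal sketch of that ranking step only, assuming cosine similarity as the metric. It does not call Embedding Gemma or any real API; the three-dimensional vectors and the document ids (`codex-notes`, `voice-agents`, `funding-news`) are hypothetical toy values standing in for real model outputs.

```python
import math

# Hypothetical toy vectors standing in for embedding-model outputs.
# A real embedding model would produce much longer vectors.
DOCS = {
    "codex-notes": [0.9, 0.1, 0.0],
    "voice-agents": [0.1, 0.8, 0.3],
    "funding-news": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, docs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine(query_vec, docs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# A toy query vector that points mostly along the same axis as "codex-notes".
query = [0.8, 0.2, 0.1]
print(top_k(query, DOCS))  # ['codex-notes', 'voice-agents']
```

In a real pipeline the retrieved documents would then be passed to the LLM as context; swapping in a real embedding model changes only where the vectors come from, not the ranking logic.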

๐Ÿ› ๏ธ Codex vs. Cloud Code and the Reliability Question

The episode closes on a very ThursdAI note: not who had the splashiest launch, but which tools can actually be trusted. The Codex-versus-Claude-Code framing and the discussion around fast VLMs bring the show back to builder reality and day-to-day workflow choices.

  • Reliability is the core theme of the closing segment
  • The final debate stays focused on tools people can use this week

TL;DR and Show Notes
  • Hosts and Guests

  • Open Source LLMs

    • Nous Hermes 4 14B launches as a compact hybrid reasoning model with tool calling for local and cloud use (X, HF, Tech Report)

    • Tencent open-sources Hunyuan-MT-7B translation model after sweeping WMT2025 (X, HF)

    • Nous Research's Husky Hold'em Bench launches as an open-source poker-bot eval for LLM strategic play (X, Bench)

    • WebWatcher: Alibaba's Tongyi Lab open-sources a vision-language deep research agent that sets new SOTA (X, HF)

    • Apertus-8B and 70B launch as Switzerland's fully open, multilingual LLMs trained on 15T tokens across 1,800+ languages (X, HF)

    • Google releases Embedding Gemma, a 300M-parameter SOTA embedding model for RAG (Breaking News)

  • Big CO LLMs + APIs

    • Mistral adds 20+ MCP-powered connectors and controllable Memories to Le Chat for enterprise workflows (X, Blog)

    • Anthropic raises $13B Series F at a $183B post-money valuation (X, Blog)

    • OpenAI raises $10B at a ~$500B valuation through a share buyback for employees

    • OpenAI ships gpt-realtime and takes Realtime API to GA with remote MCP tools, image input, and SIP phone calling (X)

    • OpenAI releases projects for free users with larger file uploads and project-only memory controls

    • OpenAI acquires Statsig & Alex for $1.1B+ to strengthen applications team

    • xAI's Grok Code Fast 1 is now taking 50% of coding traffic on OpenRouter

    • Codex usage up 10x in 2 weeks per Sam Altman, with improvements coming

    • Anthropic admits to Claude Opus quality degradation for 3 days due to infrastructure changes

  • This Week's Buzz

    • CoreWeave buys OpenPipe! 🎉 (Blog)

  • Vision & Video

    • Apple's FastVLM-7B lands with a speed-first vision encoder: 85x faster TTFT vs. peers (X, HF)

  • AI Art & Diffusion & 3D

    • Nano Banana (Gemini 2.5 Flash Image) continues to dominate as Google's best image model (ai.studio/banana)

  • Tools

    • Codex vs. Claude Code discussion → Codex is now significantly better with the GPT-5 engine, GitHub PR reviews, and cloud agents