ThursdAI · September 4, 2025

📆 ThursdAI - Sep 4 - Codex Rises, Anthropic Raises $13B, Nous plays poker, Apple speeds up VLMs & more AI news

From Weights & Biases, back in the studio after a short break, with great OSS AI news, 2 interviews (NousResearch & Kwindla Kramer) and an exciting coding CLI discussion

By Alex Volkov

98 min

What happened in AI the week of September 4, 2025?

Back in the studio after a short break, ThursdAI returns with a builder-heavy episode spanning Codex, open models, agent benchmarks, voice systems, and a very loud funding/news cycle. Roger Jin and Bhavesh Kumar join from Nous Research, Kwindla Hultman Kramer covers the voice side, and the panel keeps circling back to one principle: tool reliability matters as much as raw model power.

🔓 Codex Returns and the Open-Source Board
🧪 Nous Research and Husky Bench
🔊 Pipecat, Real-Time Voice, and Kwindla
💰 Embeddings, Speech, Fundraising, and Frontier News
🛠️ Codex vs. Claude Code and the Reliability Question

Hosts & Guests

Alex Volkov

Host · W&B / CoreWeave

@altryne

Roger Jin

Researcher · Nous Research

@rogershijin

Bhavesh Kumar

AI Researcher · Nous Research

@bha_ku21

Kwindla Hultman Kramer

Co-Founder & CEO · Daily.co

@kwindla

Yam Peleg

AI builder & founder

@Yampeleg

LDJ

Nous Research

@ldjconfirmed

Nisten Tahiraj

AI operator & builder

@nisten

🔓 Codex Returns and the Open-Source Board

The episode opens with the energy of a show that has been away just long enough for the backlog to pile up. Codex, GPT-5 chatter, and fresh open-model releases all show up immediately, but the panel keeps the conversation focused on what feels durable versus what feels noisy.

The return episode opens with real momentum
Codex is treated as a practical workflow story, not just a headline

🧪 Nous Research and Husky Bench

Roger Jin and Bhavesh Kumar give the show one of its strongest technical segments by grounding the benchmark discussion in actual evaluation goals. The Husky conversation works because it is about measuring agent behavior in richer environments, not just squeezing out another leaderboard point.

The Nous segment is focused on practical evaluation, not hype
Husky Bench gives the episode a strong agents-and-evals spine

🔊 Pipecat, Real-Time Voice, and Kwindla

Kwindla Hultman Kramer pushes the show into the real-time communication layer. The conversation connects voice agents, infrastructure, and developer ergonomics in a way that complements the benchmark-heavy earlier sections and makes the episode especially useful for builders.

Voice agents are treated as systems problems, not demo toys
Kwindla links low-latency interaction to real developer decisions

💰 Embeddings, Speech, Fundraising, and Frontier News

The big-company portion of the episode runs through Embedding Gemma, speech models, fundraises, and a wider sense that capital and capability are accelerating together. Anthropic's round and OpenAI's fundraising chatter give the news section real weight.

The funding story becomes inseparable from the product story
Speech and embedding launches keep the frontier section grounded in shipping tools

🛠️ Codex vs. Claude Code and the Reliability Question

The episode closes on a very ThursdAI note: not who had the splashiest launch, but which tools can actually be trusted. The Codex-versus-Claude-Code framing and the discussion around fast VLMs bring the show back to builder reality and day-to-day workflow choices.

Reliability is the core theme of the closing segment
The final debate stays focused on tools people can use this week

TL;DR and Show Notes

Hosts and Guests
- Alex Volkov - AI Evangelist at Weights & Biases (@altryne)
- Co-hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed
- Guests - Roger Jin (@rogershijin) & Bhavesh Kumar (@bha_ku21)
- Kwindla Kramer (@kwindla)
Open Source LLMs
- Nous Hermes 4 — 14B launches: compact hybrid reasoning model with tool calling for local and cloud use (X, HF, Tech Report)
- Tencent open-sources Hunyuan-MT-7B translation model after sweeping WMT2025 (X, HF)
- Nous - Husky Hold'em Bench launches as an open-source pokerbot eval for LLM strategic play (X, Bench)
- WebWatcher: Alibaba's Tongyi Lab open-sources a vision-language deep research agent that sets new SOTA (X, HF)
- Apertus-8B and 70B launch as Switzerland's fully open, multilingual LLMs trained on 15T tokens across 1,800+ languages (X, HF)
- Google releases Embedding Gemma - 300M param SOTA embeddings model for RAG ([Breaking News])
Big CO LLMs + APIs
- Mistral adds 20+ MCP-powered connectors and controllable Memories to Le Chat for enterprise workflows (X, Blog)
- Anthropic raises $13B Series F at a $183B post-money valuation (X, Blog)
- OpenAI fundraises $10B at ~$500B valuation - buyback for employees
- OpenAI ships gpt-realtime and takes Realtime API to GA with remote MCP tools, image input, and SIP phone calling (X)
- OpenAI releases projects for free users with larger file uploads and project-only memory controls
- OpenAI acquires Statsig & Alex for $1.1B+ to strengthen applications team
- Grok Code 1 - now taking 50% of coding traffic on OpenRouter
- Codex usage up 10x in 2 weeks per Sam Altman, with improvements coming
- Anthropic admits to Claude Opus quality degradation for 3 days due to infrastructure changes
This Week's Buzz
- CoreWeave buys OpenPipe! 🎉 (Blog)
Vision & Video
- Apple's FastVLM-7B lands with a speed-first vision encoder and 85x faster TTFT (time to first token) vs. peers (X, HF)
AI Art & Diffusion & 3D
- Nano Banana (Gemini 2.5 Flash Image) continues to dominate as Google's best image model (ai.studio/banana)
Tools
- Codex vs Claude Code discussion → Codex now significantly better with GPT-5 engine, GitHub PR reviews, and cloud agents

Alex Volkov 0:27

Hello, hello, hello and welcome everyone to