Everything AI Released in November 2025

50 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.


🧠 New Models 29

Alibaba (Tongyi) Nov 27, 2025

New Models · Open weights

Z-Image Turbo

Tongyi's Z-Image Turbo brings sub-second open image generation

Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.

6B Parameters

Z-Image Turbo on HuggingFace ↗ · Z-Image on GitHub ↗

🎙️ Hear our coverage →

#image-gen #open-source #architecture

Anthropic Nov 27, 2025

New Models

Claude Opus 4.5

Anthropic launches Claude Opus 4.5, reclaiming the coding crown

Anthropic released Claude Opus 4.5, scoring 80.9% on SWE-bench Verified to top GPT-5.1 (77.9%) and Gemini 3 Pro (76.2%). It adds a new 'Effort' parameter for compute control, Tool Search to cut agent token overhead, and Programmatic Tool Calling where the model writes and executes code loops. Pricing dropped to $5/M input and $25/M output, roughly one-third the old Opus price.

80.9% SWE-bench Verified · $5/M Input token price · $25/M Output token price

Claude Opus 4.5 Announcement ↗ · Claude Opus 4.5 Tool Use Blog ↗ · Claude Opus 4.5 on X ↗

🎙️ Hear our coverage (+1 follow-up) →

#coding #agents #reasoning
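
To make the Opus 4.5 pricing concrete, here is a minimal sketch that computes the cost of one request at the quoted rates of $5 per million input tokens and $25 per million output tokens. The helper name `request_cost` and the example token counts are illustrative assumptions, not part of Anthropic's API.

```python
from decimal import Decimal

# Prices quoted in the item above, in US dollars per million tokens.
INPUT_PRICE_PER_M = Decimal("5")
OUTPUT_PRICE_PER_M = Decimal("25")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the dollar cost of one request (hypothetical helper)."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Illustrative agent turn: 200k tokens read, 10k tokens written.
    print(request_cost(200_000, 10_000))
```

Output tokens cost five times as much as input tokens at these rates, so long generations dominate the bill well before long prompts do.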

Black Forest Labs Nov 27, 2025

New Models · Open weights

FLUX.2

Black Forest Labs releases FLUX.2, a 32B multi-reference image model

Black Forest Labs released FLUX.2, a 32B-parameter image model with open weights (FLUX.2-dev) that supports multi-reference image editing. It lets users combine multiple reference images and prompt edits with variables, a step up in controllable image editing.

32B Parameters

FLUX.2 on HuggingFace ↗ · FLUX.2 Blog ↗ · FLUX.2 Announcement on X ↗

🎙️ Hear our coverage →

#image-gen #open-source

DeepSeek Nov 27, 2025

New Models · Open weights

DeepSeek Math V2

DeepSeek Math V2: 685B open-weights model with IMO gold-level math

DeepSeek surfaced DeepSeek Math V2, a 685B-parameter Apache-2.0 model that reaches IMO gold-level math reasoning. It is the first open-weights model to reach this level, and it was dropped quietly on Hugging Face during the week.

685B Parameters

DeepSeek Math V2 on HuggingFace ↗

🎙️ Hear our coverage →

#open-source #reasoning

Microsoft Nov 27, 2025

New Models · Open weights

Fara-7B

Microsoft ships Fara-7B, a 7B on-device computer use agent

Microsoft Research released Fara-7B, a best-in-class 7B-parameter vision-language model for computer use that runs on-device. It scores 73.5% on WebVoyager, beating OpenAI's computer-use preview while being small enough to run locally.

73.5% WebVoyager

Fara-7B on HuggingFace ↗ · Fara-7B Blog ↗ · Fara-7B Announcement on X ↗ · Fara on GitHub ↗

🎙️ Hear our coverage →

#open-source #agents #on-device

Prime Intellect Nov 27, 2025

New Models · Open weights

INTELLECT-3

Prime Intellect releases INTELLECT-3, a 106B open MoE model

Prime Intellect released INTELLECT-3, a 106B-parameter mixture-of-experts model with 12B active parameters that scores 90% on AIME 2024/2025. The lab fully open-sourced the training stack alongside the weights, showing a small lab can train frontier-scale models.

106B Total parameters (12B active) · 90% AIME 2024/2025

INTELLECT-3 on HuggingFace ↗ · INTELLECT-3 Blog ↗ · INTELLECT-3 Announcement on X ↗ · Try INTELLECT-3 ↗

🎙️ Hear our coverage →

#open-source #reasoning #architecture

Tencent (Hunyuan) Nov 27, 2025

New Models · Open weights

HunyuanOCR

Tencent's 1B HunyuanOCR beats 72B models on OCRBench

Tencent released HunyuanOCR, a 1B-parameter OCR model that scores 860 on OCRBench, beating models as large as Qwen3-VL-72B. It is a striking example of task-specialized small models outperforming generalist giants.

1B Parameters · 860 OCRBench score

HunyuanOCR on HuggingFace ↗ · HunyuanOCR on GitHub ↗ · HunyuanOCR Announcement on X ↗ · Hunyuan Vision Blog ↗

🎙️ Hear our coverage →

#vision #open-source #on-device

Tencent (Hunyuan) Nov 27, 2025

New Models · Open weights

HunyuanVideo 1.5

Tencent releases HunyuanVideo 1.5, a lightweight open video model

Tencent released HunyuanVideo 1.5, a lightweight DiT-based open-source video generation model. It brings capable video generation to a smaller footprint, continuing the trend of open video models closing the gap with closed offerings.

HunyuanVideo on HuggingFace ↗ · HunyuanVideo on GitHub ↗ · HunyuanVideo 1.5 Announcement on X ↗

🎙️ Hear our coverage →

#video-gen #open-source #architecture

Allen Institute for AI (Ai2) Nov 20, 2025

New Models · Open weights

OLMo 3

OLMo 3: Allen AI's fully open 32B model with complete recipe

Allen AI released OLMo 3, a fully open 32B dense model where the dataset, training recipe, and hyperparameters are all public — not just the weights. LDJ contrasted it with open-weights-only releases from Qwen and DeepSeek, which have never published a fully open recipe.

32B Dense parameters, fully open dataset and recipe

🎙️ Hear our coverage →

Google DeepMind Nov 20, 2025

New Models

Gemini 3 Pro

Gemini 3 Pro launches with record ARC-AGI-2 scores

Gemini 3 Pro is Google's new frontier multimodal model, with a 1M-token context window and large reasoning gains: it scores 31.11% on ARC-AGI-2 (45.14% with Deep Think mode), roughly double the previous SOTA, plus 81% on MMLU-Pro and major coding improvements. Amp made it the default model on launch day, the first time it has ever switched defaults. It is also rolling out across Gmail, Calendar, and AI Mode in Google Search.

45.14% ARC-AGI-2 (Deep Think) · 31.11% ARC-AGI-2 (standard) · 1M Token context window

🎙️ Hear our coverage (+1 follow-up) →

#reasoning #multimodal #frontier-models

Google DeepMind Nov 20, 2025

New Models

Nano Banana Pro

Nano Banana Pro generates 4K images with perfect text

Google's upgraded image model dropped as breaking news mid-show, adding visible thinking traces, 4K resolution output, and SynthID watermarking with C2PA metadata. Alex demoed it live by one-shotting an 8MB AI-news infographic with flawless text and pixel-accurate logos across the entire image. It also powers generative UIs in Gemini, building interactive dashboards with real data on the fly.

4K First image model with flawless 4K output and perfect text

AI Studio (Nano Banana Pro) ↗

🎙️ Hear our coverage →

Meta AI Nov 20, 2025

New Models · Open weights

SAM 3

Meta SAM 3: open-vocabulary segmentation and tracking in video

Meta's Segment Anything Model 3 adds open-vocabulary segmentation with text and exemplar prompts, letting you click or type to segment and track any object across images and video. The panel demoed it live on golden retriever videos, and it ships openly as part of Meta's open-source push.

🎙️ Hear our coverage →

#vision #open-source

Meta AI Nov 20, 2025

New Models · Open weights

SAM 3D

SAM 3D turns single photos into 3D objects and human bodies

Released alongside SAM 3, SAM 3D reconstructs 3D objects and full human bodies from a single image with surprisingly high quality. It extends the Segment Anything family from 2D segmentation into single-image 3D reconstruction.

🎙️ Hear our coverage →

#vision #world-models #open-source

OpenAI Nov 20, 2025

New Models

GPT-5.1-Codex-Max

GPT-5.1-Codex-Max runs 24-hour coding tasks with native compaction

OpenAI's newest frontier agentic coding model is trained with native compaction, letting it summarize prior context and work on a single task for 24+ hours (an internal run reportedly lasted a full week). It uses 30% fewer thinking tokens at the median than its predecessors and sets a new SOTA of 58% on Terminal-Bench 2.0, while also leading on SWE-bench and SWE-Lancer. Windows PowerShell support is significantly improved, alongside an experimental Windows sandbox and a new extra-high reasoning level.

58% Terminal-Bench 2.0 (new SOTA) · 24h+ Single-task agent run time via native compaction · 30% Fewer thinking tokens at median

🎙️ Hear our coverage →

#coding #agents
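
The compaction idea described for GPT-5.1-Codex-Max can be illustrated with a small sketch: when the running history exceeds a token budget, older messages are collapsed into a single summary while the most recent ones are kept verbatim. This is only an external approximation under stated assumptions. OpenAI's model is described as trained to do this natively, and the word-count tokenizer, the `compact` function, and the `naive_summary` placeholder are all hypothetical.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def naive_summary(messages: List[str]) -> str:
    """Placeholder summarizer: keeps the first five words of each message."""
    heads = [" ".join(m.split()[:5]) for m in messages]
    return "SUMMARY: " + " | ".join(heads)


def compact(history: List[str], budget: int, keep_recent: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Collapse older messages into one summary when history exceeds the budget."""
    total = sum(count_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # Nothing to compact yet.
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


if __name__ == "__main__":
    log = ["read the failing test output carefully", "patched parser", "tests pass"]
    print(compact(log, budget=5, keep_recent=1, summarize=naive_summary))
```

In a real agent the summarizer would be a model call, and the budget would come from the model's context window rather than a word count.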

Sunday Robotics Nov 20, 2025

New Models

ACT-1 & Memo

Sunday Robotics unveils ACT-1 home robot foundation model and Memo

Sunday Robotics introduced ACT-1, a home robot foundation model, alongside its Memo robot. Instead of $20K teleoperation rigs, training data comes from a $200 skill glove, and the model handles long-horizon household tasks with solid zero-shot generalization.

$200 Skill glove used for data collection vs $20K teleop rigs

🎙️ Hear our coverage →

#robotics #frontier-models

xAI Nov 20, 2025

New Models

Grok 4.1

Grok 4.1 briefly tops LM Arena with major post-training upgrade

xAI's Grok 4.1 shipped in November alongside GPT-5.1 and Claude Opus 4.5 in the year's most concentrated stretch of frontier releases. Yam highlighted the week-and-a-half window as emblematic of 2025's relentless acceleration.

1483 LM Arena Elo (briefly #1)

🎙️ Hear our coverage (+1 follow-up) →

#frontier-models #consumer-ai #reasoning

Alibaba (Qwen) Nov 13, 2025

New Models · Open weights

Qwen Image Edit Multi-Angle LoRA

Qwen Image Edit gains Multi-Angle LoRA for camera control

A Multi-Angle LoRA for Qwen Image Edit landed, enabling camera-control style edits that re-render a scene from new angles. Available as a Hugging Face space and on fal, it shows the fast-moving open ecosystem building on Qwen's image editing models.

Linoy Tsaban demo on X ↗ · Qwen-Image-Edit-Angles space on Hugging Face ↗ · fal on X ↗

🎙️ Hear our coverage →

#image-gen #architecture

Baidu Nov 13, 2025

New Models · Open weights

ERNIE-4.5-VL-28B-A3B-Thinking

Baidu open-sources ERNIE-4.5-VL-28B-A3B-Thinking visual reasoning model

Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, an Apache 2.0 open-weights visual reasoning MoE with only 3B active parameters that Baidu claims rivals much larger models such as GPT-5 High on vision tasks. It features image zooming, spatial grounding, and reasoning, with its strong small-model performance attributed to GSPO, a training method from the Qwen team.

3B Active Parameters

Baidu announcement on X ↗ · Hugging Face model page ↗ · GitHub repo ↗ · Ernie blog post ↗

🎙️ Hear our coverage →

#open-source #vision #reasoning

ElevenLabs Nov 13, 2025

New Models

Scribe v2 Realtime

ElevenLabs launches Scribe v2 Realtime speech-to-text with 150ms latency

ElevenLabs launched Scribe v2 Realtime, a streaming speech-to-text model with roughly 150ms latency and support for over 90 languages, demoed live by Paul Asjes. It auto-switches languages mid-stream and handles code, initialisms, and technical terms with context-aware transcription, outpacing Whisper on speed and accuracy.

150ms Latency · 90+ Languages (Scribe)

ElevenLabs announcement on X ↗ · ElevenLabs Agents ↗ · ElevenLabs docs ↗ · ElevenLabs Scribe V2 Real Time ↗

🎙️ Hear our coverage (+1 follow-up) →

H Company Nov 13, 2025

New Models · Open weights

Holo2

H Company open-sources Holo2 multimodal computer-use agent family

Dropped live during the show: H Company open-sourced Holo2, a next-generation multimodal agent family fine-tuned on Qwen3-VL for grounding, navigation, and reasoning across web, desktop, and mobile. It posts SOTA results on computer-use and web-navigation benchmarks like OSWorld-G and ships in 4B, 8B, and 30B variants under Apache 2.0.

🎙️ Hear our coverage →

#agents #open-source

Meta AI Nov 13, 2025

New Models · Open weights

Omnilingual ASR

Meta releases Omnilingual ASR covering 1,600+ languages

Meta released Omnilingual ASR, an Apache 2.0 speech recognition family supporting over 1,600 languages, including 500+ never before served by any ASR system, with character error rate under 10% for 78 languages. The release includes an open corpus of 500k+ rows of transcribed audio, and the 1B model was praised as a near drop-in state-of-the-art replacement on Hugging Face.

1600+ Languages Supported

AI at Meta announcement on X ↗ · Meta blog post ↗ · Research paper ↗ · Omnilingual ASR corpus on Hugging Face ↗

🎙️ Hear our coverage →

#voice-ai #open-source

NVIDIA Nov 13, 2025

New Models · Open weights

ChronoEdit-14B Upscaler LoRA

NVIDIA releases ChronoEdit-14B Upscaler LoRA

NVIDIA released an Upscaler LoRA for its ChronoEdit-14B image editing model, available on Hugging Face with Diffusers pipeline support. It adds high-quality upscaling to the ChronoEdit physics-aware editing stack.

Announcement on X ↗ · Hugging Face model page ↗ · Diffusers ChronoEdit docs ↗

🎙️ Hear our coverage →

#image-gen #architecture

OpenAI Nov 13, 2025

New Models

GPT-5.1

OpenAI launches GPT-5.1 with a warmer, more personable voice

OpenAI shipped GPT-5.1, an update to its flagship model focused on a warmer tone and personality upgrades. The panel discussed how the friendlier default voice changes day-to-day ChatGPT use and what it signals for the frontier model race.

Fidji Simo announcement on X ↗ · Sam Altman on X ↗

🎙️ Hear our coverage (+1 follow-up) →

#frontier-models #reasoning

WeiboAI Nov 13, 2025

New Models · Open weights

VibeThinker-1.5B

WeiboAI releases VibeThinker-1.5B open reasoning model

Weibo's AI team open-sourced VibeThinker-1.5B, a tiny reasoning model that reportedly outperforms much larger models like DeepSeek R1 on select reasoning benchmarks. It was part of a week in which small open-weights models from Chinese labs kept punching above their weight.

WeiboLLM announcement on X ↗ · Hugging Face model page ↗ · Arxiv paper ↗ · VentureBeat coverage ↗

🎙️ Hear our coverage →

#open-source #reasoning #on-device

Allen Institute for AI (Ai2) Nov 6, 2025

New Models · Open weights

OlmoEarth

Ai2 launches OlmoEarth foundation models and open Earth-intelligence platform

Ai2 launched OlmoEarth, a family of foundation models plus an open, end-to-end platform for fast, high-resolution Earth intelligence. It applies the lab's open-model approach to geospatial and remote-sensing data, making Earth observation workloads accessible without proprietary stacks.

🎙️ Hear our coverage →

#open-source #vision #frontier-models

Inworld AI Nov 6, 2025

New Models

Inworld TTS

Inworld TTS takes the #1 spot on the Artificial Analysis speech benchmark

Inworld released a new version of its TTS model that claimed the #1 position on the Artificial Analysis text-to-speech benchmark. It featured in the episode's voice segment as evidence that commercial TTS quality keeps climbing fast.

🎙️ Hear our coverage →

#voice-ai #benchmarks

Maya Research Nov 6, 2025

New Models · Open weights

Maya-1

Maya-1 open-source voice generation model released

Maya-1 is a new open-source voice generation model that was demoed on the show as part of the week's voice AI wave. The panel highlighted how quickly open voice model quality is improving, with expressive output that holds up against commercial systems.

🎙️ Hear our coverage →

#voice-ai #open-source

Meituan (LongCat) Nov 6, 2025

New Models · Open weights

LongCat Flash Omni

Meituan releases LongCat Flash Omni, a 560B (27B active) omni model

Meituan's LongCat team released LongCat Flash Omni, a 560B-parameter mixture-of-experts model with roughly 27B active parameters that accepts text, audio, and video input. It extends the open LongCat Flash line into omni-modal territory from a lab better known for food delivery than frontier models.

X ↗ · HF ↗ · Announcement ↗

🎙️ Hear our coverage →

#open-source #multimodal

Moonshot AI Nov 6, 2025

New Models · Open weights

Kimi K2 Thinking

Moonshot AI releases Kimi K2 Thinking, an open 1T-param reasoning MoE

Moonshot AI released Kimi K2 Thinking, an open-source 1-trillion-parameter mixture-of-experts reasoning agent with 256K context and large-scale tool-calling capacity. The panel treated it as the open-source centerpiece of the week, focusing on its reasoning quality and coding utility rather than just benchmark screenshots, and as a sign open models keep closing the usability gap with frontier closed models.

X ↗ · HF ↗ · Tech Blog ↗ · Arxiv ↗

🎙️ Hear our coverage →

#open-source #reasoning #agents

🚀 Products & Apps 4

LTX Studio (Lightricks) Nov 27, 2025

Products & Apps

LTX Retake

LTX Studio's Retake brings Photoshop-style object editing to video

LTX Studio launched Retake, an AI video editing tool that enables inpainting-style editing of specific objects within video frames. Wolfram called it 'the image editing moment for video' — Photoshop for video, available to try on Replicate.

LTX Retake on Replicate ↗ · LTX Retake Announcement on X ↗

🎙️ Hear our coverage →

Weights & Biases Nov 27, 2025

Products & Apps

Serverless LoRA Inference

W&B launches Serverless LoRA Inference on CoreWeave

Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.

W&B Serverless LoRA Report ↗ · W&B LoRA Notebook ↗ · W&B Announcement on X ↗

🎙️ Hear our coverage →

#infrastructure #training #coding

Sandbar Nov 6, 2025

Products & Apps

Stream / Stream Ring

Sandbar launches Stream voice assistant and Stream Ring wearable

Sandbar launched Stream, a voice-first personal assistant, alongside Stream Ring, a wearable described as a 'mouse for voice' that is now available for preorder. The pairing pushes always-available voice interaction into dedicated hardware rather than the phone.

🎙️ Hear our coverage →

#voice-ai #infrastructure #consumer-ai

XPeng Nov 6, 2025

Products & Apps

Iron

XPeng unveils 'Iron' humanoid robot with soft skin and 2026 production plan

XPeng unveiled Iron, a humanoid robot it claims has the most human-like design yet, featuring soft skin, bionic muscles, and a VLT (vision-language-task) brain. The company says it plans to put Iron into production in 2026, putting a Chinese EV maker squarely in the humanoid race.

🎙️ Hear our coverage →

✨ Major Features & Updates 6

OpenAI Nov 27, 2025

Major Features & Updates

ChatGPT Voice Mode

OpenAI integrates ChatGPT Voice Mode directly into chats

OpenAI integrated ChatGPT's Voice Mode directly into the chat interface instead of a separate full-screen experience. Users can now talk to ChatGPT while seeing transcripts and visual responses inline in the conversation.

OpenAI Voice Mode Announcement on X ↗

🎙️ Hear our coverage →

#voice-ai #consumer-ai

OpenAI Nov 20, 2025

Major Features & Updates

GPT-5.1 Pro

GPT-5.1 Pro: research-grade deep-thinking mode in ChatGPT

OpenAI also shipped GPT-5.1 Pro, a new research-grade ChatGPT mode that will happily think for minutes on a single query. It targets hard research-style questions where extended deliberation pays off, rounding out OpenAI's big week alongside Codex-Max.

🎙️ Hear our coverage →

Google DeepMind Nov 13, 2025

Major Features & Updates

Gemini Live

Gemini Live gets a conversational voice upgrade

Google rolled out an upgrade to Gemini Live's voice capabilities, making conversations more natural. It was covered in the big-companies roundup alongside GPT-5.1 and Grok 4 Fast as the voice interface race heats up.

Gemini Live upgrade on X ↗

🎙️ Hear our coverage →

xAI Nov 13, 2025

Major Features & Updates

Grok 4 Fast

Grok 4 Fast expands to a 2 million token context window

xAI's Grok 4 Fast now supports a 2 million token context window, one of the largest of any frontier model. The crew called the jump 'crazy' and discussed what such long context unlocks for agentic and document-heavy workloads.

2M Context Window

Grok 4 Fast 2M context on X ↗ · Grok update thread on X ↗

🎙️ Hear our coverage →

#architecture #frontier-models

Cursor Nov 6, 2025

Major Features & Updates

Cursor in-IDE browser

Cursor adds a built-in browser inside the IDE

Cursor added an in-IDE browser, letting developers preview and interact with their running app without leaving the editor. The panel called out how performant the implementation is, tightening the loop between agentic code edits and visual verification.

🎙️ Hear our coverage →

Windsurf (Cognition) Nov 6, 2025

Major Features & Updates

Codemaps

Windsurf ships Codemaps, AI-annotated navigable maps of your codebase

Cognition's Windsurf launched Codemaps, AI-annotated and navigable maps of a codebase powered by SWE-1.5 for fast mode and Claude Sonnet 4.5 for smart mode. It aims to help developers and agents build a structural understanding of large repos instead of navigating file by file.

X ↗ · Announcement ↗

🎙️ Hear our coverage →

🔌 APIs & Platforms 1

xAI Nov 20, 2025

APIs & Platforms

Grok 4.1 Fast + Agent Tools API

Grok 4.1 Fast: 2M context and Agent Tools API at 10x lower cost

Launched as breaking news during the show, Grok 4.1 Fast pairs a 2 million token context window with a new Agent Tools API offering native X search, Reddit search, web browsing, and code execution. Benchmarks are striking: 93–100% on τ²-Bench Telecom and 72% on Berkeley Function Calling v4 (top of the leaderboard) at $0.20/$0.50 per million tokens, roughly 10x cheaper than competitors, and free for the first two weeks on the xAI API and OpenRouter.

93–100% τ²-Bench Telecom · 72% Berkeley Function Calling v4 · 2M Token context window

🎙️ Hear our coverage →

🛠️ Dev Tools 3

Google DeepMind Nov 20, 2025

Dev Tools

Antigravity

Antigravity: Google's free agent-first IDE powered by Gemini 3 Pro

Antigravity is a free VS Code fork reimagined for agent-first coding, with an inbox-style Agent Manager for running multiple coding agents in parallel across a codebase. Browser integration lets agents control Chrome, take screenshots and videos of the running app, and self-debug. The free tier is powered by Gemini 3 Pro, with GPT-OSS 120B as the open-source alternative and Nano Banana for images.

Antigravity IDE ↗

🎙️ Hear our coverage →

#coding #agents

Marimo Nov 20, 2025

Dev Tools · Open weights

Marimo VS Code / Cursor extension

Marimo ships reactive Python notebooks extension for VS Code and Cursor

Marimo released a new VS Code and Cursor extension bringing its reactive Python notebooks directly into the editor, with uv integration for dependency management. It was highlighted in the open-source roundup as a notable dev-tool release of the week.

Marimo VS Code / Cursor extension ↗

🎙️ Hear our coverage →

Weights & Biases Nov 13, 2025

Dev Tools · Open weights

W&B LEET

W&B ships LEET, an open-source terminal UI for monitoring ML runs

Weights & Biases released LEET (Lightweight Experiment Exploration Tool), an open-source terminal-native dashboard for tracking ML runs, demoed live by Dima Duev of the SDK team. It works fully offline for air-gapped HPC clusters and brings real-time metrics, system stats, and zoomable interactive charts to the terminal.

W&B announcement on X ↗ · W&B LEET blog post ↗ · W&B LEET (wandb beta leet) ↗

🎙️ Hear our coverage →

#coding #infrastructure

📦 Datasets 1

Inference.net Nov 13, 2025

Datasets · Open weights

Project AELLA (OSSAS)

Project AELLA publishes 100K LLM-generated research paper summaries

Project AELLA (also called OSSAS) released 100,000 LLM-generated structured summaries of scientific papers, published openly on Hugging Face. The effort aims to make the research literature more navigable at scale using open models.

Sam Hogan announcement on X ↗ · Inference.net on Hugging Face ↗

🎙️ Hear our coverage →

#training #research

📊 Benchmarks & Evals 2

Laude Institute / Stanford Nov 13, 2025

Benchmarks & Evals · Open weights

Terminal-Bench 2.0

Terminal-Bench 2.0 and Harbor launch as new bar for coding agents

Terminal-Bench 2.0 launched alongside the Harbor framework, with 89 hard, realistic terminal-based tasks built with around 1,000 Discord contributors. The Warp agent tops the leaderboard at 50% with Codex CLI close behind, and the panel argued that a top score of only 50% leaves real headroom, making it far more meaningful than near-saturated benchmarks like MMLU.

50% Terminal-Bench 2.0 Top Score

Announcement on X ↗ · Harbor framework ↗ · Running Terminal-Bench docs ↗ · Terminal-Bench leaderboard ↗

🎙️ Hear our coverage →

#benchmarks #agents #coding

LMArena (LMSYS) Nov 13, 2025

Benchmarks & Evals

Code Arena

LMArena launches Code Arena for live agentic coding evaluations

LMArena launched Code Arena, a live evaluation platform where models build real applications agentically and humans vote on the results. It extends the arena-style crowdsourced ranking approach to agentic coding workflows.

Arena announcement on X ↗ · Code Arena blog post ↗ · Code Arena ↗

🎙️ Hear our coverage →

#benchmarks #coding #agents
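
As a rough illustration of how pairwise human votes can become a ranking, here is a minimal sketch of the classic Elo update. It is a textbook formula used for illustration only: the source does not describe Code Arena's actual rating method, and the function names and the K-factor of 32 are assumptions.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool,
           k: float = 32.0) -> Tuple[float, float]:
    """Return new (rating_a, rating_b) after one head-to-head vote."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two models start level; a voter prefers the app built by model A.
    print(update(1500.0, 1500.0, a_won=True))
```

An upset, where the lower-rated model wins, moves both ratings further than an expected result does, which is why a handful of surprising votes can briefly reshuffle a leaderboard.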

🌀 Also Released 4

Model Context Protocol (Anthropic + OpenAI) Nov 27, 2025

Also Released · Open weights

MCP Apps

MCP-UI becomes MCP Apps, an official standard from Anthropic + OpenAI

MCP-UI, created by Ido Salomon and Liad Yosef, was standardized as 'MCP Apps' — an official MCP extension jointly adopted by Anthropic and OpenAI that unifies MCP-UI with what OpenAI called Operator Plugins. Agents can now render full interactive HTML UIs directly inside chat, avoiding iOS-vs-Android style fragmentation with one open standard.

MCP Apps Blog Post ↗ · MCP-UI / MCP Apps Website ↗ · MCP Apps Announcement on X ↗

🎙️ Hear our coverage →

Anthropic Nov 6, 2025

Also Released

Code execution with MCP

Anthropic publishes code-execution-with-MCP pattern for token-efficient agents

Anthropic published an engineering post showing how running MCP-connected tools as code, instead of direct tool calls, slashes token use and scales agents to many more tools. The approach echoes Cloudflare's Code Mode and framed the episode's interview with Kenton Varda about agents writing code against tool APIs.

🎙️ Hear our coverage →

#agents #coding
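
The token saving behind the code-execution pattern can be sketched in a few lines: with a direct tool call the full tool result lands in the model's context, whereas a script can filter the data first and return only what the model needs. Everything here is a hypothetical stand-in. `fetch_orders` is not a real MCP tool, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```python
from typing import Dict, List


def fetch_orders() -> List[Dict[str, object]]:
    """Hypothetical tool returning a large result set (stand-in for an MCP tool)."""
    return [
        {"id": i, "total": i * 10, "status": "open" if i % 50 == 0 else "closed"}
        for i in range(1, 1001)
    ]


def approx_tokens(payload: object) -> int:
    """Very rough size estimate: about four characters per token (assumption)."""
    return len(str(payload)) // 4


def direct_tool_call() -> int:
    """Direct call: the whole tool result is placed in the model's context."""
    return approx_tokens(fetch_orders())


def code_execution() -> int:
    """Code mode: a script filters the data; only the small result reaches the model."""
    open_ids = [order["id"] for order in fetch_orders() if order["status"] == "open"]
    return approx_tokens(open_ids)


if __name__ == "__main__":
    print("direct:", direct_tool_call(), "tokens (approx.)")
    print("code:  ", code_execution(), "tokens (approx.)")
```

The same reasoning applies to tool definitions: a script can import only the tools it needs instead of loading every schema into the prompt up front.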

Amazon Web Services Nov 6, 2025

Also Released

AWS-OpenAI infrastructure partnership

AWS announces multi-year strategic infrastructure partnership with OpenAI

AWS announced a multi-year strategic infrastructure partnership with OpenAI to power ChatGPT inference, training, and agentic AI workloads. It is another sign of OpenAI spreading its compute needs across every major cloud provider, and a notable win for AWS in the frontier-AI infrastructure race.

🎙️ Hear our coverage →

#infrastructure #agents

Hugging Face Nov 6, 2025

Also Released · Open weights

Smol Training Playbook

Hugging Face publishes the Smol Training Playbook for LLM pretraining

Hugging Face published the Smol Training Playbook, a 200+ page end-to-end guide to reliably pretraining and operating LLMs. It distills the team's practical experience from the SmolLM line into an open resource for anyone training their own models.

X ↗ · Announcement ↗

🎙️ Hear our coverage →

#open-source #training
