Infrastructure & Inference

Compute, GPUs, hardware, serving, quantization, and efficiency for running models at scale. 70 releases covered on the show.

July 2026

PyTorch Jul 8, 2026

Dev Tools · Open weights

PyTorch 2.13

PyTorch 2.13 lands FlexAttention on Apple Silicon and big memory wins

3,328 commits from 526 contributors: FlexAttention on Apple Silicon at roughly 12x over SDPA for sparse patterns, a deterministic CUDA backward path, nn.LinearCrossEntropyLoss with up to 4x peak-memory reduction, torchcomms for large-cluster training, and expanded ROCm/Arm/XPU support.

~12x FlexAttention on Apple Silicon vs SDPA · 3,328 Commits from 526 contributors

X announcement ↗

🎙️ Hear our coverage →

#open-source #training #infrastructure

Exo Labs Jul 2, 2026

Products & Apps

local.ai

Exo Labs launches local.ai to track the local-AI frontier

Announced live on ThursdAI at AI Engineer: local.ai tracks the best model for your hardware, the performance trade versus the cloud, and whether running local beats API-token pricing. Early access is live with signup codes, and the Exo CLI — 'vLLM for consumer devices, with the configs figured out for you' — ships in the coming weeks.

71% Terminal Bench 2.1, REAP-pruned GLM 5.2 · 550B Nemotron-3 Ultra running on 4 NVIDIA Sparks

🎙️ Hear our coverage →

#on-device #open-source #infrastructure

Together AI Jul 1, 2026

Funding

Series C

Together AI raises $800M Series C at an $8.3B valuation

Aramco Ventures led the round with NVIDIA, Vista Equity and General Catalyst participating. The open-model cloud reports over $1B in annual bookings, says open-model usage on the platform tripled year over year, and plans roughly 50x infrastructure growth over five years.

$800M Series C · $8.3B Valuation · >$1B Annual bookings

🎙️ Hear our coverage →

#industry #infrastructure

June 2026

Weights & Biases Jun 29, 2026

Products & Apps

Aria

W&B Aria auto-research agent goes GA

Aria went generally available on Monday — an auto-research agent living in the W&B UI ('Just Ask Aria') that reads your traces and debugs your loss curves. In Zubin Aysola's AI Engineer talk, Aria read its own production traces and updated its own prompts.

Weights & Biases ↗

🎙️ Hear our coverage →

#agents #research #infrastructure

OpenAI Jun 25, 2026

Products & Apps

Jalapeno

OpenAI unveils Jalapeno custom inference chip with Broadcom

OpenAI unveiled Jalapeno, its first custom inference ASIC built with Broadcom, positioning it as part of a full-stack strategy to make ChatGPT, Codex, API, and agent workloads cheaper and faster at scale.

9 months claimed design to tape-out · 50% inference cost reduction claim · 1.3GW planned deployment scale

OpenAI Jalapeno announcement ↗ · OpenAI Jalapeno tweet ↗

🎙️ Hear our coverage →

#infrastructure

Weights & Biases / CoreWeave Jun 18, 2026

APIs & Platforms

Kimi K2.7 Code on CoreWeave Inference

Kimi K2.7 Code goes live on W&B/CoreWeave Inference

Kimi K2.7 Code became available on W&B/CoreWeave Inference, with the episode notes calling out Blackwell NVFP4 serving, speculative decoding, and 289 tokens per second near the top of Artificial Analysis speed and price-performance charts.

289 tok/s reported throughput

CoreWeave announcement ↗ · Try Kimi K2.7 Code on W&B/CoreWeave Inference ↗

🎙️ Hear our coverage →

#api #infrastructure #coding

Weights & Biases Jun 18, 2026

Dev Tools · Open weights

HiveMind

Weights & Biases launches HiveMind for coding-agent observability

Weights & Biases launched HiveMind, a dashboard for tracking AI coding-agent sessions, spend, transcripts, ROI, and reusable organizational learning. Chris Van Pelt and Adrian Swanberg joined the show to explain why teams need observability for their growing fleet of coding agents.

W&B announcement on X ↗ · HiveMind ↗ · HiveMind on GitHub ↗

🎙️ Hear our coverage →

#coding #agents #infrastructure

NVIDIA Jun 4, 2026

Products & Apps

RTX Spark

NVIDIA announces RTX Spark Arm + Blackwell platform for local AI PCs

At Computex, NVIDIA unveiled RTX Spark, an Arm CPU plus Blackwell GPU PC platform with 128GB unified memory targeting local AI agents and 120B-class local inference. A wave of thin laptops with RTX 5070-class GPUs and roughly one petaflop of local AI compute raises the question of what agents should run locally versus in the cloud.

Coverage (Tom's Hardware) ↗

🎙️ Hear our coverage →

#infrastructure #on-device #agents

May 2026

PrismML May 28, 2026

New Models · Open weights

Bonsai Image 4B

PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB

PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.

PrismML Bonsai Image 4B — blog ↗ · PrismML Bonsai on Hugging Face ↗ · Bonsai Image demo ↗ · Bonsai Studio iOS app ↗

🎙️ Hear our coverage →

#image-gen #on-device #infrastructure

Weights & Biases May 28, 2026

Dev Tools

W&B MCP Server

Weights & Biases launches MCP server with 20 tools for agents

W&B officially launched its MCP server with 20 schema-first tools so coding agents can read experiments, monitor training, and run autonomous research loops. Agents can query metadata before pulling full 300-metric runs, keeping their context windows from blowing up.

W&B MCP Server ↗ · W&B MCP Server — blog ↗ · W&B announcement ↗

🎙️ Hear our coverage →

#agents #coding #infrastructure

Anthropic May 21, 2026

Also Released

Colossus compute deal

SpaceX IPO filing reveals Anthropic pays $1.25B/month for Colossus compute

The SpaceX IPO filing revealed Anthropic is paying $1.25 billion per month for AI compute at the Memphis Colossus facility. The crew called it a bombastic deal that lets Anthropic serve far more inference at scale and feel less compute-constrained.

$1.25B monthly AI compute spend

Axios ↗ · Sawyer Merritt ↗

🎙️ Hear our coverage →

#infrastructure

CoreWeave May 14, 2026

Products & Apps

CoreWeave Sandboxes

CoreWeave Sandboxes launch in preview via the W&B SDK

CoreWeave Sandboxes is now an official Harbor provider, letting teams run agentic workloads like Terminal-Bench safely at scale on CoreWeave infrastructure. It plugs CoreWeave's isolated execution environments directly into the Harbor eval/agent ecosystem.

Docs ↗ · CoreWeave blog ↗ · CoreWeave Sandboxes ↗

🎙️ Hear our coverage (+1 follow-up) →

#agents #infrastructure #benchmarks

April 2026

Amazon Web Services Apr 30, 2026

APIs & Platforms

GPT-5.5 and Codex on Bedrock

AWS brings GPT-5.5 and Codex to Bedrock as Azure exclusivity ends

AWS announced GPT-5.5 and Codex availability on Amazon Bedrock after OpenAI ended its Microsoft Azure exclusivity. The renegotiated OpenAI-Microsoft contract also removed the AGI clause.

Sam Altman tweet ↗

🎙️ Hear our coverage →

#infrastructure #api #frontier-models

Stripe Apr 30, 2026

Dev Tools

Projects.dev

Stripe opens Projects.dev: 32 infra providers provisionable by agents

Stripe removed the waitlist on Projects.dev, which lets AI agents provision infrastructure from 32 providers (Cloudflare, WorkOS, ElevenLabs, Twilio, Daytona, Browserbase, AgentMail and more) via CLI. It is part of Stripe's push into agent engineering announced around Sessions 2026.

Projects.dev ↗

🎙️ Hear our coverage →

#agents #coding #infrastructure

CoreWeave Apr 16, 2026

Also Released

Anthropic, Meta & Jane Street deals

CoreWeave signs Anthropic, Meta ($21B), and Jane Street ($6B + $1B)

CoreWeave announced a multibillion-dollar deal with Anthropic, a $21B expansion with Meta (taking the relationship past $35B total), and a Jane Street deal worth $6B in cloud plus $1B in equity. CoreWeave now serves 9 of the top 10 AI labs, cementing its position as the neocloud backbone of frontier AI.

🎙️ Hear our coverage →

#infrastructure #industry

Weights & Biases Apr 16, 2026

Major Features & Updates

Gemma 4 on W&B Inference

Gemma 4 goes live on W&B Inference with LoRA inference support

Weights & Biases put Gemma 4 live on W&B Inference, running on CoreWeave infrastructure with LoRA inference support. Replying to the W&B announcement post on X with the code 'Gem Drop' gets $20 in free inference credits.

W&B Inference ↗ · W&B announcement post (X) ↗

🎙️ Hear our coverage →

#infrastructure #open-source

Anthropic Apr 9, 2026

Products & Apps

Managed Agents

Anthropic ships Managed Agents, a fully hosted agent runtime

Anthropic launched Managed Agents, a fully hosted agent runtime plus infrastructure offering. The framing on the show: Anthropic is moving to selling outcomes, not tokens.

🎙️ Hear our coverage →

#agents #infrastructure

Weights & Biases Apr 9, 2026

Major Features & Updates

W&B Automations

W&B Automations launch: event triggers from training runs

Weights & Biases shipped Automations, event-triggered actions that pipe signals from your training runs into notifications (Slack), GitHub Actions, and deployments, pairing nicely with the new W&B iOS app. In the same Buzz segment: GLM-5.1 and Gemma 4 both went live on W&B Inference.

W&B Inference ↗ · wandb.com ↗

🎙️ Hear our coverage →

#infrastructure #coding

OpenAI Apr 2, 2026

Funding

$122B funding round

OpenAI closes $122B funding round at $852B valuation

OpenAI closed a reported $122 billion funding round, described as the largest in history, at an $852B valuation with an IPO said to be incoming. The panel discussed what that scale of capital implies for AI infrastructure spending, product velocity, and competitive pressure across the market.

$122B OpenAI funding round

OpenAI announcement (X) ↗ · Deal breakdown (X) ↗

🎙️ Hear our coverage →

#industry #infrastructure

PrismML Apr 2, 2026

New Models · Open weights

Bonsai

PrismML releases Bonsai 1-bit models, an 8B model in 1.15 GB

PrismML released Bonsai, a family of 1-bit quantized open models fitting an 8B model into 1.15 GB and claiming 10x intelligence density, built on decades of compression research. The panel discussed one-bit quantization as a cost/performance lever for cheap local inference.

Announcement (X) ↗ · Hugging Face ↗ · PrismML site ↗

🎙️ Hear our coverage →

#open-source #infrastructure #on-device

March 2026

Google Research Mar 26, 2026

Papers & Research

TurboQuant

Google TurboQuant claims 6x KV-cache compression and 8x faster inference

Google Research published TurboQuant, a KV-cache quantization technique claiming 6x compression and 8x inference speedup with near-zero accuracy loss. The panel framed it as a potential unlock for LLM inference economics, while calling stock-market panic over the result premature without broader production validation.

6× TurboQuant KV-cache compression · 8× TurboQuant speedup claim

Google Research announcement (X) ↗ · Google Research blog: TurboQuant ↗ · TurboQuant paper (arXiv) ↗

🎙️ Hear our coverage →

#infrastructure
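
To put the TurboQuant 6x KV-cache compression claim in memory terms, the arithmetic can be sketched as below. The model shape is a hypothetical placeholder, not a figure from the TurboQuant paper, and the function assumes a standard transformer that caches one key vector and one value vector per layer, per KV head, per token.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2.0):
    """KV-cache size for one sequence: keys plus values for every layer and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical model shape (illustrative only, not from the source).
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=128_000)
compressed = baseline / 6  # applying the claimed 6x compression ratio

print(f"16-bit cache: {baseline / 1e9:.1f} GB")
print(f"after 6x compression: {compressed / 1e9:.1f} GB")
```

Because the cache grows linearly with sequence length and batch size, a fixed compression ratio translates directly into longer contexts or more concurrent requests on the same GPU memory.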

Modular Mar 26, 2026

Products & Apps

Modular 26.2

Modular 26.2 runs FLUX.2 in under a second, 99% cheaper than Nano Banana

Modular shipped its 26.2 release with state-of-the-art image generation, running FLUX.2 in under one second (sub-300ms claims) at 99% lower cost than Nano Banana, plus upgraded AI coding with Mojo. Alex noted the surprise of an inference platform releasing model-level optimization and hoped the approach spreads to all image generation.

Modular announcement (X) ↗ · Modular 26.2 blog post ↗ · Modular FLUX.2 speed demo (X) ↗

🎙️ Hear our coverage →

#image-gen #infrastructure #coding

NVIDIA Mar 19, 2026

Major Features & Updates

DLSS 5

NVIDIA DLSS 5 adds a generative AI filter for photo-realistic lighting

Announced at GTC, NVIDIA's DLSS 5 introduces a new generative AI filter bringing photo-realistic lighting to RTX 50-series GPUs. It applies generative models to real-time game rendering, extending DLSS beyond upscaling and frame generation.

Digital Foundry coverage ↗

🎙️ Hear our coverage →

#image-gen #world-models #infrastructure

NVIDIA Mar 19, 2026

Also Released

GR LPX (Rubin NVL72 + Groq 3)

NVIDIA GTC: GR LPX pairs Rubin NVL72 servers with the new Groq 3 chip

NVIDIA's GTC hardware reveal integrates the new Groq 3 chip (gen 2 was never publicly seen) into Rubin NVL72 servers via the GR LPX system. Claims include 3x tokens-per-watt efficiency at baseline, up to 30x at higher throughput, and 1000+ tokens/sec on a 2T-parameter frontier model with 400K context — performance the current Blackwell generation can't reach at any price.

🎙️ Hear our coverage →

#infrastructure

Weights & Biases Mar 19, 2026

Products & Apps

W&B iOS App

Weights & Biases launches native iOS app for monitoring training runs

W&B shipped its most-requested feature ever: a native iOS app for monitoring AI training runs with live metrics and push notifications for crash alerts. Practitioners can now keep an eye on long-running training jobs from their phone instead of staying glued to a dashboard.

W&B on X ↗ · App Store ↗ · W&B site ↗

🎙️ Hear our coverage →

#coding #infrastructure

Google DeepMind Mar 5, 2026

New Models

Gemini 3.1 Flash-Lite

Google launches Gemini 3.1 Flash-Lite with 1M context at 360 tok/s

Google launched Gemini 3.1 Flash-Lite, a fast and cheap model with 1M token context aimed at the instant/fast tier, running around 360 tokens per second. The panel flagged a material pricing jump versus the prior Flash-Lite generation but saw it as well suited for judge, guardrail, and orchestration workloads in agent systems.

360 tokens/sec Gemini 3.1 Flash-Lite speed

Logan Kilpatrick announcement ↗ · Gemini Flash-Lite page ↗

🎙️ Hear our coverage →

#frontier-models #architecture #infrastructure

February 2026

Taalas Feb 26, 2026

Products & Apps

ChatJimmy (baked-weights chip demo)

Taalas demos 15,000+ tokens/sec with model weights baked into silicon

Taalas published a live demo (chatjimmy.ai) showing Llama 3 8B running at 15,691 tokens per second on a chip with weights baked directly into the hardware. The panel called it a 10x speed-class jump that points at chip-level innovation compressing inference costs and iteration cycles.

15,000 tok/s Taalas Demo Throughput

ChatJimmy demo ↗

🎙️ Hear our coverage →

#infrastructure

Weights & Biases Feb 26, 2026

Major Features & Updates

W&B Inference: MiniMax M2.5 & Kimi K2.5

W&B Inference adds MiniMax M2.5 and Kimi K2.5

Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax M2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.

MiniMax M2.5 on W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #api #open-source

Weights & Biases Feb 19, 2026

Major Features & Updates

Kimi K2.5 on W&B Inference

W&B adds Kimi K2.5 to its inference service

Weights & Biases launched Kimi K2.5 on its inference service, making Moonshot AI's model available to W&B users. In Wolfram's Terminal Bench deep dive for W&B, Kimi K2.5 achieved a 67.4% ceiling score across multiple runs, among the strongest open-model results he measured.

W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #open-source

OpenAI Feb 12, 2026

New Models

GPT 5.3 Codex Spark

OpenAI ships GPT 5.3 Codex Spark on Cerebras for real-time coding

OpenAI released GPT 5.3 Codex Spark, a smaller Codex variant built for real-time coding and served on Cerebras hardware (OpenAI's first model on Cerebras), with reported speeds of over 1000 tokens/sec. It is available to ChatGPT Pro users in the Codex app, CLI, and IDE extension. The news broke during the show as the second breaking-news drop of the episode.

1000+ tps Codex Spark speed

Sam Altman announcement on X ↗

🎙️ Hear our coverage →

#coding #infrastructure

Weights & Biases Feb 12, 2026

Major Features & Updates

W&B Inference (GLM-5 & Kimi K2.5)

W&B Inference adds day-zero GLM-5 and Kimi K2.5 support

Weights & Biases launched day-zero GLM-5 support on its CoreWeave-powered W&B Inference service, alongside Kimi K2.5, with MiniMax M2.5 coming soon. Alex announced $50 in free credits for listeners to test the new open-weights models.

W&B announcement on X ↗ · W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #open-source

January 2026

OpenAI Jan 15, 2026

Also Released

OpenAI x Cerebras Partnership

OpenAI inks $10B deal with Cerebras for 750MW of high-speed compute

OpenAI announced a $10 billion partnership with Cerebras for 750 megawatts of high-speed inference compute, with capacity starting in 2028. It extends OpenAI's pattern of locking in massive compute supply deals beyond its existing cloud partners.

$10B OpenAI × Cerebras

🎙️ Hear our coverage →

#infrastructure

NVIDIA Jan 8, 2026

Acquisitions

Groq acquisition

NVIDIA acquires Groq team and licenses its tech for ~$20B

NVIDIA entered an exclusive licensing deal with Groq and acquired most of its team for approximately $20B. Groq's inference-optimized chips, created by former Google TPU lead Jonathan Ross, complement NVIDIA's training dominance as inference demand grows exponentially across AI use cases.

🎙️ Hear our coverage →

#industry #infrastructure

NVIDIA Jan 8, 2026

Products & Apps

Vera Rubin

NVIDIA Vera Rubin platform: 5x Blackwell inference at CES 2026

Jensen Huang unveiled the Vera Rubin platform at CES 2026, NVIDIA's next-gen AI computer delivering 50 PFLOPS and 5x inference performance over Blackwell while adding only ~200W of power draw. It needs 75% fewer GPUs for 10 trillion parameter MoE training, packs 72 GPUs per rack with 20.7TB memory and 13 TB/s bandwidth, is 100% liquid cooled, and entered full production just four months after the B300.

5x Vera Rubin vs Blackwell · 75% Fewer GPUs needed

NVIDIA CES 2026 News ↗

🎙️ Hear our coverage →

#infrastructure

December 2025

NVIDIA Dec 25, 2025

Products & Apps

Project Digits

NVIDIA Project Digits: $3,000 desktop that runs 200B-param models

NVIDIA announced Project Digits in January 2025, a $3,000 desktop supercomputer capable of running 200B parameter models locally. It brought serious local-inference hardware to individual developers and was one of January's standout hardware stories.

Jan 10 Episode ↗

🎙️ Hear our coverage →

#infrastructure #on-device

OpenAI (with SoftBank & Oracle) Dec 25, 2025

Funding

Project Stargate

Project Stargate: $500B AI infrastructure commitment announced

Announced in January 2025, Project Stargate committed $500 billion to AI infrastructure in the US, described on the show as the Manhattan Project for AI. It set the tone for a year in which investment numbers stopped making sense.

$500B Project Stargate

Jan 24 Episode ↗

🎙️ Hear our coverage →

#infrastructure #industry

Zhipu AI (GLM) Dec 25, 2025

New Models · Open weights

GLM 4.5

GLM 4.5 runs on Cerebras fast enough to win hackathons

Zhipu's GLM 4.5 came out in July 2025 and was the first open model that ran on Cerebras hardware fast enough that hackathon competitors were winning with it. It set up GLM's quiet rise as a business workhorse later in the year.

🎙️ Hear our coverage →

#open-source #infrastructure

Google DeepMind Dec 18, 2025

New Models

Gemini 3 Flash

Gemini 3 Flash delivers frontier intelligence at $0.50/1M input tokens

Google launched Gemini 3 Flash, offering frontier-tier capability at flash-tier pricing of $0.50 per million input tokens. It scores 78% on SWE-bench Verified, beating larger models on some agentic tasks, and supports tool-calling at scale with up to 100 simultaneous function calls.

$0.50 per 1M Gemini 3 Flash input tokens · 78% SWE-bench Verified

Gemini 3 Flash announcement ↗ · Logan Kilpatrick announcement on X ↗

🎙️ Hear our coverage (+1 follow-up) →

#frontier-models #agents #coding

NVIDIA Dec 18, 2025

New Models · Open weights

Nemotron 3 Nano

NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes

NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.

30B (3B active) Nemotron 3 Nano parameters

NVIDIA Nemotron 3 Nano announcement ↗ · NVIDIA Nemotron 3 Nano (HF BF16) ↗ · NVIDIA Nemotron 3 Nano (HF FP8) ↗

🎙️ Hear our coverage →

#open-source #architecture #infrastructure

Pruna AI Dec 4, 2025

New Models

P-Image

Pruna P-Image promises sub-second image generation at $0.005

Pruna AI promoted P-Image, an image generation offering with sub-second generation times at roughly $0.005 per image. The release fit the week's diffusion theme of competing on speed and cost efficiency rather than just quality.

$0.005 Per image

Pruna P-Image ↗ · Pruna demo ↗ · Pruna announcement on X ↗

🎙️ Hear our coverage →

#image-gen #infrastructure

Weights & Biases Dec 4, 2025

Products & Apps

LLM Evaluation Jobs

W&B launches LLM Evaluation Jobs for OpenAI-compatible APIs

Weights & Biases launched LLM Evaluation Jobs, letting teams run evaluations against any OpenAI-compatible API during training cycles instead of only at the end. The show framed it as a practical workflow upgrade for getting earlier model quality signals without blindly burning compute.

W&B LLM Evaluation Jobs ↗ · W&B announcement on X ↗

🎙️ Hear our coverage →

#benchmarks #coding #infrastructure

November 2025

Weights & Biases Nov 27, 2025

Products & Apps

Serverless LoRA Inference

W&B launches Serverless LoRA Inference on CoreWeave

Weights & Biases launched Serverless LoRA Inference on CoreWeave: upload a LoRA adapter to W&B Artifacts and serve it instantly on top of any supported base model with no cold starts and no dedicated GPU instances. Alex demoed a 'Mocking SpongeBob' LoRA he trained in 25 minutes, served on a Qwen 2.5 base.

W&B Serverless LoRA Report ↗ · W&B LoRA Notebook ↗ · W&B Announcement on X ↗

🎙️ Hear our coverage →

#infrastructure #training #coding
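
As a rough illustration of why a LoRA adapter is small enough to upload as an artifact and serve on top of a shared base model, the sketch below counts parameters for a single layer. It uses the general LoRA formulation (a frozen weight plus a rank-r product of two small matrices); the layer shape and rank are hypothetical, and nothing here reflects how W&B's service is implemented.

```python
def lora_param_counts(d_out, d_in, rank):
    """Parameter counts for a dense weight matrix and for a rank-r LoRA adapter on it."""
    full = d_out * d_in               # frozen base weight W (d_out x d_in)
    adapter = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return full, adapter

# Hypothetical layer shape and rank, chosen only for illustration.
full, adapter = lora_param_counts(d_out=4096, d_in=4096, rank=16)
print(f"base layer: {full:,} params")
print(f"LoRA adapter: {adapter:,} params ({adapter / full:.2%} of the layer)")
```

Since only the small adapter differs between fine-tunes, many adapters can in principle share one resident copy of the base weights.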

Weights & Biases Nov 13, 2025

Dev Tools · Open weights

W&B LEET

W&B ships LEET, an open-source terminal UI for monitoring ML runs

Weights & Biases released LEET (Lightweight Experiment Exploration Tool), an open-source terminal-native dashboard for tracking ML runs, demoed live by Dima Duev of the SDK team. It works fully offline for air-gapped HPC clusters and brings real-time metrics, system stats, and zoomable interactive charts to the terminal.

W&B announcement on X ↗ · W&B LEET blog post ↗ · W&B LEET (wandb beta leet) ↗

🎙️ Hear our coverage →

#coding #infrastructure

Amazon Web Services Nov 6, 2025

Also Released

AWS-OpenAI infrastructure partnership

AWS announces multi-year strategic infrastructure partnership with OpenAI

AWS announced a multi-year strategic infrastructure partnership with OpenAI to power ChatGPT inference, training, and agentic AI workloads. It is another sign of OpenAI spreading its compute needs across every major cloud provider, and a notable win for AWS in the frontier-AI infrastructure race.

🎙️ Hear our coverage →

#infrastructure #agents

Sandbar Nov 6, 2025

Products & Apps

Stream / Stream Ring

Sandbar launches Stream voice assistant and Stream Ring wearable

Sandbar launched Stream, a voice-first personal assistant, alongside Stream Ring, a wearable described as a 'mouse for voice' that is now available for preorder. The pairing pushes always-available voice interaction into dedicated hardware rather than the phone.

🎙️ Hear our coverage →

#voice-ai #infrastructure #consumer-ai

October 2025

Anthropic Oct 16, 2025

New Models

Claude Haiku 4.5

Claude Haiku 4.5: fast, cheap model rivals Sonnet 4 accuracy

Anthropic released Claude Haiku 4.5, its smallest and fastest current-generation model. The show highlighted that it approaches Sonnet 4 level accuracy at a fraction of the cost and latency, making it attractive for high-volume agentic and production workloads.

X announcement ↗ · Official blog ↗

🎙️ Hear our coverage →

#frontier-models #infrastructure

Apple Oct 16, 2025

Products & Apps

M5 chip

Apple announces M5 chip with double the AI performance

Apple unveiled the M5 chip, claiming roughly double the AI performance of the previous generation for Apple Silicon. For local-model enthusiasts on the show, it means more on-device headroom for running and fine-tuning models on Macs.

Apple Newsroom ↗

🎙️ Hear our coverage →

#infrastructure #on-device

NVIDIA Oct 16, 2025

Products & Apps

DGX Spark

NVIDIA DGX Spark: a desktop personal supercomputer for local AI

NVIDIA started shipping DGX Spark, a desktop personal AI supercomputer aimed at prototyping and local inference. The show pointed to the LMSYS deep dive on its real-world performance, and Alex shared his own first impressions of the device.

LMSYS Blog deep dive ↗ · Alex's impressions on X ↗

🎙️ Hear our coverage →

#infrastructure #on-device

OpenAI Oct 16, 2025

Also Released

OpenAI x Broadcom custom accelerators

OpenAI and Broadcom to deploy 10 gigawatts of custom AI accelerators

OpenAI announced a strategic collaboration with Broadcom to co-develop and deploy 10 gigawatts of custom AI accelerators. It is another massive compute commitment in OpenAI's infrastructure buildout, this time with chips designed in-house.

Official announcement ↗

🎙️ Hear our coverage →

#infrastructure

OpenPipe (Weights & Biases) Oct 16, 2025

New Models

OpenPipe Qwen3 14B Instruct

OpenPipe Qwen3 14B Instruct lands on W&B Inference

OpenPipe, now part of Weights & Biases / CoreWeave, released a Qwen3 14B instruct model available through W&B Inference. Co-founder Kyle Corbitt joined the show to talk RL, Serverless RL, and practical agent evaluation and deployment.

W&B Inference model page ↗

🎙️ Hear our coverage →

#training #infrastructure

September 2025

NVIDIA Sep 25, 2025

Funding

NVIDIA-OpenAI $100B partnership

Nvidia commits up to $100B to OpenAI for 10GW of compute

Nvidia and OpenAI announced a letter of intent under which Nvidia would invest up to $100 billion in OpenAI as the two deploy at least 10 gigawatts of Nvidia systems for OpenAI's next-generation infrastructure. The episode's big-company segment centered on this deal as evidence that money and infrastructure, not just models, now drive the AI race.

🎙️ Hear our coverage →

#industry #infrastructure

Meta AI Sep 18, 2025

Products & Apps

Meta AI Glasses with Display

Meta Connect: new AI glasses with a display and neural control interface

At Meta Connect, Meta unveiled new AI glasses featuring a built-in display, a neural wristband control interface, and a new AI mode. The panel treated the glasses as an interface milestone, arguing the product surface for AI is shifting from apps to display-equipped wearables.

🎙️ Hear our coverage →

#infrastructure #consumer-ai #multimodal

Weights & Biases Sep 18, 2025

Major Features & Updates

Weave in W&B Workspaces

W&B brings Weave traces into Models workspaces for RL runs

Weights & Biases shipped Weave inside W&B Models workspaces, so reinforcement learning runs can now be logged and inspected with Weave trace tooling alongside training metrics. The show framed it as giving RL training 'x-ray vision' into what the model is actually doing.

X ↗ · W&B Docs ↗

🎙️ Hear our coverage →

#infrastructure #coding #training

July 2025

Cloudflare Jul 3, 2025

Major Features & Updates

One-Click AI Bot Blocking

Cloudflare launches one-click AI bot blocking for the web

Cloudflare announced a one-click feature letting site owners block AI scraping bots, a direct response to the economics of perpetual web scraping by AI labs. The move puts a default-off switch in front of a large share of the internet and highlights the tension between open research norms and commercial scraping.

Cloudflare announcement on X ↗

🎙️ Hear our coverage →

#infrastructure

Huawei Jul 3, 2025

New Models · Open weights

Pangu Pro MoE

Huawei's Pangu Pro MoE: 72B model trained entirely on Ascend NPUs

Huawei released Pangu Pro, a 72B-parameter MoE trained on its own Ascend NPUs rather than Nvidia or AMD hardware, hitting 1,528 tokens/sec and pretrained on 13T tokens. The panel framed it as the geopolitical open-model story of the week, showing how far Chinese compute stacks have advanced under sanctions.

X coverage ↗ · Hugging Face ↗

🎙️ Hear our coverage →

#open-source #architecture #infrastructure

May 2025

Nous Research May 15, 2025

Products & Apps · Open weights

Psyche

Nous Research launches Psyche, a decentralized cooperative-training network

Psyche is Nous Research's decentralized cooperative-training network that lets distributed participants jointly train large models over the internet. The launch includes open code on GitHub and a live dashboard tracking the first run, a 40B model called Consilience. COO Dillon Rolnick joined the show to explain the decentralized training push.

Website ↗ · GitHub ↗ · Announcement tweet ↗ · Consilience 40B dashboard ↗

🎙️ Hear our coverage →

#training #open-source #infrastructure

Technology Innovation Institute (TII) May 15, 2025

New Models · Open weights

Falcon-Edge

Falcon-Edge: ternary BitNet LLMs for edge deployment under 1GB VRAM

TII's Falcon-Edge project releases ternary BitNet LLMs (1B and 3B base models) that slash memory and compute requirements, enabling inference on less than 1GB of VRAM. Fine-tuners get pre-quantized checkpoints and a clear path to 1-bit LLMs.

Blog ↗ · Falcon-E-1B on Hugging Face ↗ · Falcon-E-3B on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #on-device #infrastructure

Meta AI May 1, 2025

APIs & Platforms

Llama API

Meta announces the Llama API at LlamaCon, powered by Groq

At LlamaCon, Meta unveiled an official Llama API for developers, with fast inference powered by Groq hardware. Zuckerberg also confirmed Llama thinking models are coming, along with a new meta.ai app with a social feed and a full-duplex voice model in the works.

AI at Meta LlamaCon announcements (X) ↗

🎙️ Hear our coverage →

#api #infrastructure

April 2025

Google DeepMind Apr 24, 2025

New Models · Open weights

Gemma 3 QAT

Google ships Quantization-Aware Trained Gemma 3 models for consumer GPUs

Google released Quantization-Aware Training (QAT) versions of the Gemma 3 family, dramatically cutting memory requirements while preserving quality. The 27B model drops from a hefty 54GB to just 14.1GB, and even the 1B model goes from 2GB to about half a gig, making state-of-the-art open models runnable on consumer GPUs. Wolfram took the 4B QAT model for a spin in LM Studio on the show.

27B Gemma 3 27B QAT: 54GB down to 14.1GB · 1B Gemma 3 1B QAT: 2GB down to ~0.5GB · 4B 4B QAT model tested in LM Studio

X Post ↗ · Blog ↗ · Reddit thread ↗

🎙️ Hear our coverage →

#open-source #infrastructure #on-device
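
The Gemma 3 QAT sizes above line up with simple weight-storage arithmetic, sketched here. The assumptions are mine rather than the entry's: that the 54GB figure corresponds to 16-bit weights, that the QAT checkpoint is roughly 4-bit, and that 1 GB means 1e9 bytes. The estimate covers weights only, not activations or KV cache.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

def effective_bits_per_weight(params_billions, size_gb):
    """Average bits stored per parameter for a checkpoint of the given size."""
    return size_gb * 1e9 * 8 / (params_billions * 1e9)

bf16_gb = weight_memory_gb(27, 16)  # assumed 16-bit baseline for the 27B model
int4_gb = weight_memory_gb(27, 4)   # assumed 4-bit quantized weights
print(f"27B at 16-bit: {bf16_gb:.1f} GB, at 4-bit: {int4_gb:.1f} GB")

# Working backwards from the reported 14.1GB checkpoint size.
print(f"reported 14.1GB implies ~{effective_bits_per_weight(27, 14.1):.2f} bits/weight")
```

The reported size sitting slightly above the pure 4-bit estimate is consistent with some tensors or quantization metadata being stored at higher precision, though the entry does not say which.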

Microsoft Apr 17, 2025

New Models · Open weights

BitNet b1.58

Microsoft releases BitNet 1.58-bit model weights on Hugging Face

Microsoft published BitNet (listed in the show notes as BitNet v1.5), its native 1.58-bit quantized LLM, as open weights on Hugging Face. The ternary-weight approach targets extremely efficient CPU inference at a fraction of the memory of standard models.

Hugging Face ↗

🎙️ Hear our coverage →

#open-source #infrastructure
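
For readers unfamiliar with ternary weights, here is a minimal sketch of the absmean rounding step described in the BitNet b1.58 paper: scale by the mean absolute weight, round, and clip to {-1, 0, 1}. The sample weights are made up, and this shows only the rounding itself; the released model is trained natively with quantization in the loop, which this sketch does not attempt.

```python
import math

def absmean_ternary(weights, eps=1e-8):
    """Map weights to {-1, 0, 1} using the mean absolute value as the scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale

# Made-up example weights (illustrative only).
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
q, scale = absmean_ternary(weights)
dequantized = [v * scale for v in q]

print("ternary weights:", q)
print(f"scale: {scale:.3f}")
print(f"bits per ternary weight: {math.log2(3):.2f}")
```

Three possible values per weight carry log2(3) bits of information, which is where the "1.58-bit" name comes from.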

CoreWeave Apr 3, 2025

Benchmarks & Evals

CoreWeave GB200 inference benchmark

CoreWeave hits 800 tok/s on Llama 405B with NVIDIA GB200 Blackwell

CoreWeave announced record-breaking AI inference benchmarks using NVIDIA's new GB200 Grace Blackwell superchips: 800 tokens/sec on Llama 3.1 405B, plus 33,000 tokens/sec on Llama 2 70B with H200s. It is a marker of how fast inference hardware is accelerating.

800 tok/s Llama 3.1 405B on GB200 · 33,000 tok/s Llama 2 70B on H200

CoreWeave press release ↗

🎙️ Hear our coverage →

#infrastructure #benchmarks

March 2025

Arcee AI Mar 20, 2025

Products & Apps

Arcee Conductor

Arcee AI announces Conductor, an intelligent model router

Arcee AI's Lucas Atkins joined the show to announce Conductor, a model router that picks the best model (including Arcee's small specialized models) for each query. It targets cost and quality optimization by routing requests instead of sending everything to one large model.

🎙️ Hear our coverage →

#api #agents #infrastructure

Nous Research Mar 13, 2025

APIs & Platforms

Portal

Nous Research opens Portal, an inference API for Hermes models

Nous Research launched Portal, its new inference API service offering direct access to models like Hermes 3 Llama 70B and DeepHermes 3 8B. It marks another open-source lab standing up its own hosted API to make its models more accessible.

🎙️ Hear our coverage →

#api #infrastructure

Weights & Biases Mar 6, 2025

Acquisitions

CoreWeave acquisition of Weights & Biases

Weights & Biases is acquired by CoreWeave

CoreWeave announced it is acquiring Weights & Biases, the AI developer platform and ThursdAI's home company. The deal pairs W&B's experiment tracking, Weave, and models tooling with CoreWeave's AI cloud infrastructure.

W&B Announcement ↗

🎙️ Hear our coverage →

#industry #infrastructure

February 2025

DeepSeek Feb 27, 2025

Dev Tools · Open weights

Open Source Week infra releases

DeepSeek open-sources its infra stack during Open Source Week

DeepSeek ran its Open Source Week, releasing a series of production infrastructure repos (including FlashMLA, DeepEP, and DeepGEMM) that power its training and inference stack. The drops gave the open-source community a rare look at the low-level kernels and communication libraries behind DeepSeek's efficient frontier models.

🎙️ Hear our coverage →

#open-source #infrastructure

Inception Labs Feb 27, 2025

New Models

Mercury

Inception Labs debuts Mercury, a commercial diffusion LLM

Inception Labs announced Mercury, billed as the first commercial-scale diffusion large language model, generating text via diffusion rather than autoregressive decoding. The approach promises dramatically faster token throughput, demoed first with the Mercury Coder playground.

X ↗ · Try it ↗

🎙️ Hear our coverage →

#architecture #coding #infrastructure

Hao AI Lab Feb 20, 2025

Dev Tools · Open weights

FastVideo

Hao AI Lab's FastVideo makes HunyuanVideo 3x faster with no extra training

Hao AI Lab released FastVideo, a method that makes HunyuanVideo (HY-Video) three times faster with no additional training. It uses a technique called Sliding Tile Attention, which outperforms even FlashAttention for this workload. Faster inference makes open-source video models far more practical, and FastVideo supports HY-Video LoRAs for fine-tuned applications.

🎙️ Hear our coverage →

#video-gen #infrastructure #open-source

Hugging Face Feb 20, 2025

Also Released · Open weights

Ultra Scale Playbook

Hugging Face publishes the Ultra Scale Playbook for training on GPU clusters

Hugging Face released the Ultra Scale Playbook, a guide to building and scaling AI models on large GPU clusters. The team ran 4,000 scaling experiments on up to 512 GPUs to distill practical guidance for labs training big models.

Hugging Face ↗

🎙️ Hear our coverage →

#training #infrastructure #open-source

Microsoft Feb 20, 2025

Products & Apps

Majorana 1

Microsoft unveils Majorana 1 quantum chip and a new state of matter

Microsoft announced the Majorana 1 quantum chip alongside a claimed new state of matter, topological superconductivity, which it presents as a new path for quantum computing. Alex called the announcement 'absolutely mind blowing' and a potentially big deal for the future of computing.

Microsoft blog ↗

🎙️ Hear our coverage →

#research #infrastructure

January 2025

OpenAI (with SoftBank & Oracle) Jan 23, 2025

Funding

Stargate Project

Stargate Project: $500B AI infrastructure investment announced

OpenAI, SoftBank (Masayoshi Son), and Oracle (Larry Ellison) announced the Stargate Project, a planned $500 billion investment in US AI infrastructure. The announcement, made alongside the White House, was framed on the show as a 'Manhattan Project'-scale buildout of AI datacenters and compute.

$500B Planned investment

OpenAI: Announcing the Stargate Project ↗

🎙️ Hear our coverage →

#infrastructure #industry