APIs & Platforms

Model APIs, developer platforms, pricing, and model routing. 38 releases covered on the show.

July 2026

Meta AI Jul 9, 2026

New Models

Muse Spark 1.1 & Meta Model API

Meta launches Muse Spark 1.1 and its first paid Meta Model API

Mark Zuckerberg returned to X (35 seconds into the ThursdAI live show) to announce Muse Spark 1.1: a 1M-token-context agentic model that rivals GPT-5.5 and Opus 4.8 on agentic evals, claiming #1 on MCP Atlas, JobBench, Humanity's Last Exam and Finance Agent V2. It ships with Meta's first-ever paid developer API in public preview ($20 free credits, US-only at launch), computer use across desktop, browser and mobile, and parallel subagent delegation. On the held-back Vals AI Harvey legal-agent benchmark it scores 20% against Fable's 11%. Replit, Cline and Box are early partners. No open weights.

$1.25/$4.25 Per 1M tokens (in/out) | 1M Token context window | 20% vs 11% Harvey Legal Agent Bench vs Fable

Alexandr Wang announcement ↗ | Meta blog ↗ | AI at Meta ↗

🎙️ Hear our coverage →

#frontier-models #agents #api
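
As a rough illustration of what the Muse Spark 1.1 rates above ($1.25 input / $4.25 output per 1M tokens) mean per request, the arithmetic can be sketched as follows. The function name and the example workload are hypothetical, and the sketch assumes tokens are billed linearly at the listed rates with no caching discounts or tiering, which the listing does not confirm.

```python
# Per-request cost from per-1M-token prices (rates taken from the listing above).
INPUT_USD_PER_1M = 1.25   # listed input price per 1M tokens
OUTPUT_USD_PER_1M = 4.25  # listed output price per 1M tokens


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, assuming simple linear billing."""
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000


# Hypothetical workload: a full 1M-token context plus a 10k-token reply.
full_context_cost = request_cost_usd(1_000_000, 10_000)
print(f"${full_context_cost:.4f} per request")
```

Under these assumptions, input tokens dominate the bill for long-context calls, so trimming the prompt matters more than trimming the reply.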

Google DeepMind Jul 7, 2026

APIs & Platforms

Gemini API Managed Agents

Gemini API Managed Agents add background tasks and remote MCP

Google expanded Managed Agents in the Gemini API with background task support, remote MCP and function calling, and network credential refresh — available on the free tier, positioning Gemini's agent infrastructure directly against OpenAI's agent primitives.

Free tier Availability

X announcement ↗ | Article ↗

🎙️ Hear our coverage →

OpenAI Jul 6, 2026

APIs & Platforms

GPT-Realtime-2.1-mini

GPT-Realtime-2.1-mini brings reasoning and tool use to the Realtime API mini tier

Two days before GPT-Live, OpenAI upgraded the Realtime API mini lineup with reasoning and tool use at unchanged pricing, plus a 25%+ p95 latency cut from improved caching. Notably, it does not include GPT-Live's full-duplex capability, which remains app-exclusive.

≥25% p95 latency reduction

X announcement ↗

🎙️ Hear our coverage →

#voice-ai #api #agents

OpenAI Jul 2, 2026

New Models

GPT-5.6

OpenAI ships GPT-5.6 as a three-model family: Sol, Terra and Luna

GPT-5.6 arrives as three models: Sol (frontier), Terra (~5.5-level intelligence at half the cost) and Luna (small and fast). It also adds a new Ultra mode with a Max reasoning level and heavier sub-agent use. Dominik Kundel confirmed on ThursdAI that 5.6 Sol is coming to Cerebras at extreme speed, running the same weights as the API model rather than a distill.

3 models: Sol / Terra / Luna | 50% Terra cost vs GPT-5.5-level intelligence

🎙️ Hear our coverage →

#frontier-models #api

June 2026

Sakana AI Jun 25, 2026

Dev Tools

Fugu

Sakana AI launches Fugu multi-agent orchestration API

Announced on air by Stefania Druga, Fugu is a recursive router that rewrites prompts and verifies outputs before picking a model, per the two ICLR papers behind it (Trinity and the conductor). It now plugs into Codex and OpenCode.

95.5 GPQA Diamond | 93.2 LiveCodeBench | 73.7 SWE-Bench Pro

Fugu announcement ↗ | Sakana launch tweet ↗

🎙️ Hear our coverage (+1 follow-up) →

#agents #benchmarks #api

OpenRouter Jun 18, 2026

APIs & Platforms

Fusion API

OpenRouter launches Fusion API, a panel of budget models competing with frontier models

OpenRouter launched Fusion API, which routes or ensembles a panel of lower-cost models to reach near-frontier results. The episode notes framed it as beating GPT-5.5 and Opus 4.8 in some comparisons while landing within roughly 1% of Claude Fable 5 at half the price.

~1% from Fable 5 in episode notes

OpenRouter announcement on X ↗ | Fusion beats frontier models ↗ | OpenRouter Fusion ↗

🎙️ Hear our coverage →

#api #frontier-models #benchmarks

Weights & Biases / CoreWeave Jun 18, 2026

APIs & Platforms

Kimi K2.7 Code on CoreWeave Inference

Kimi K2.7 Code goes live on W&B/CoreWeave Inference

Kimi K2.7 Code became available on W&B/CoreWeave Inference, with the episode notes calling out Blackwell NVFP4 serving, speculative decoding, and 289 tokens per second, which puts it near the top of the Artificial Analysis speed and price-performance charts.

289 tok/s reported throughput

CoreWeave announcement ↗ | Try Kimi K2.7 Code on W&B/CoreWeave Inference ↗

🎙️ Hear our coverage →

#api #infrastructure #coding

May 2026

Anthropic May 21, 2026

Major Features & Updates

Claude off-peak usage boost

Anthropic doubles Claude usage limits outside peak hours for a limited time

Anthropic doubled Claude usage outside peak hours for a limited period, covering Claude Code and other Claude surfaces. The move gives heavy users substantially more agentic and coding throughput during off-peak windows.

Claude on X ↗

🎙️ Hear our coverage →

Google DeepMind May 21, 2026

APIs & Platforms

Managed Agents (Gemini API)

Gemini API gets Managed Agents with hosted sandboxes and the Interactions API

Google launched Managed Agents in the Gemini API, letting developers spin up hosted Antigravity agents with Linux sandboxes and persistent state. It ships alongside the next-generation Interactions API, which Logan Kilpatrick described as designed for agentic systems rather than the old tokens-in, tokens-out model interaction pattern.

Gemini API agents docs ↗ | Google AI Developers on X ↗

🎙️ Hear our coverage →

#agents #api #coding

April 2026

Amazon Web Services Apr 30, 2026

APIs & Platforms

GPT-5.5 and Codex on Bedrock

AWS brings GPT-5.5 and Codex to Bedrock as Azure exclusivity ends

AWS announced GPT-5.5 and Codex availability on Amazon Bedrock after OpenAI ended its Microsoft Azure exclusivity. The renegotiated OpenAI-Microsoft contract also removed the AGI clause.

Sam Altman tweet ↗

🎙️ Hear our coverage →

#infrastructure #api #frontier-models

Alibaba (Qwen) Apr 23, 2026

APIs & Platforms

Qwen3.6-Max-Preview

Qwen3.6-Max-Preview goes live on API

Alongside the open-weights 27B release, Alibaba put Qwen3.6-Max-Preview live on its API. It is the frontier closed-weights tier of the Qwen3.6 family, available API-only rather than as open weights.

Qwen3.6-Max-Preview on API ↗

🎙️ Hear our coverage →

#frontier-models #api

March 2026

xAI Mar 19, 2026

APIs & Platforms

Grok Text-to-Speech API

xAI launches Grok TTS API with 5 voices and WebSocket streaming

xAI launched a Grok Text-to-Speech API with five voices, expressive controls, and WebSocket streaming, priced cheaper than ElevenLabs. It adds another option to a suddenly competitive voice AI market alongside open-source entrants like Fish Audio S2.

xAI on X ↗ | Grok voice API ↗ | Try text-to-speech ↗

🎙️ Hear our coverage →

February 2026

OpenAI Feb 26, 2026

New Models

gpt-audio-1.5 & gpt-realtime-1.5

OpenAI releases gpt-audio-1.5 and gpt-realtime-1.5

OpenAI shipped gpt-audio-1.5 and gpt-realtime-1.5, updated audio and realtime voice models available through its platform. The release was covered in the week's voice and audio roundup.

Release noted on X ↗ | OpenAI models docs ↗

🎙️ Hear our coverage →

#voice-ai #audio #api

Weights & Biases Feb 26, 2026

Major Features & Updates

W&B Inference: MiniMax M2.5 & Kimi K2.5

W&B Inference adds MiniMax M2.5 and Kimi K2.5

Weights & Biases added MiniMax M2.5 and Kimi K2.5 to its CoreWeave-backed Inference service. The panel emphasized price/performance, with MiniMax M2.5 presented as roughly 10x cheaper than premium alternatives in some tiers and Kimi K2.5 praised for practical function calling and image-in-loop use cases.

MiniMax M2.5 on W&B Inference ↗

🎙️ Hear our coverage →

#infrastructure #api #open-source

xAI Feb 19, 2026

New Models

Grok 4.20

xAI silently drops Grok 4.20 with four 500B-param collaborating agents

xAI released Grok 4.20, a multi-agent system where four 500B-parameter agents collaborate in a multi-agent UI, with a $300/month Heavy tier scaling to 16 agents. No benchmarks or evals were released with the drop. The panel found it underwhelming for coding and day-to-day agent work but still top tier for deep research thanks to xAI's RAG over X data; Grok 4.1 Fast remains #8 on OpenRouter by API usage.

500B×4 Grok 4.20 architecture

Grok 4.20 on X ↗ | xAI model docs ↗

🎙️ Hear our coverage (+1 follow-up) →

#agents #frontier-models #search

OpenAI Feb 5, 2026

New Models

GPT-5.3-Codex

OpenAI answers Opus with GPT-5.3-Codex, first model that helped build itself

One hour after Opus 4.6, OpenAI released GPT-5.3-Codex, billed as the first model instrumental in developing itself: the Codex team used early versions to debug its own training and manage its own deployment. It scores 73% on Terminal Bench 2.0, a 10-point lead over Opus 4.6, while running queries 25% faster and more token-efficiently than its predecessor, with improved mid-task steerability.

73% Terminal Bench 2.0 | 25% Speed improvement

Sam Altman announcement on X ↗ | OpenAIDevs announcement on X ↗ | GPT-5.3-Codex model docs ↗

🎙️ Hear our coverage (+1 follow-up) →

#frontier-models #coding #agents

January 2026

xAI Jan 29, 2026

APIs & Platforms

Grok Imagine API

xAI launches Grok Imagine API with video generation

xAI released the Grok Imagine API, exposing its image and video generation capabilities to developers through the xAI console. The show subtitle notes Grok Imagine ranking #1 among generation models this week.

Announcement (X) ↗ | xAI Console ↗

🎙️ Hear our coverage →

#video-gen #image-gen #api

December 2025

OpenAI Dec 18, 2025

Products & Apps

ChatGPT App Store

ChatGPT App Store opens submissions via MCP app model

OpenAI opened app submissions for the ChatGPT App Store, built on the MCP-powered apps model. Developers can now submit apps that run inside ChatGPT, signaling OpenAI's platform play for distribution of agentic apps.

ChatGPT Apps submission ↗

🎙️ Hear our coverage →

xAI Dec 18, 2025

APIs & Platforms

Grok Voice Agent API

xAI Grok Voice Agent API ships at $0.05/min flat rate, powers Tesla

xAI launched the Grok Voice Agent API with flat-rate pricing of $0.05 per minute and integration into Tesla vehicles. xAI claims the #1 spot on Big Bench Audio at 92.3%, tightening competition in the rapidly commoditizing real-time voice stack.

$0.05/min Grok Voice Agent API

xAI Grok Voice Agent API ↗

🎙️ Hear our coverage →

#voice-ai #agents #api

November 2025

xAI Nov 20, 2025

APIs & Platforms

Grok 4.1 Fast + Agent Tools API

Grok 4.1 Fast: 2M context and Agent Tools API at 10x lower cost

Launched as breaking news during the show, Grok 4.1 Fast pairs a 2 million token context window with a new Agent Tools API offering native X search, Reddit search, web browsing, and code execution. Benchmarks are striking: 93-100% on tau2-Bench Telecom and 72% on Berkeley Function Calling v4 (top of the leaderboard) at $0.20/$0.50 per million tokens — roughly 10x cheaper than competitors, and free for the first two weeks on the xAI API and OpenRouter.

93–100% τ²-Bench Telecom | 72% Berkeley Function Calling v4 | 2M Token context window

🎙️ Hear our coverage →

September 2025

OpenAI Sep 4, 2025

New Models

gpt-realtime

OpenAI ships gpt-realtime and takes the Realtime API to GA

OpenAI shipped the gpt-realtime speech-to-speech model and moved the Realtime API to general availability. The GA release adds remote MCP tool support, image input, and SIP phone calling, making it a full production stack for voice agents and tying into the episode's voice-agents discussion with Kwindla Kramer.

🎙️ Hear our coverage →

#voice-ai #api #agents

May 2025

Mistral AI May 29, 2025

APIs & Platforms

Mistral Agents API

Mistral launches Agents API for building tool-using agents

Mistral released an Agents API, a framework for building custom tool-using agents on top of Mistral models. It joins the wave of big-lab agent frameworks, letting developers wire up tools and orchestrate agentic workflows through Mistral's platform.

Blog ↗ | Tweet ↗

🎙️ Hear our coverage →

Mistral AI May 29, 2025

APIs & Platforms

Mistral Embed

Mistral ships new state-of-the-art embedding API

Mistral announced a new state-of-the-art embedding API. The release gives developers a SOTA option for retrieval and semantic search workloads served through Mistral's platform.

X announcement ↗

🎙️ Hear our coverage →

Anthropic May 15, 2025

APIs & Platforms

Web Search API

Anthropic launches Web Search API for real-time retrieval in Claude

Anthropic released a Web Search API that gives Claude models real-time web retrieval, letting developers ground responses in current information directly through the API. It was covered among the week's big-company API updates.

🎙️ Hear our coverage →

#api #search #agents

Meta AI May 1, 2025

APIs & Platforms

Llama API

Meta announces the Llama API at LlamaCon, powered by Groq

At LlamaCon, Meta unveiled an official Llama API for developers, with fast inference powered by Groq hardware. Zuckerberg also confirmed Llama thinking models are coming, along with a new meta.ai app with a social feed and a full-duplex voice model in the works.

AI at Meta LlamaCon announcements (X) ↗

🎙️ Hear our coverage →

#api #infrastructure

April 2025

OpenAI Apr 24, 2025

APIs & Platforms

gpt-image-1

OpenAI's GPT Image generation lands in the API as gpt-image-1

OpenAI's powerful image generation capabilities, previously locked inside ChatGPT, are now available to developers via API under the official name gpt-image-1. This was the big one many developers were waiting for, opening up the viral image generation and editing capabilities for building AI art and image editing applications.

X Post ↗ | Docs ↗ | API Reference ↗

🎙️ Hear our coverage →

#image-gen #api

Google DeepMind Apr 17, 2025

New Models

Gemini 2.5 Flash

Google launches Gemini 2.5 Flash with controllable thinking budgets

Google answered OpenAI's launch week with Gemini 2.5 Flash, a fast reasoning model that introduces controllable thinking budgets so developers can dial how much the model reasons per request. It is available through the Gemini API and developer platform.

Blog Post ↗ | API Docs ↗

🎙️ Hear our coverage (+1 follow-up) →

#reasoning #frontier-models #api
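
To make the "thinking budget" idea in the Gemini 2.5 Flash entry above concrete, here is a minimal sketch that only builds a request body and sends nothing. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) reflect the Gemini REST API as commonly documented, but treat them as an assumption and check the linked API docs. The helper name and the budget values are hypothetical.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent-style request body with a per-request thinking budget.

    Field names are assumed from the public Gemini REST docs; nothing is sent here.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


# Illustrative budgets only: a small one for a simple query, a larger one for planning.
cheap = build_request("What is 2 + 2?", thinking_budget=0)
deep = build_request("Plan a database migration.", thinking_budget=4096)
print(json.dumps(deep, indent=2))
```

The point of the design is that the budget travels with each request, so one deployment can serve both latency-sensitive and reasoning-heavy calls.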

Mistral AI Apr 17, 2025

Products & Apps

Classifiers Factory

Mistral releases Classifiers Factory

Mistral announced Classifiers Factory, a service for building and training custom text classifiers on its platform. Covered as a quick item in the Big CO LLMs + APIs section of the show.

🎙️ Hear our coverage →

#on-device #api

Anthropic Apr 10, 2025

Products & Apps

Claude Max plan

Anthropic launches Max plan at $200/mo with higher usage quotas

Anthropic introduced a new Max subscription tier priced at $200 per month, offering significantly more usage quota than the standard Pro plan. It mirrors OpenAI's Pro-tier pricing strategy for power users.

$200/mo Max plan price

🎙️ Hear our coverage →

#api #consumer-ai

xAI Apr 10, 2025

APIs & Platforms

Grok 3 API

xAI finally launches the Grok 3 API tier

xAI made Grok 3 and Grok 3 Mini available via API, giving developers programmatic access to its frontier models for the first time. The Grok app also received updates the same week.

xAI API models and pricing ↗ | API Docs ↗ | App Update X Post ↗

🎙️ Hear our coverage (+1 follow-up) →

#api #frontier-models

March 2025

Arcee AI Mar 20, 2025

Products & Apps

Arcee Conductor

Arcee AI announces Conductor, an intelligent model router

Arcee AI's Lucas Atkins joined the show to announce Conductor, a model router that picks the best model (including Arcee's small specialized models) for each query. It targets cost and quality optimization by routing requests instead of sending everything to one large model.

🎙️ Hear our coverage →

#api #agents #infrastructure
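
The routing idea in the Conductor entry above (send each query to the cheapest model that can handle it, instead of one large model) can be sketched with a toy example. Everything here is hypothetical: the model names, prices, and keyword heuristic are invented for illustration and are not Arcee's catalog or method. A production router would presumably use a learned classifier rather than keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                 # hypothetical model name
    usd_per_1m_tokens: float  # hypothetical blended price
    max_difficulty: int       # hardest query tier (1 to 3) this model is trusted with


# Hypothetical catalog; not a real lineup or real prices.
CATALOG = [
    Model("small-specialist", 0.10, 1),
    Model("mid-generalist", 1.00, 2),
    Model("large-frontier", 10.00, 3),
]

# Whole-word markers that this toy heuristic treats as signs of a hard query.
HARD_MARKERS = {"prove", "refactor", "architecture"}


def estimate_difficulty(query: str) -> int:
    """Toy heuristic: marker words mean tier 3, long queries tier 2, else tier 1."""
    words = query.lower().split()
    if HARD_MARKERS & set(words):
        return 3
    return 2 if len(words) > 20 else 1


def route(query: str) -> Model:
    """Pick the cheapest model whose tier covers the estimated difficulty."""
    need = estimate_difficulty(query)
    candidates = [m for m in CATALOG if m.max_difficulty >= need]
    return min(candidates, key=lambda m: m.usd_per_1m_tokens)


print(route("What is the capital of France?").name)
print(route("Refactor this module to use async IO").name)
```

Keeping the difficulty estimate separate from the selection step means the heuristic can be swapped for a classifier without touching the cost logic.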

OpenAI Mar 20, 2025

APIs & Platforms

o1-pro API

OpenAI makes o1-pro available via API at $600 per 1M output tokens

OpenAI exposed its o1-pro reasoning model through the API for the first time, priced at $600 per million output tokens. The show jokingly framed the pricing as 'for oligarchs', but it makes OpenAI's highest-compute reasoning tier programmatically accessible.

🎙️ Hear our coverage →

#reasoning #api

Nous Research Mar 13, 2025

APIs & Platforms

Portal

Nous Research opens Portal, an inference API for Hermes models

Nous Research launched Portal, its new inference API service offering access to models like Hermes 3 Llama 70B and DeepHermes 3 8B directly via API. It marks another open-source lab standing up hosted API access to make its models more accessible.

🎙️ Hear our coverage →

#api #infrastructure

OpenAI Mar 13, 2025

APIs & Platforms

Responses API + Web Search, File Search, Computer Use tools

OpenAI launches Responses API with Web Search, File Search, and Computer Use

OpenAI announced a new agent-focused developer stack at a livestream: the Responses API, a new way to build with OpenAI designed for agentic workloads, plus an Agents SDK. It ships with three built-in tools: Web Search, a File Search tool providing built-in RAG over your files, and a Computer Use tool for agents that operate computer interfaces.

X announcement ↗ | Blog ↗

🎙️ Hear our coverage →

#agents #api #coding

Mistral AI Mar 6, 2025

APIs & Platforms

Mistral OCR

Mistral announces state-of-the-art OCR API

Mistral AI announced Mistral OCR, a document-understanding API the company claims is state of the art at extracting text, tables, and equations from complex documents. It targets RAG and document-processing pipelines with structured markdown output.

🎙️ Hear our coverage →

February 2025

Google DeepMind Feb 27, 2025

APIs & Platforms

Veo 2 (via FAL API)

Google's Veo 2 video model becomes available via FAL API

Google DeepMind's Veo 2 video generation model became accessible to developers through FAL's inference API. This was the first broadly available API access to Veo 2, letting builders generate high-quality video from text prompts without waiting on Google's own product surfaces.

🎙️ Hear our coverage →

#video-gen #api

January 2025

Anthropic Jan 23, 2025

APIs & Platforms

Citations (Claude API)

Anthropic adds Citations to the Claude API

Anthropic launched a Citations capability in the Claude API, letting Claude ground its answers in provided source documents and return precise citations. It targets RAG and document-QA use cases where verifiable sourcing matters.

Anthropic Citations docs ↗

🎙️ Hear our coverage →

Perplexity Jan 23, 2025

APIs & Platforms

Sonar Pro Search API

Perplexity ships Sonar Pro search API and an Android AI assistant

Perplexity released its Sonar Pro search-grounded API, giving developers programmatic access to Perplexity-style web-grounded answers, and also launched an AI assistant for Android. Both moves push Perplexity beyond its consumer answer engine.

Perplexity announcement on X ↗

🎙️ Hear our coverage →

#search #api #consumer-ai