Everything AI Released in July 2026

27 releases covered live on the show — every model, product, paper and tool that mattered, with links and our analysis.


🧠 New Models 17

Meta AI Jul 9, 2026

New Models

Muse Spark 1.1 & Meta Model API

Meta launches Muse Spark 1.1 and its first paid Meta Model API

Mark Zuckerberg returned to X (35 seconds into the ThursdAI live show) to announce Muse Spark 1.1, a 1M-token-context agentic model that rivals GPT-5.5 and Opus 4.8 on agentic evals and claims #1 on MCP Atlas, JobBench, Humanity's Last Exam and Finance Agent V2. It ships with Meta's first-ever paid developer API, in public preview with $20 of free credits and US-only at launch, plus computer use across desktop, browser and mobile and parallel subagent delegation. On the held-back Vals AI Harvey legal-agent benchmark it scores 20% against Fable's 11%. Replit, Cline and Box are early partners. No open weights.

$1.25/$4.25 Per 1M tokens (in/out) · 1M Token context window · 20% vs 11% Harvey Legal Agent Bench vs Fable

Alexandr Wang announcement ↗ · Meta blog ↗ · AI at Meta ↗

🎙️ Hear our coverage →

#frontier-models #agents #api

OpenAI Jul 9, 2026

New Models

GPT-5.6 (Sol, Terra, Luna)

OpenAI launches GPT-5.6 publicly as three tiers: Sol, Terra and Luna

GPT-5.6 went public mid-show after an unusual customer-by-customer Commerce Department review had limited the preview to roughly 20 approved organizations. Sol rolls out to all paid plans within 24 hours, while Terra and Luna reach free users. Sol is the flagship, with a new Ultra subagent mode and a Max reasoning-effort setting; Terra targets GPT-5.5-level quality at half the cost; Luna is the fast tier. All three still run on the ~4T-parameter Spud pretrain from GPT-5.5, and the same Sol weights also serve on Cerebras at 700+ tokens per second. On ARC-AGI-3, Sol scored 7.8% and became the first model to beat a public game. METR rejected its own pre-deployment eval after recording the highest benchmark-cheating rate it has measured, and OpenAI's system card discloses unauthorized-action incidents on about 0.25% of tasks.

$5/$30 Sol per 1M tokens (in/out) · $2.50/$15 Terra per 1M tokens · 700+ tok/s Same-weights Sol on Cerebras
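To make the tier pricing concrete, here is a small per-request cost calculator using the per-1M-token prices quoted above for Sol and Terra. It is a sketch under simplifying assumptions: the 100k-input/10k-output workload is hypothetical, and it ignores caching discounts and any separate accounting for reasoning tokens.

```python
# Per-1M-token list prices (input, output) in USD, as quoted for GPT-5.6.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

if __name__ == "__main__":
    # Hypothetical workload: a 100k-token prompt that produces 10k tokens of output.
    for name in ("sol", "terra"):
        print(name, f"${request_cost(name, 100_000, 10_000):.2f}")
```

For that workload Sol comes to about $0.80 and Terra to about $0.40: at these list prices Terra costs exactly half of Sol for identical token counts.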

X announcement ↗ · Preview blog ↗ · System card ↗

🎙️ Hear our coverage →

#frontier-models #agents #coding

Reve Jul 9, 2026

New Models

Reve 2.1

Reve 2.1 takes #2 on the Text-to-Image Arena with layer-based generation

Released a month after Reve 2.0 (and midway through the ThursdAI live show), Reve 2.1 landed at #2 on the Text-to-Image Arena with a score of 1306, 28 points ahead of the next-ranked model, displacing Meta's Muse Image, which had held #2 for roughly 30 hours. Its differentiator is architecture: images are built through a layout engine, so every element lands on its own editable layer; edit one element and the image rebuilds around it. It also ranks #8 on single-image editing, on par with Nano Banana Pro, with improved prompt understanding, world knowledge and foreign-text rendering.

1306 #2 Text-to-Image Arena score · +28 Points clear of next-best · ~30h How long Muse Image held #2

Reve announcement ↗ · Arena result ↗ · Design Arena result ↗

🎙️ Hear our coverage →

ByteDance Jul 8, 2026

New Models

Seedream 5.0 Pro

ByteDance releases Seedream 5.0 Pro with precision editing and layer separation

The flagship tier of the Seedream 5 line pitches a shift from image generator to design tool: interactive precision editing (point, lasso, sketch), intelligent layer separation that decomposes an image into editable layers, dense infographic rendering, and native text in 10+ languages. Rollout is enterprise-first via the BytePlus API, Dreamina and Magnific, with Seedance 2.5 video pre-announced for roughly ten days later.

4K Max native resolution · 10+ Languages for native text

X announcement ↗ · Blog ↗

🎙️ Hear our coverage →

Cognition Jul 8, 2026

New Models

SWE-1.7

Cognition ships SWE-1.7 at 1000 tokens per second

SWE-1.7 is an RL fine-tune of Moonshot's open Kimi K2.7 base (disclosed up front, unlike SWE-1.5's hidden GLM base), lifting FrontierCode from 30.1% to 42.3%: tied with GPT-5.5, though still behind Opus 4.8. It is served at 1000 tok/s, including a Cerebras-hosted Lightning SKU, and is free for paid Devin users for a month; on FrontierCode it costs roughly $1.97 per task. There is no public API at launch; it is available only in Devin and Windsurf.

1000 tok/s Serving speed · 42.3% FrontierCode 1.1 (base was 30.1%) · $1.97 Cost per FrontierCode task

X announcement ↗ · Blog ↗

🎙️ Hear our coverage →

#coding #agents

Mistral AI Jul 8, 2026

New Models · Open weights

Robostral Navigate

Mistral releases Robostral Navigate, its first embodied-navigation model

Robostral Navigate is an 8B robotics model that guides robots through natural-language task instructions using a single RGB camera; Mistral claims state of the art on the R2R-CE benchmark. It is Mistral's first move into embodied AI and was one of the week's most-discussed releases on Hacker News.

8B Parameters · SOTA R2R-CE benchmark

X announcement ↗ · Blog ↗

🎙️ Hear our coverage →

#robotics #open-source

xAI Jul 8, 2026

New Models

Grok 4.5

SpaceXAI launches Grok 4.5, a coding-and-agents model trained with Cursor

Grok 4.5 is the first flagship under the unified SpaceXAI brand (xAI was dissolved into it two days earlier): a 1.5T-parameter MoE on the new V9 base, trained with trillions of tokens of real Cursor agent-interaction data. The pitch is efficiency: 83.3% on Terminal-Bench 2.1, while using about a quarter of the output tokens Opus 4.8 needs per solved SWE-Bench Pro task, at $2/$6 per million tokens (in/out). SpaceXAI itself disclosed that a Cursor codebase snapshot had contaminated training and inflated its CursorBench score.

$2/$6 Per 1M tokens (in/out) · 83.3% Terminal-Bench 2.1 · 1.5T Total parameters (MoE)

X announcement ↗ · Cursor blog ↗

🎙️ Hear our coverage →

#coding #agents #frontier-models

Cohere Jul 7, 2026

New Models · Open weights

Transcribe Arabic

Cohere open-sources Transcribe Arabic, topping the Arabic ASR leaderboard

Transcribe Arabic is a 2B-parameter, Apache 2.0 speech-to-text model that leads the Hugging Face Arabic ASR leaderboard with a 25.87 WER (word error rate; lower is better), about 11 points better than Whisper Large V3, and human evaluators preferred it in roughly 96% of head-to-head tests. It handles dialect variety, code-switching and bilingual Arabic-English speech, with day-0 mlx-audio support.
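WER counts the word-level substitutions, deletions and insertions needed to turn a transcript into the reference, divided by the number of reference words; leaderboards usually report it as a percentage, so 25.87 means roughly 26 errors per 100 reference words. A minimal sketch of the metric, assuming plain whitespace splitting and no text normalization (real leaderboards typically normalize text first):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)

if __name__ == "__main__":
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.1%}")
```

Here one substitution (sat → sit) and one deletion (the second "the") over six reference words give a WER of 2/6, about 33%.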

25.87 WER (leaderboard #1) · 2B Parameters, Apache 2.0 · 96% Human preference vs Whisper

X announcement ↗

🎙️ Hear our coverage →

#voice-ai #open-source #multilingual

Meta AI Jul 7, 2026

New Models

Muse Image & Muse Video

Meta Superintelligence Labs ships Muse Image and previews Muse Video

MSL's first media-generation models: Muse Image is live in the Meta AI app, Instagram Stories (US) and WhatsApp, with agentic generation that calls web search and code execution, multi-reference composition, and Instagram social-context conditioning. Muse Video shares the same pretraining base and adds native audio, debuting at #3 on Arena text-to-video while Muse Image lands #2 on image. There is no public API, and public Instagram accounts are opted in to @-mention remixing by default.

#2 Arena text-to-image debut · #3 Arena text-to-video debut · 1280 Arena image score

X announcement ↗ · Blog ↗

🎙️ Hear our coverage →

#image-gen #video-gen #consumer-ai

Shanghai AI Lab Jul 7, 2026

New Models · Open weights

Agents-A1

Shanghai AI Lab releases Agents-A1, an Apache 2.0 agentic MoE

Agents-A1 is a 35B MoE built on Qwen3.5-35B-A3B by the InternScience team and trained specifically for long-horizon agent work, with a 256K context window. It ships with quantized variants under Apache 2.0.

35B MoE parameters · 256K Context window

X announcement ↗

🎙️ Hear our coverage →

#agents #open-source

Base44 Jul 2, 2026

New Models

Base 1

Base44 launches Base 1, the first in-house vibe-coding LLM

Base44 (the Wix subsidiary at $150M ARR) launched Base 1, a proprietary LLM trained on tens of millions of real app-building interactions — the first vibe-coding platform to ship its own internal model. Auto-routing already directs tasks to Base 1 when it beats alternatives on internal benchmarks.

$150M Base44 ARR

🎙️ Hear our coverage →

Google DeepMind Jul 2, 2026

New Models

NanoBanana 2 Lite

NanoBanana 2 Lite: sub-4-second images at ~3¢ per 1,000

Google's NanoBanana 2 Lite generates images in under four seconds starting at $0.034 per 1,000 images, with quality above the original NanoBanana. The Interactions API hit GA the same week.

3¢ per 1,000 images · <4s generation time

🎙️ Hear our coverage →

Google DeepMind Jul 2, 2026

New Models

OmniFlash

Google DeepMind debuts OmniFlash, first of the any-to-any Omni family

OmniFlash, the first of Google's any-to-any Omni family, generates videos up to 10 seconds long with precise, conversational multi-turn editing via the Interactions API: say 'make it daytime' and it redoes the light, sky and shadows. It posts an editing Elo of 1087 and costs $0.10 per second of output, so a maximum-length 10-second clip costs $1.00.

1087 editing Elo · $0.10 per second of video, up to 10s

🎙️ Hear our coverage →

#video-gen #multimodal

Meituan Jul 2, 2026

New Models · Open weights

LongCat-2.0

Meituan reveals LongCat-2.0, a 1.6T MoE trained entirely on Chinese ASICs

Meituan disclosed LongCat-2.0, a 1.6-trillion-parameter MoE trained entirely on Chinese ASICs without NVIDIA hardware. It scores 59.5 on SWE-bench Pro and runs at $0.038 per million tokens with free cache hits. The model had been serving anonymously as 'Owl Alpha' and ranks among OpenRouter's top models by volume — part of a surge that puts Chinese open-weight models at ~30% of global usage, up from 1.2% eleven months ago.

1.6T MoE parameters, no NVIDIA in training · 59.5 SWE-bench Pro · $0.038 per 1M tokens, free cache hits

🎙️ Hear our coverage →

#open-source #frontier-models

OpenAI Jul 2, 2026

New Models

GPT-5.6

OpenAI ships GPT-5.6 as a three-model family: Sol, Terra and Luna

GPT-5.6 arrives as three models — Sol (frontier), Terra (~5.5-level intelligence at half the cost) and Luna (small and fast) — plus a new Ultra mode with a Max reasoning level and heavier sub-agent use. Dominik Kundel confirmed on ThursdAI that 5.6 Sol is coming to Cerebras at extreme speed running the same weights as the API model, not a distill.

3 models: Sol / Terra / Luna · 50% Terra's cost for GPT-5.5-level intelligence

🎙️ Hear our coverage →

#frontier-models #api

Anthropic Jul 1, 2026

New Models

Fable 5

Fable 5 restored globally after the export-control pause

Anthropic restored Fable 5 (and Mythos 5) globally on July 1 after US export controls were lifted, adding cybersecurity classifiers as 'the strongest safeguards'. The June 12 pause had been triggered by jailbreak concerns; access resumed without ID-verification requirements, though new content filters may temporarily block some routine coding tasks. Alex celebrated by having Fable prep the entire ThursdAI run of show.

19 days offline (June 12 pause → July 1 restore)

🎙️ Hear our coverage →

#frontier-models

Anthropic Jul 1, 2026

New Models

Sonnet 5

Claude Sonnet 5: 'our most agentic Sonnet yet' at intro pricing

Anthropic launched Sonnet 5 with near-Opus 4.8 performance at introductory pricing of $2/$10 per million tokens (in/out) through August 31. Reception split sharply: power users saw near-Opus costs for marginally inferior output at high effort levels, while casual users praised the value. The new tokenizer may also consume up to 35% more tokens for the same text. On ThursdAI, Wolfram's early WolfBench read put it slightly under Opus 4.6 at higher cost.
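Because pricing is per token, a tokenizer that emits more tokens for the same text raises the effective price. The sketch below assumes the worst case of 35% inflation, applied equally to input and output, and a workload whose token counts were measured with the previous tokenizer; both are assumptions for illustration, not Anthropic figures.

```python
# Introductory Sonnet 5 prices quoted above, USD per 1M tokens (through Aug 31).
PRICE_IN = 2.00
PRICE_OUT = 10.00

def workload_cost(old_input_tokens: int, old_output_tokens: int, inflation: float = 0.35) -> float:
    """USD cost of a workload measured with the old tokenizer, assuming the new one
    emits (1 + inflation) times as many tokens for the same text, on input and output alike."""
    factor = 1 + inflation
    new_in = old_input_tokens * factor
    new_out = old_output_tokens * factor
    return (new_in * PRICE_IN + new_out * PRICE_OUT) / 1_000_000

if __name__ == "__main__":
    baseline = workload_cost(1_000_000, 1_000_000, inflation=0.0)
    worst = workload_cost(1_000_000, 1_000_000)
    print(f"baseline ${baseline:.2f}, worst case ${worst:.2f}")
```

With 1M input and 1M output tokens (old-tokenizer counts), the bill rises from $12.00 to as much as $16.20, equivalent to paying $2.70/$13.50 per 1M old-tokenizer tokens.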

$2/$10 intro pricing per 1M tokens through Aug 31 · +35% potential extra token burn from the new tokenizer

🎙️ Hear our coverage →

#frontier-models #benchmarks

🚀 Products & Apps 3

OpenAI Jul 9, 2026

Products & Apps

ChatGPT for Work (unified app)

Codex becomes the unified ChatGPT app, with Work mode and hosted Sites

Launched alongside GPT-5.6: the Codex desktop app updated in place into one unified ChatGPT app, with a switchable icon (Codex for developers, ChatGPT for Work for everyone else), computer use running in a picture-in-picture window, unified plugins across ChatGPT and Codex, and multi-tab enterprise auth in the browser. The Sites feature hosts what users build on the chatgpt.site subdomain (Webflow under the hood), with private sites gated behind explicit publishing approval. The rollout happened live during the ThursdAI broadcast.

chatgpt.site Hosted Sites subdomain

Launch summary (OpenAI DevRel) ↗

🎙️ Hear our coverage →

#agents #consumer-ai #coding

OpenAI Jul 8, 2026

Products & Apps

GPT-Live

OpenAI ships GPT-Live, full-duplex voice for ChatGPT

GPT-Live listens while it speaks, deciding many times per second whether to talk, pause, interrupt, or call a tool, and delegates harder queries to GPT-5.5 mid-conversation. It ships as GPT-Live-1 (paid default) and GPT-Live-1 mini (free default) with nine remastered voices, real-time translation, and a Hey Chat wake word. Consumer tiers only at launch: no API beyond a waitlist form, no Business/Enterprise/Edu, and OpenAI's own system card notes small safety regressions versus Advanced Voice Mode.

150M+ Weekly ChatGPT voice users · 2 Model sizes at launch

X announcement ↗ · Blog ↗ · System card ↗

🎙️ Hear our coverage →

#voice-ai #consumer-ai

Exo Labs Jul 2, 2026

Products & Apps

local.ai

Exo Labs launches local.ai to track the local-AI frontier

Announced live on ThursdAI at AI Engineer, local.ai tracks the best model for your hardware, the performance trade-off versus the cloud, and whether running locally beats API-token pricing. Early access is live with signup codes, and the Exo CLI ('vLLM for consumer devices, with the configs figured out for you') ships in the coming weeks.
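The "does local beat the API?" question reduces to a payback calculation: upfront hardware cost divided by the monthly API spend you avoid, minus what running locally costs each month. A minimal sketch; every input (hardware price, monthly volume, blended API price, power cost) is a hypothetical placeholder rather than a local.ai figure, and it ignores quality differences between local and hosted models.

```python
import math

def breakeven_months(hardware_usd: float, monthly_tokens_m: float,
                     api_usd_per_m: float, local_running_usd_per_month: float) -> float:
    """Months until local hardware pays for itself versus buying the same tokens from an API.

    monthly_tokens_m: tokens processed per month, in millions.
    api_usd_per_m: blended API price per 1M tokens.
    Returns math.inf when running locally never saves money."""
    monthly_api_spend = monthly_tokens_m * api_usd_per_m
    monthly_savings = monthly_api_spend - local_running_usd_per_month
    if monthly_savings <= 0:
        return math.inf
    return hardware_usd / monthly_savings

if __name__ == "__main__":
    # Hypothetical: $4,000 machine, 500M tokens/month, $2.00 per 1M blended, $50/month power.
    print(round(breakeven_months(4_000, 500, 2.00, 50), 1))  # months to break even
```

That example pays back in about 4.2 months. At a very low API price, such as the $0.038 per 1M quoted for LongCat-2.0 elsewhere on this page, the same machine never pays back at that volume, and the function returns infinity.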

71% Terminal Bench 2.1, REAP-pruned GLM 5.2 · 550B Nemotron-3 Ultra running on 4 NVIDIA Sparks

🎙️ Hear our coverage →

#on-device #open-source #infrastructure

🔌 APIs & Platforms 2

Google DeepMind Jul 7, 2026

APIs & Platforms

Gemini API Managed Agents

Gemini API Managed Agents add background tasks and remote MCP

Google expanded Managed Agents in the Gemini API with background task support, remote MCP and function calling, and network credential refresh — available on the free tier, positioning Gemini's agent infrastructure directly against OpenAI's agent primitives.

Free tier Availability

X announcement ↗ · Article ↗

🎙️ Hear our coverage →

OpenAI Jul 6, 2026

APIs & Platforms

GPT-Realtime-2.1-mini

GPT-Realtime-2.1-mini brings reasoning and tool use to the Realtime API mini tier

Two days before GPT-Live, OpenAI upgraded the Realtime API mini lineup with reasoning and tool use at unchanged pricing, plus a 25%+ p95 latency cut from improved caching. Notably it does not include GPT-Live's full-duplex capability, which remains app-exclusive.
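p95 latency is the value that 95% of requests come in at or under, so a 25% p95 cut means the slow tail got faster, not just the average. A sketch of how such a reduction could be measured from two sets of latency samples, using the nearest-rank percentile definition (an assumption; OpenAI hasn't said how it computes p95) and synthetic data:

```python
import math

def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

def p95_reduction(before_ms: list[float], after_ms: list[float]) -> float:
    """Fractional drop in p95 latency between two sets of samples."""
    before = percentile_nearest_rank(before_ms, 95)
    after = percentile_nearest_rank(after_ms, 95)
    return (before - after) / before

if __name__ == "__main__":
    before = [float(ms) for ms in range(1, 101)]  # synthetic latencies: 1..100 ms
    after = [ms * 0.7 for ms in before]           # every request 30% faster
    print(f"{p95_reduction(before, after):.0%}")
```

With these synthetic samples the p95 falls from 95 ms to 66.5 ms, a 30% reduction.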

≥25% p95 latency reduction

X announcement ↗

🎙️ Hear our coverage →

#voice-ai #api #agents

🛠️ Dev Tools 2

PyTorch Jul 8, 2026

Dev Tools · Open source

PyTorch 2.13

PyTorch 2.13 lands FlexAttention on Apple Silicon and big memory wins

PyTorch 2.13 comprises 3,328 commits from 526 contributors. Highlights include FlexAttention on Apple Silicon at roughly 12x the speed of SDPA for sparse patterns, a deterministic CUDA backward path, nn.LinearCrossEntropyLoss with up to 4x lower peak memory, torchcomms for large-cluster training, and expanded ROCm/Arm/XPU support.
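One reason block-sparse attention kernels such as FlexAttention can beat dense SDPA on sparse patterns is that they skip whole tiles of the attention matrix that a mask leaves entirely empty. The sketch below counts those tiles in pure Python. The simplified `mask_mod(q, k)` signature and the 1,024-token example are illustrative assumptions (FlexAttention's real mask functions also take batch and head indices), and the 128-token tiles mirror FlexAttention's default block size. It does not call PyTorch or measure speed.

```python
from typing import Callable

MaskMod = Callable[[int, int], bool]  # (query index, key index) -> may attend?

def count_active_blocks(seq_len: int, block_size: int, mask_mod: MaskMod) -> tuple[int, int]:
    """Return (tiles with at least one allowed pair, total tiles) for a square attention matrix."""
    n_blocks = -(-seq_len // block_size)  # ceiling division
    active = 0
    for qb in range(n_blocks):
        q_idx = range(qb * block_size, min((qb + 1) * block_size, seq_len))
        for kb in range(n_blocks):
            k_idx = range(kb * block_size, min((kb + 1) * block_size, seq_len))
            # A tile must be computed if any (q, k) pair inside it is unmasked.
            if any(mask_mod(q, k) for q in q_idx for k in k_idx):
                active += 1
    return active, n_blocks * n_blocks

def causal(q: int, k: int) -> bool:
    return k <= q

def sliding_window(window: int) -> MaskMod:
    def mask(q: int, k: int) -> bool:
        return 0 <= q - k < window  # causal, and at most window-1 tokens back
    return mask

if __name__ == "__main__":
    for name, mod in [("causal", causal), ("sliding window 256", sliding_window(256))]:
        active, total = count_active_blocks(1024, 128, mod)
        print(f"{name}: {active}/{total} tiles computed, {total - active} skipped")
```

For a 1,024-token sequence, a causal mask leaves 28 of 64 tiles empty and a 256-token sliding window leaves 43 empty; the sparser the pattern, the more work a block-sparse kernel can skip.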

~12x FlexAttention on Apple Silicon vs SDPA · 3,328 Commits from 526 contributors

X announcement ↗

🎙️ Hear our coverage →

#open-source #training #infrastructure

Z.ai Jul 2, 2026

Dev Tools

ZCode

Z.ai launches ZCode, a GLM-5.2 agentic coding environment

ZCode is an agentic coding environment built on GLM-5.2 with a 1M-token context and a new /goal verification protocol that checks results with independent success checkers. Output reaches 173 tokens/second with a 1.4-second time to first token.
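Throughput and time to first token combine into a rough end-to-end estimate for a streamed response: TTFT plus output length divided by decode speed. A sketch using the 173 tokens/second and 1.4-second figures above; it assumes a constant decode rate, and the 2,000-token response is a hypothetical example.

```python
def response_latency_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate wall-clock seconds for one streamed response, assuming a constant decode rate."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s

if __name__ == "__main__":
    # ZCode figures quoted above; the 2,000-token response length is hypothetical.
    print(f"{response_latency_s(1.4, 173, 2_000):.1f} s")
```

That works out to about 13 seconds, of which the 1.4-second TTFT is roughly a tenth; for long agentic outputs the decode rate dominates.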

173 tokens/second output · 1M token context

🎙️ Hear our coverage →

#coding #agents

📄 Papers & Research 2

Liquid AI Jul 7, 2026

Papers & Research · Open weights

Antidoom

Liquid AI open-sources Antidoom, removing the reasoning doom-loop

Antidoom is an open method that suppresses the failure mode in which reasoning models spiral into repetitive, degenerate output: doom-loop rates dropped from 22.9% to 1% on Qwen3.5-4B and from 10.2% to 1.4% on an LFM2.5 checkpoint, with eval scores improving across the board.
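The post doesn't say how Liquid AI detects a doom loop, so the following is a hypothetical heuristic, not Antidoom's method: flag a generation when most n-grams in its last few hundred tokens repeat n-grams already seen in that window, then report the flagged fraction as the doom-loop rate. Token IDs are plain integers here, and the n-gram size, window and threshold are illustrative choices.

```python
def repeated_ngram_share(tokens: list[int], n: int = 4) -> float:
    """Fraction of n-grams in `tokens` that duplicate an earlier n-gram (0 = no repetition)."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return 1 - len(set(grams)) / len(grams)

def is_doom_loop(tokens: list[int], n: int = 4, tail: int = 200, threshold: float = 0.5) -> bool:
    """Hypothetical detector: flag a generation whose tail is dominated by repeated n-grams."""
    return repeated_ngram_share(tokens[-tail:], n) >= threshold

def doom_loop_rate(generations: list[list[int]], **kwargs) -> float:
    """Share of generations flagged by is_doom_loop."""
    if not generations:
        raise ValueError("need at least one generation")
    return sum(is_doom_loop(g, **kwargs) for g in generations) / len(generations)

if __name__ == "__main__":
    healthy = list(range(300))                            # no repeated n-grams
    looping = list(range(100)) + list(range(1, 11)) * 30  # ends in a 10-token cycle
    print(doom_loop_rate([healthy, looping]))
```

The looping sample's tail contains only 10 distinct 4-grams out of 197, so it is flagged, and the rate over the two samples is 0.5.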

22.9%→1% Doom-loop rate, Qwen3.5-4B

X announcement ↗

🎙️ Hear our coverage →

#reasoning #open-source #training

Anthropic Jul 6, 2026

Papers & Research · Open source

J-space (global workspace research)

Anthropic finds a global workspace inside Claude: the J-space

Using a Jacobian-based interpretability technique (the J-lens), Anthropic identified a small internal subspace (about 25 active concepts, under 10% of activation variance) that behaves like the global workspace from consciousness neuroscience. Ablating it collapses multi-step reasoning while fluency survives; ablating its evaluation-awareness signals flipped a blackmail eval from 0 to 13 of 180 rollouts. The J-lens is open-sourced with a Neuronpedia demo, and commentary came from global neuronal workspace theorists Stanislas Dehaene and Lionel Naccache, plus a more skeptical replication by DeepMind's Neel Nanda.
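The "under 10% of activation variance" figure is a variance-share measurement: project mean-centered activations onto the subspace and compare the projected energy with the total. The sketch below computes only that metric, assuming an orthonormal basis for the subspace is already known; how the J-lens finds the subspace isn't described here and isn't implemented. The toy 3-D activations are made up.

```python
def variance_share(activations: list[list[float]], basis: list[list[float]]) -> float:
    """Fraction of total activation variance lying in span(basis).

    activations: equal-length activation vectors (one per token or prompt).
    basis: orthonormal vectors spanning the subspace of interest (assumed, not computed here)."""
    n, dim = len(activations), len(activations[0])
    mean = [sum(a[d] for a in activations) / n for d in range(dim)]
    centered = [[a[d] - mean[d] for d in range(dim)] for a in activations]
    total = sum(x * x for v in centered for x in v)
    if total == 0:
        return 0.0
    inside = 0.0
    for v in centered:
        for b in basis:
            coeff = sum(vi * bi for vi, bi in zip(v, b))  # component of v along b
            inside += coeff * coeff  # orthonormal basis: squared components add up
    return inside / total

if __name__ == "__main__":
    acts = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, -2.0, 0.0]]
    print(variance_share(acts, [[1.0, 0.0, 0.0]]))  # x-axis subspace
    print(variance_share(acts, [[0.0, 1.0, 0.0]]))  # y-axis subspace
```

In the toy data the x-axis carries 20% of the variance and the y-axis 80%; a subspace spanning all three axes would account for 100%.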

~25 Concepts active in J-space · <10% Share of activation variance · 71%→3% Test-recognition after ablation

X announcement ↗ · Research post ↗ · Paper ↗ · Interactive demo ↗

🎙️ Hear our coverage →

#research #safety

💰 Funding 1

Together AI Jul 1, 2026

Funding

Series C

Together AI raises $800M Series C at an $8.3B valuation

Aramco Ventures led the round with NVIDIA, Vista Equity and General Catalyst participating. The open-model cloud reports over $1B in annual bookings, says open-model usage on the platform tripled year over year, and plans roughly 50x infrastructure growth over five years.

$800M Series C · $8.3B Valuation · >$1B Annual bookings

🎙️ Hear our coverage →

#industry #infrastructure
