June 2026 saw 31 AI launches covered on ThursdAI, led by NVIDIA, Anthropic, Microsoft, and OpenAI and spanning agents, coding, open source, and industry news. This page collects the products, source links, key numbers, and episode coverage in one crawlable recap.
Moonshot AI open-sources Kimi K2.7 Code for agentic coding
Moonshot AI open-sourced Kimi K2.7 Code, a trillion-parameter MoE coding model with benchmark jumps over K2.6 and fewer reasoning tokens. On the show it landed as the second half of the open-source coding wave beside GLM-5.2.
xAI launches Grok Imagine Video 1.5 with faster generation and native audio
xAI launched Grok Imagine Video 1.5 with nearly 2x faster generation, native audio, and a claimed #1 leaderboard position. The episode grouped it with Gemini Omni as part of the week’s video-generation frontier.
Z.ai releases GLM-5.2, a 753B open MoE with 1M context
Z.ai released GLM-5.2 as a major open-source coding and agentic model: a 753B-parameter MoE, MIT-licensed, with a one-million-token context window. The episode treated it as the open-source model that arrived exactly as Fable access disappeared, with strong coding and agentic performance close to the frontier.
Google drops Gemma 4 12B, an encoder-free multimodal local model
Google released Gemma 4 12B, an encoder-free multimodal model under Apache 2.0 that targets 16GB VRAM local setups. Instead of bolting separate vision or audio encoders onto a language model, it uses one unified network, which LDJ and Yam argued makes smaller multimodal models cheaper, cleaner, and easier to run locally.
H Company launches Holo 3.1 local computer-use agent models
H Company released Holo 3.1, a family of local computer-use agent models ranging from 0.8B to 35B parameters with new quantized checkpoints. The lineup targets running screen-driving agents on local hardware rather than in the cloud.
Ideogram 4.0 becomes the top open-weight text-to-image model
Ideogram released Ideogram 4.0, a 9.3B-parameter text-to-image model with open weights under a non-commercial license. It leads open-weight image models on typography and layout, with bounding-box/layout-style prompting that trades casual generation ease for precise structured control.
JetBrains open-sources Mellum 2, a 12B MoE coding model
JetBrains released Mellum 2, a 12B mixture-of-experts coding model with only 2.5B active parameters, trained from scratch by a small team using a three-stage curriculum over 10T tokens. The panel read it as IDE companies converting years of developer-workflow context into a model advantage. Mellum 2 is also available on CoreWeave Inference.
Microsoft ships MAI-Code-1-Flash into GitHub Copilot
Part of the seven-model MAI launch at Build 2026, MAI-Code-1-Flash is Microsoft AI's fast coding model and ships directly into GitHub Copilot. The panel saw it as a sign Microsoft intends to serve its own models inside its developer surfaces instead of relying solely on OpenAI.
Microsoft launches MAI-Thinking-1, a 1T MoE trained from scratch
Microsoft AI used Build 2026 to launch seven MAI models, headlined by MAI-Thinking-1, a 1T total, 35B active MoE reasoning model trained from scratch on 33T tokens without distillation. The panel read the launch as Microsoft becoming a frontier model lab in its own right rather than only an OpenAI distribution channel.
1T MAI-Thinking-1 total parameters; 33T MAI training tokens
MiniMax announces M3 coding/agentic model with 1M context
MiniMax announced M3, a natively multimodal coding and agentic model. MiniMax claims a one-million-token context window using sparse attention and has promised open weights soon. Reported numbers include 59 on SWE-bench Pro. The panel noted that MiniMax already has a following for cheap agentic tool calling, even though its pure coding quality is debated.
NVIDIA ships Nemotron 3.5 ASR, a 600M streaming speech model
NVIDIA released Nemotron 3.5 ASR, a 600M-parameter open multilingual streaming speech-to-text model aimed at voice agents. It supports 40 languages and reportedly delivers 17x more throughput than Parakeet-style baselines at half the size, pushing the latency/accuracy frontier for open voice-agent infrastructure.
NVIDIA releases Nemotron 3 Ultra, a 550B open-weight MoE for agents
NVIDIA dropped Nemotron 3 Ultra the day of the show, a 550B-parameter sparse MoE with 55B active parameters built for long-running agentic harnesses like OpenCode, Hermes, and OpenClaw. Chris Alexiuk joined to explain the hybrid Mamba/Transformer architecture and the unusually complete open release: weights, training data, recipes, a GenRM reward model, and an NVFP4 quantized checkpoint.
550B Nemotron 3 Ultra parameters; 55B active parameters
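The split between total and active parameters is why these sparse MoE models are cheaper to run per token than their headline size suggests. Below is a minimal sketch of that arithmetic using the parameter counts quoted in this recap for Nemotron 3 Ultra, MAI-Thinking-1, and Mellum 2. The `active_fraction` helper is hypothetical. The ratio is only a rough proxy for per-token compute.

```python
# Hypothetical helper: the share of parameters a sparse MoE model activates per token.
# Counts (in billions) are the figures quoted in this recap. The ratio is a rough
# proxy for per-token compute; it ignores attention, routing, and shared-layer costs.
MOE_MODELS = {
    "Nemotron 3 Ultra": (550.0, 55.0),
    "MAI-Thinking-1": (1000.0, 35.0),
    "Mellum 2": (12.0, 2.5),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return active / total parameters as a value in (0, 1]."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total, active) in MOE_MODELS.items():
        print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")
```

Run as a script, it reports 10.0% active for Nemotron 3 Ultra, 3.5% for MAI-Thinking-1, and 20.8% for Mellum 2.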
Reve 2.0 hits #2 on Text-to-Image Arena with layout-first editing
Reve 2.0 jumped to second place on Text-to-Image Arena (around 1200 Elo) with native 4K output, code-like layout control, and precise editing. Alex's live tests found that portrait identity was inconsistent. Even so, the layout-first editor stood out as the real differentiator for graphic-design and image-iteration workflows.
xAI releases Grok Imagine Video 1.5 Preview with synced audio
xAI released a preview of Grok Imagine Video 1.5, an image-to-video model that generates clips with synchronized audio. It adds xAI to the week's crowded race of media-generation model updates.
OpenAI unveils Jalapeno custom inference chip with Broadcom
OpenAI unveiled Jalapeno, its first custom inference ASIC built with Broadcom, positioning it as part of a full-stack strategy to make ChatGPT, Codex, API, and agent workloads cheaper and faster at scale.
Midjourney announces Midjourney Medical, a full-body ultrasound scanner concept
Midjourney announced Midjourney Medical, a full-body ultrasound scanner concept that the episode described as capturing 806TB per scan in under 60 seconds. The panel treated it as a striking sign that AI-native companies are moving beyond chatbots into hardware, imaging, and healthcare infrastructure.
Cognition rebrands Windsurf into Devin Desktop multi-agent hub
Cognition rebranded Windsurf into Devin Desktop, a multi-agent command center with Agent Client Protocol (ACP) support. The move consolidates Cognition's IDE acquisition into its Devin agent brand as a desktop control surface for running multiple coding agents.
Nous Research launches Hermes Desktop agent app for Mac/Win/Linux
Nous Research launched Hermes Desktop, packaging the Hermes Agent harness into a native desktop app for Mac, Windows, and Linux. Karan previewed chat, permissions, tool-call visibility, reasoning traces, and admin controls aimed at small teams, startups, and personal agent fleets.
NVIDIA announces RTX Spark Arm + Blackwell platform for local AI PCs
At Computex, NVIDIA unveiled RTX Spark, an Arm CPU plus Blackwell GPU PC platform with 128GB unified memory targeting local AI agents and 120B-class local inference. A wave of thin laptops with RTX 5070-class GPUs and roughly one petaflop of local AI compute raises the question of what agents should run locally versus in the cloud.
OpenAI rolls out Codex Computer Use, Chrome extension, Memory and Chronicle to European users
OpenAI rolled out Codex Computer Use plus Chrome extension, Memory, and Chronicle access to users in the EEA, UK, and Switzerland. The episode covered it as part of the week’s coding-agent platform expansion.
WolfBench adds 3D token-depth bars to show model efficiency
Wolfram Ravenwolf shipped a WolfBench feature that shows token usage alongside benchmark score as 3D token-depth bars. Two models can look close on a leaderboard while one burns dramatically more tokens, which changes the real cost and latency picture. Gemini 3.5 Flash and GPT-5.5 were compared as examples.
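The idea behind the token-depth bars can be shown with a small calculation: divide each score by the tokens spent to earn it. This is a sketch under stated assumptions, not WolfBench's actual metric. The `BenchResult` type, the score-per-million-tokens measure, and the two models with their numbers are hypothetical placeholders, not WolfBench results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    model: str
    score: float      # benchmark score, e.g. percent of tasks solved
    tokens_used: int  # total tokens consumed over the whole benchmark run

    def __post_init__(self) -> None:
        if self.tokens_used <= 0:
            raise ValueError("tokens_used must be positive")

    @property
    def score_per_million_tokens(self) -> float:
        """Benchmark points earned per one million tokens spent."""
        return self.score / (self.tokens_used / 1_000_000)


def rank_by_efficiency(results: list[BenchResult]) -> list[BenchResult]:
    """Order results from most to least score per million tokens."""
    return sorted(results, key=lambda r: r.score_per_million_tokens, reverse=True)


# Placeholder numbers for illustration only; these are not WolfBench results.
runs = [
    BenchResult("model_a", score=72.0, tokens_used=4_000_000),
    BenchResult("model_b", score=74.0, tokens_used=12_000_000),
]

for r in rank_by_efficiency(runs):
    print(f"{r.model}: score {r.score:.1f}, {r.score_per_million_tokens:.1f} points per 1M tokens")
```

With these placeholder numbers, model_b scores two points higher but spends three times as many tokens. As a result it ranks second on efficiency (6.2 versus 18.0 points per million tokens).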
OpenRouter launches Fusion API, a panel of budget models competing with frontier models
OpenRouter launched Fusion API, which routes or ensembles a panel of lower-cost models to reach near-frontier results. The episode notes framed it as beating GPT-5.5 and Opus 4.8 in some comparisons while landing within roughly 1% of Claude Fable 5 at half the price.
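The episode says Fusion "routes or ensembles" cheaper models but does not spell out the mechanism. One common ensembling pattern is a majority vote across a panel, sketched here under that assumption. It is not OpenRouter's implementation. Each panel member is a hypothetical stand-in for a call to some model endpoint.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for one budget model: takes a prompt, returns an answer.
Model = Callable[[str], str]


def ensemble_answer(prompt: str, panel: Sequence[Model]) -> str:
    """Query every model on the panel and return the most common answer.

    Ties go to the answer that appeared first, because Counter.most_common
    orders elements with equal counts by first insertion.
    """
    if not panel:
        raise ValueError("panel must contain at least one model")
    answers = [model(prompt) for model in panel]
    return Counter(answers).most_common(1)[0][0]


# Toy panel: two members agree, one dissents.
panel = [lambda p: "Paris", lambda p: "Paris", lambda p: "Lyon"]
print(ensemble_answer("Capital of France?", panel))  # Paris
```

Exact-match voting only works for short, canonical answers. Free-form outputs would need a judge model or a similarity measure instead.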
Kimi K2.7 Code goes live on W&B/CoreWeave Inference
Kimi K2.7 Code became available on W&B/CoreWeave Inference, with the episode notes calling out Blackwell NVFP4 serving, speculative decoding, and 289 tokens per second near the top of Artificial Analysis speed and price-performance charts.
Anthropic launches Claude Tag as a persistent Slack teammate
Claude Tag brings Claude into Slack as a persistent proactive teammate with shared channel context, ambient follow-up, coding tasks, analysis, incident support, and enterprise governance.
65% of Anthropic product-team code from internal version; $25K Enterprise launch credits
Linzumi launches shared chat for fleets of AI coding agents
YC-backed Linzumi launched a team chat and agent orchestration environment where humans and AI coding agents share threads, with Sean Grove describing a future of 10,000 agent hours per person per day.
10,000 agent hours / person / day; $100 flat monthly team tier
Sakana AI launches Fugu multi-agent orchestration API
Sakana AI launched Fugu, a single API endpoint that coordinates publicly accessible models in Thinker, Worker, and Verifier roles. Sakana claims it matches or beats frontier systems on multiple benchmarks.
95.5 GPQA Diamond; 93.2 LiveCodeBench; 73.7 SWE-Bench Pro
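The Thinker, Worker, and Verifier split can be illustrated with a plan, draft, and check loop. This is a minimal sketch that assumes each role is a plain function wrapping some model call. `orchestrate` and the toy roles are hypothetical. They do not reflect Sakana's actual API or how Fugu assigns models to roles.

```python
from typing import Callable, Optional

# Hypothetical role signatures: each would wrap a call to some model endpoint.
Thinker = Callable[[str], str]         # task -> plan
Worker = Callable[[str], str]          # plan -> draft answer
Verifier = Callable[[str, str], bool]  # (task, draft) -> accept?


def orchestrate(task: str, thinker: Thinker, worker: Worker,
                verifier: Verifier, max_rounds: int = 3) -> Optional[str]:
    """Plan, draft, and verify; return the first accepted draft or None."""
    plan = thinker(task)
    for _ in range(max_rounds):
        draft = worker(plan)
        if verifier(task, draft):
            return draft
        # Feed the rejected draft back to the thinker so the next plan can adjust.
        plan = thinker(f"{task}\nRejected draft: {draft}")
    return None


# Toy roles for a task like "2 3 4": add the numbers.
def toy_thinker(task: str) -> str:
    return task.splitlines()[0]  # keep only the original task text


def toy_worker(plan: str) -> str:
    return str(sum(int(tok) for tok in plan.split()))


def toy_verifier(task: str, draft: str) -> bool:
    return draft == str(sum(int(tok) for tok in task.split()))


print(orchestrate("2 3 4", toy_thinker, toy_worker, toy_verifier))  # 9
```

The verifier acts as the gate: a draft is returned only if it passes the check. If every round is rejected, the loop returns `None` instead of an unchecked answer.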
HumanLayer launches an Agentic IDE to fight AI code slop
HumanLayer launched its Agentic IDE, positioned as a human-in-the-loop answer to lights-out coding-agent slop. Dexter Horthy joined the show to argue that the right architecture keeps humans steering high-impact changes instead of letting agents silently trash production codebases.
Weights & Biases launches HiveMind for coding-agent observability
Weights & Biases launched HiveMind, a dashboard for tracking AI coding-agent sessions, spend, transcripts, ROI, and reusable organizational learning. Chris Van Pelt and Adrian Swanberg joined the show to explain why teams need observability for their growing fleet of coding agents.
Arena launches Agent Arena for real-world agent workflow evals
Arena (LMArena) launched Agent Arena during the episode, moving beyond one-turn chatbot preference battles to evaluate models on real agent workflows with web search, files, terminals, user corrections, and objective recovery signals. Peter Gostev joined live to explain why long-running, harder tasks need a different benchmark.
SpaceX/xAI reportedly acquires Anysphere/Cursor in a $60B all-stock deal
The show covered a reported $60B all-stock acquisition of Anysphere/Cursor by SpaceX/xAI. Alex framed it as coding assistants becoming strategic infrastructure: workflows, agent traces, and developer context are now assets frontier labs want to own.
Anthropic disables Fable and Mythos access after US government restriction
Anthropic reportedly shut down Fable 5 and Mythos 5 access for foreign nationals, then disabled both models broadly to comply. The episode framed it as the first major direct government intervention in frontier model access, turning model availability into a national-security and sovereign-AI story.