NVIDIA

24 releases covered on ThursdAI · nvidia.com

June 2026

NVIDIA
New Models · Open weights

Nemotron 3.5 ASR

NVIDIA ships Nemotron 3.5 ASR, a 600M streaming speech model

NVIDIA released Nemotron 3.5 ASR, a 600M-parameter open multilingual streaming speech-to-text model aimed at voice agents. It supports 40 languages and reportedly delivers 17x more throughput than Parakeet-style baselines at half the size, pushing the latency/accuracy frontier for open voice-agent infrastructure.

17x Nemotron ASR throughput
NVIDIA
New Models · Open weights

Nemotron 3 Ultra

NVIDIA releases Nemotron 3 Ultra, a 550B open-weight MoE for agents

NVIDIA dropped Nemotron 3 Ultra the day of the show, a 550B-parameter sparse MoE with 55B active parameters built for long-running agentic harnesses like OpenCode, Hermes, and OpenClaw. Chris Alexiuk joined to explain the hybrid Mamba/Transformer architecture and the unusually complete open release: weights, training data, recipes, a GenRM reward model, and an NVFP4 quantized checkpoint.

550B Nemotron 3 Ultra parameters · 55B Active parameters
NVIDIA
Products & Apps

RTX Spark

NVIDIA announces RTX Spark Arm + Blackwell platform for local AI PCs

At Computex, NVIDIA unveiled RTX Spark, an Arm CPU plus Blackwell GPU PC platform with 128GB unified memory targeting local AI agents and 120B-class local inference. A wave of thin laptops with RTX 5070-class GPUs and roughly one petaflop of local AI compute raises the question of what agents should run locally versus in the cloud.

April 2026

March 2026

NVIDIA
Also Released

GR LPX (Rubin NVL72 + Groq 3)

NVIDIA GTC: GR LPX pairs Rubin NVL72 servers with the new Groq 3 chip

NVIDIA's GTC hardware reveal integrates the new Groq 3 chip (gen 2 was never publicly seen) into Rubin NVL72 servers via the GR LPX system. Claims include 3x tokens-per-watt efficiency at baseline, up to 30x at higher throughput, and 1000+ tokens/sec on a 2T-parameter frontier model with 400K context — performance the current Blackwell generation can't reach at any price.

NVIDIA
Products & Apps

NemoClaw

NVIDIA announces NemoClaw, enterprise-hardened OpenClaw, at GTC

At GTC, Jensen Huang spent 15 minutes on OpenClaw, calling it the most important open source release since Linux and declaring 'every company needs an OpenClaw strategy.' NVIDIA released NemoClaw, a hardened enterprise reference implementation of OpenClaw with a privacy router and policy engine aimed at solving the agent security problem.

NVIDIA
New Models · Open weights

Nemotron 3 Super 120B

NVIDIA releases Nemotron 3 Super 120B with $26B open-source bet

NVIDIA launched Nemotron 3 Super, a 120B hybrid Mamba-Transformer MoE model with 12B active parameters, a 1M-token context window, and 450 tok/s throughput. It shipped with BF16/FP8/NVFP4 weights, a base checkpoint, SFT and pre-training data, and the full training recipe, alongside a $26B, five-year open-source commitment. It is available on W&B Inference at $0.20/M input tokens and $0.80/M output tokens.

120B Nemotron 3 Super total parameters · 12B Nemotron 3 Super active parameters (MoE) · 1M Nemotron 3 Super context window (tokens)
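
As a rough illustration of the W&B Inference pricing quoted in this entry, the sketch below estimates what a single request might cost. The two rates come from the entry; the function name `estimate_cost` and the example token counts are hypothetical, and actual billing may differ.

```python
from decimal import Decimal

# Rates quoted in the entry above, in USD per million tokens.
INPUT_PRICE_PER_M = Decimal("0.20")
OUTPUT_PRICE_PER_M = Decimal("0.80")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the quoted rates."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 200K tokens of context in, 4K tokens out.
    print(f"${estimate_cost(200_000, 4_000)}")
```

Because output tokens are priced at four times the input rate, long-context requests with short answers stay cheap, while generation-heavy agent loops are dominated by the output term.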

January 2026

NVIDIA
New Models · Open weights

Alpamayo

NVIDIA Alpamayo: open-source reasoning models for self-driving

NVIDIA announced Alpamayo at CES, a family of open-source reasoning-based self-driving AI models. The models perform end-to-end autonomous driving with explicit reasoning steps, like identifying jaywalkers and stopping accordingly, demoed in a Mercedes-Benz.

NVIDIA
Acquisitions

Groq acquisition

NVIDIA acquires Groq team and licenses its tech for ~$20B

NVIDIA entered an exclusive licensing deal with Groq and acquired most of its team for approximately $20B. Groq's inference-optimized chips, created by former Google TPU lead Jonathan Ross, complement NVIDIA's training dominance as inference demand grows exponentially across AI use cases.

NVIDIA
New Models · Open weights

Nemotron Speech ASR

Nemotron Speech ASR: 600M streaming model with 24ms latency

NVIDIA released Nemotron Speech ASR, a 600M-parameter open-source streaming speech recognition model with 24ms median latency and support for 900 concurrent streams on a single H100. Kwindla Hultman Kramer of Daily.co demoed sub-500ms voice-to-voice latency using a three-model pipeline of Nemotron ASR, Nemotron Nano LLM, and Magpie TTS.

24ms Nemotron Speech latency
NVIDIA
Products & Apps

Vera Rubin

NVIDIA Vera Rubin platform: 5x Blackwell inference at CES 2026

Jensen Huang unveiled the Vera Rubin platform at CES 2026, NVIDIA's next-gen AI computer delivering 50 PFLOPS and 5x inference performance over Blackwell while adding only ~200W of power draw. It needs 75% fewer GPUs for 10 trillion parameter MoE training, packs 72 GPUs per rack with 20.7TB memory and 13 TB/s bandwidth, is 100% liquid cooled, and entered full production just four months after the B300.

5x Vera Rubin vs Blackwell · 75% Fewer GPUs needed

December 2025

NVIDIA
Products & Apps

Project Digits

NVIDIA Project Digits: $3,000 desktop that runs 200B-param models

NVIDIA announced Project Digits in January 2025, a $3,000 desktop supercomputer capable of running 200B-parameter models locally. It brought serious local-inference hardware to individual developers and was one of that month's standout hardware stories.

NVIDIA
New Models · Open weights

Nemotron 3 Nano

NVIDIA ships Nemotron 3 Nano, a 30B hybrid Mamba-MoE with full recipes

NVIDIA released Nemotron 3 Nano, a 30B-parameter hybrid Mamba-MoE model with only 3B active parameters for efficient inference. The panel called it the most consequential open release of the week because NVIDIA shipped not just weights but technical reports, training recipes, and details on the 25T-token training data.

30B (3B active) Nemotron 3 Nano parameters
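
To make the "active parameters" figures concrete, here is a minimal sketch comparing the three Nemotron 3 models listed on this page. The parameter counts are the ones quoted in their entries; the dictionary layout and the `active_fraction` helper are illustrative only. The fraction is just a ratio of the quoted numbers and says nothing about real throughput, which also depends on hardware, batching, and the Mamba/Transformer layer mix.

```python
# (total parameters, active parameters) in billions, as quoted on this page.
NEMOTRON_3 = {
    "Ultra": (550, 55),
    "Super": (120, 12),
    "Nano": (30, 3),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active, per the quoted counts."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in NEMOTRON_3.items():
        share = active_fraction(total_b, active_b)
        print(f"Nemotron 3 {name}: {active_b}B of {total_b}B active ({share:.0%})")
```

All three sizes use the same ratio of active to total parameters, so the family scales total capacity while keeping the active share constant.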

November 2025

October 2025

September 2025

NVIDIA
Funding

NVIDIA-OpenAI $100B partnership

NVIDIA commits up to $100B to OpenAI for 10GW of compute

NVIDIA and OpenAI announced a letter of intent under which NVIDIA would invest up to $100 billion in OpenAI as the two deploy at least 10 gigawatts of NVIDIA systems for OpenAI's next-generation infrastructure. The episode's big-company segment centered on this deal as evidence that money and infrastructure, not just models, now drive the AI race.

April 2025

NVIDIA
New Models · Open weights

Describe Anything (DAM-3B)

NVIDIA releases DAM-3B for region-based image and video captioning

NVIDIA dropped the Describe Anything Model (DAM-3B), a 3 billion parameter multimodal model for region-based image and video captioning. You can point it at a specific region of an image or video and it generates a detailed description of just that area. NVIDIA also published an accompanying DescribeAnything dataset and a Hugging Face demo.

3B Parameters
NVIDIA
New Models · Open weights

Llama-3.1-Nemotron-Ultra-253B

NVIDIA ships Nemotron Ultra, a 253B pruned and distilled Llama 3.1-405B

NVIDIA released Nemotron Ultra, a pruned and distilled finetune of Llama 3.1-405B at roughly half the parameters (253B). Its benchmarks even included Llama 4 comparisons, showing the older finetuned Llama beating the new models on AIME, GPQA and more. It supports 128K context and fits on a single 8xH100 node for inference.

253B Parameters (pruned from Llama 3.1-405B) · 128K Context window
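
The claim that the model fits on a single 8xH100 node can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a weights-only estimate under stated assumptions: 80 GB of memory per H100 (not stated in the entry), decimal gigabytes, and no allowance for KV cache, activations, or runtime overhead, so real headroom is smaller. The helper names are hypothetical, and the entry does not say which precision is used for serving.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Weights-only memory in GB: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_node(params_billion: float, precision: str,
                 gpus: int = 8, gb_per_gpu: float = 80.0) -> bool:
    """True if the weights alone fit in the node's combined GPU memory.

    gb_per_gpu defaults to 80 GB, an assumption about the H100 variant.
    """
    return weight_footprint_gb(params_billion, precision) <= gpus * gb_per_gpu


if __name__ == "__main__":
    for params in (253, 405):
        for precision in ("bf16", "fp8"):
            size = weight_footprint_gb(params, precision)
            verdict = "fits" if fits_on_node(params, precision) else "does not fit"
            print(f"{params}B @ {precision}: {size:.0f} GB, {verdict} in 8 x 80 GB")
```

Under these assumptions, the 253B weights fit in BF16 with room to spare, while the 405B parent would exceed the node's memory at the same precision, which is one way to read the appeal of the pruned model.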

March 2025

NVIDIA
New Models · Open weights

Canary 1B/180M Flash

NVIDIA Canary Flash: Apache 2 speech recognition and translation

NVIDIA released Canary 1B Flash and 180M Flash, Apache 2.0 licensed speech recognition and translation models built as Llama finetunes. The permissive license makes them freely usable for commercial ASR and translation workloads.

NVIDIA
New Models · Open weights

Llama-Nemotron (Super 49B, Nano 8B)

NVIDIA drops Llama-Nemotron reasoning models plus training dataset

NVIDIA released the Llama-Nemotron family, including Super 49B and Nano 8B reasoning models, announced around GTC. Alongside the open weights, NVIDIA published the Llama-Nemotron post-training dataset, giving the community both the models and the data recipe behind them.

January 2025