Local & On-Device AI

Small models and local inference on phones, laptops, and edge hardware. 47 releases covered on the show.

July 2026

Exo Labs Jul 2, 2026

Products & Apps

local.ai

Exo Labs launches local.ai to track the local-AI frontier

Announced live on ThursdAI at AI Engineer: local.ai tracks the best model for your hardware, the performance trade-off versus the cloud, and whether running local beats API-token pricing. Early access is live with signup codes, and the Exo CLI ('vLLM for consumer devices, with the configs figured out for you') ships in the coming weeks.

71% Terminal Bench 2.1, REAP-pruned GLM 5.2 · 550B Nemotron-3 Ultra running on 4 NVIDIA Sparks
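The "local versus API-token pricing" question comes down to a break-even calculation. The sketch below is one rough way to frame it, not how local.ai computes anything: the function name and every input (hardware price, power draw, electricity rate, local throughput, API price) are hypothetical, and it ignores depreciation, idle power, and quality differences between models.

```python
def breakeven_tokens(hardware_cost_usd, power_watts, usd_per_kwh,
                     local_tokens_per_sec, api_usd_per_million_tokens):
    """Tokens to generate before local hardware pays for itself.

    Returns None when the API is cheaper per token than local electricity,
    in which case the hardware never pays for itself on cost alone.
    """
    # Electricity cost of one locally generated token.
    seconds_per_token = 1.0 / local_tokens_per_sec
    kwh_per_token = power_watts * seconds_per_token / 3_600_000  # W*s -> kWh
    local_usd_per_token = kwh_per_token * usd_per_kwh

    api_usd_per_token = api_usd_per_million_tokens / 1_000_000
    saving_per_token = api_usd_per_token - local_usd_per_token
    if saving_per_token <= 0:
        return None
    return hardware_cost_usd / saving_per_token


# Hypothetical inputs: a $3,000 box drawing 200 W at $0.15/kWh,
# generating 50 tokens/sec, compared with an API at $2 per million tokens.
tokens = breakeven_tokens(3000, 200, 0.15, 50, 2.0)
print(f"break-even after about {tokens / 1e9:.2f}B tokens")
```

With a cheap enough API the function returns None, which is the case where local inference has to justify itself on privacy or latency rather than price.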

🎙️ Hear our coverage →

#on-device #open-source #infrastructure

June 2026

Google DeepMind Jun 4, 2026

New Models · Open weights

Gemma 4 12B

Google drops Gemma 4 12B, an encoder-free multimodal local model

Google released Gemma 4 12B, an encoder-free multimodal model under Apache 2.0 that targets 16GB VRAM local setups. Instead of bolting separate vision or audio encoders onto a language model, it uses one unified network, which LDJ and Yam argued makes smaller multimodal models cheaper, cleaner, and easier to run locally.

X announcement ↗ · Hugging Face ↗

🎙️ Hear our coverage →

#open-source #multimodal #on-device

NVIDIA Jun 4, 2026

Products & Apps

RTX Spark

NVIDIA announces RTX Spark Arm + Blackwell platform for local AI PCs

At Computex, NVIDIA unveiled RTX Spark, an Arm CPU plus Blackwell GPU PC platform with 128GB unified memory targeting local AI agents and 120B-class local inference. A wave of thin laptops with RTX 5070-class GPUs and roughly one petaflop of local AI compute raises the question of what agents should run locally versus in the cloud.

Coverage (Tom's Hardware) ↗

🎙️ Hear our coverage →

#infrastructure #on-device #agents

May 2026

OpenBMB May 28, 2026

New Models · Open weights

MiniCPM5-1B

OpenBMB MiniCPM5-1B: new SOTA 1B open-weights model

OpenBMB released MiniCPM5-1B, a state-of-the-art 1B-parameter open-weights model for efficient local and on-device use that runs on a phone. It scores 17.9 on the Artificial Analysis Intelligence Index, 7.4 points ahead of its size class, while using roughly 31x fewer output tokens than Qwen3.5 2B.

17.9 AAII (1B model)

OpenBMB MiniCPM5-1B on Hugging Face ↗ · MiniCPM5-1B paper ↗ · Artificial Analysis on MiniCPM5-1B ↗ · OpenBMB announcement ↗

🎙️ Hear our coverage →

#open-source #on-device

PrismML May 28, 2026

New Models · Open weights

Bonsai Image 4B

PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB

PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.

PrismML Bonsai Image 4B blog ↗ · PrismML Bonsai on Hugging Face ↗ · Bonsai Image demo ↗ · Bonsai Studio iOS app ↗

🎙️ Hear our coverage →

#image-gen #on-device #infrastructure

April 2026

NVIDIA Apr 30, 2026

New Models · Open weights

Nemotron 3 Nano Omni

NVIDIA Nemotron 3 Nano Omni: hybrid Transformer-Mamba MoE

NVIDIA released Nemotron 3 Nano Omni, a 30B-total/3B-active hybrid Transformer-Mamba MoE with 256K context. It delivers 9x throughput on consumer hardware.

NVIDIA blog ↗

🎙️ Hear our coverage →

#open-source #multimodal #architecture

0xSero Apr 16, 2026

New Models · Open weights

Gemma 4 21B REAP

Gemma 4 21B REAP: 20% expert-pruned Gemma 4 26B MoE

Community researcher 0xSero released Gemma 4 21B-A4B REAP, a 20% expert-pruned version of the Gemma 4 26B MoE created using Cerebras' REAP pruning technique. It shrinks the model for cheaper local inference while preserving most of its quality.
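Expert pruning in general can be pictured as ranking each layer's experts by a saliency score and dropping the lowest-scoring fraction. The sketch below is a generic illustration of that idea, not Cerebras' REAP implementation: the function name is hypothetical, the scores are made up, and how REAP actually computes saliency is not described in this entry.

```python
def prune_experts(saliency, prune_fraction=0.2):
    """Return the sorted indices of experts to keep in one MoE layer.

    saliency: one score per expert (higher means more important).
    prune_fraction: share of experts to drop, e.g. 0.2 for 20%.
    """
    n_prune = round(len(saliency) * prune_fraction)
    # Rank expert indices from least to most salient, then drop the lowest.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i])
    return sorted(ranked[n_prune:])


# Hypothetical scores for a 10-expert layer; a 20% prune drops two experts.
scores = [0.9, 0.1, 0.5, 0.7, 0.05, 0.3, 0.8, 0.6, 0.4, 0.2]
kept = prune_experts(scores)
print(kept)
```

Because only expert weights are removed, shared layers such as attention and embeddings stay intact, which is why a 20% expert prune does not shrink the total parameter count by exactly 20%.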

gemma-4-21b-a4b-it-REAP on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #architecture #on-device

Jiunsong (@songjunkr) Apr 16, 2026

New Models · Open weights

Super Gemma 4 26B Uncensored v2

Super Gemma 4 26B Uncensored v2 trends on HF with 0/100 refusals

Community fine-tuner @songjunkr released Super Gemma 4 26B Uncensored v2, which is trending on Hugging Face with 0/100 refusals and fixed tool calling. It ships in GGUF and MLX 4-bit variants for local inference.

Super Gemma 4 26B Uncensored GGUF v2 (HF) ↗ · Super Gemma 4 26B Uncensored MLX 4bit v2 (HF) ↗ · @songjunkr on X ↗

🎙️ Hear our coverage →

#open-source #on-device

Google DeepMind Apr 2, 2026

New Models · Open weights

Gemma 4

Google releases Gemma 4 open-weights family under Apache 2.0

Google DeepMind's Gemma 4 launch crossed 10M downloads, with over 1,000 Gemma-4-based fine-tunes on Hugging Face; the Gemma family totals 500M+ downloads. Omar Sanseviero says Gemma is the foundation for the next generation of Gemini Nano shipping on Pixel and Samsung, with the AI Edge gallery letting people run it locally on Android and iOS. It punched above its size on Arena's Pareto curve and is now live on W&B Inference.

Hugging Face Collection ↗ · Try in AI Studio ↗ · Omar Sanseviero on X ↗

🎙️ Hear our coverage (+1 follow-up) →

#open-source #agents #on-device

Liquid AI Apr 2, 2026

New Models · Open weights

LFM2.5-350M

Liquid AI ships LFM2.5-350M with agentic tool calling at 350M params

Liquid AI released LFM2.5-350M, a 350M-parameter open model that does agentic tool calling and fits under 500MB quantized. It targets edge and on-device agent workloads where tiny deployable models matter.

Announcement (X) ↗ · Hugging Face ↗ · Liquid AI blog ↗

🎙️ Hear our coverage →

#open-source #on-device #agents

PrismML Apr 2, 2026

New Models · Open weights

Bonsai

PrismML releases Bonsai 1-bit models, an 8B model in 1.15 GB

PrismML released Bonsai, a family of 1-bit quantized open models fitting an 8B model into 1.15 GB and claiming 10x intelligence density, built on decades of compression research. The panel discussed one-bit quantization as a cost/performance lever for cheap local inference.
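The "8B model in 1.15 GB" figure can be sanity-checked by converting file size into average bits per weight. This is a back-of-the-envelope sketch, assuming GB means 10^9 bytes and the model has exactly 8 billion parameters; the function names are illustrative.

```python
def bits_per_weight(file_size_gb, n_params_billion):
    """Average storage per parameter, in bits.

    GB is taken as 10^9 bytes, so the 10^9 factors in bytes and
    parameters cancel and only the byte-to-bit factor of 8 remains.
    """
    return file_size_gb * 8 / n_params_billion


def compression_vs_fp16(file_size_gb, n_params_billion):
    """How many times smaller the file is than 16-bit weights would be."""
    return 16 / bits_per_weight(file_size_gb, n_params_billion)


bpw = bits_per_weight(1.15, 8)
ratio = compression_vs_fp16(1.15, 8)
print(f"{bpw:.2f} bits per weight, about {ratio:.1f}x smaller than fp16")
```

An average slightly above 1 bit per weight is what one would expect if most weights are stored at 1 bit and a small share of the file goes to scales, metadata, or layers kept at higher precision; that reading is an assumption, not something the entry states.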

Announcement (X) ↗ · Hugging Face ↗ · PrismML site ↗

🎙️ Hear our coverage →

#open-source #infrastructure #on-device

March 2026

Reka AI Mar 26, 2026

New Models · Open weights

Reka Edge

Reka AI ships Edge, a 7B multimodal VLM for sub-second on-device inference

Reka AI launched Reka Edge, a 7B-parameter multimodal vision-language model built for sub-second latency on edge devices. Weights are on Hugging Face and the model is available through OpenRouter, with the panel highlighting it as a notable efficient multimodal release for real-world deployment.

Reka AI announcement (X) ↗ · Reka Edge on Hugging Face ↗ · Reka Edge on OpenRouter ↗ · Reka AI blog ↗

🎙️ Hear our coverage →

#open-source #vision #on-device

Unsloth AI Mar 19, 2026

Dev Tools · Open weights

Unsloth Studio

Unsloth Studio: web UI for local fine-tuning with 2x speed, 70% less VRAM

Unsloth launched Studio, an open-source web UI for local LLM training and inference claiming 2x speed and 70% less VRAM, supporting 500+ models across text, vision, audio, and embeddings. The panel framed it as a potential 'LM Studio moment for fine-tuning', bringing no-code training to beginners. Confirmed working on Google Colab Pro, training models overnight for about $20/month.

Unsloth Studio docs ↗ · X announcement ↗ · GitHub ↗ · Daniel Han announcement (X) ↗

🎙️ Hear our coverage (+1 follow-up) →

#training #open-source #coding

Alibaba (Qwen) Mar 5, 2026

New Models · Open weights

Qwen3.5 Small Series

Alibaba releases Qwen3.5 small models (2B, 4B, 9B) for local use

Alibaba released the Qwen3.5 small model series with 2B, 4B, and 9B variants, which the panel found highly usable on consumer hardware. The release landed alongside leadership turbulence as Junyang Lin and Binyuan Hui departed Qwen, though the panel expects Alibaba's open-source momentum to continue.

Qwen3.5 small models announcement ↗ · Qwen3.5-9B on Hugging Face ↗ · Qwen3.5-4B on Hugging Face ↗ · Qwen3.5-2B on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #on-device

February 2026

Liquid AI Feb 26, 2026

New Models · Open weights

LFM2-24B-A2B

Liquid AI releases LFM2-24B-A2B, a laptop-friendly 24B MoE

Liquid AI released LFM2-24B-A2B, a 24B mixture-of-experts model with only 2.3B active parameters that runs on consumer laptops. The panel highlighted its speed and surprisingly strong non-coding reasoning, reinforcing the trend of efficient low-active-parameter open models for local use.

Liquid AI announcement on X ↗ · LFM2-24B-A2B on Hugging Face ↗ · Liquid AI blog post ↗

🎙️ Hear our coverage →

#open-source #architecture #on-device

LM Studio Feb 26, 2026

Dev Tools

LMLink

LM Studio launches LMLink for remote access to local models

LM Studio launched LMLink, which lets you use your locally hosted models from anywhere via Tailscale. It extends the local-model story so that on-device inference is reachable from any of your machines.

LMLink page ↗

🎙️ Hear our coverage →

#on-device #coding

Cohere Labs Feb 19, 2026

New Models · Open weights

Tiny Aya

Cohere Labs releases Tiny Aya, a 3.35B multilingual model for 70+ languages

Cohere Labs released Tiny Aya, a 3.35B-parameter multilingual model family supporting 70+ languages that is small enough to run locally on phones. It extends Cohere's Aya line of open multilingual models, bringing broad language coverage to on-device deployments.

Tiny Aya announcement (X) ↗ · Tiny Aya collection on Hugging Face ↗ · Tiny Aya Global on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #multilingual #on-device

January 2026

Jan AI Jan 29, 2026

New Models · Open weights

Jan v3

Jan AI releases Jan v3, a 4B model built for fast local inference

Jan v3 is a 4B-parameter open model optimized for local inference, hitting 132 tokens/sec with a 262K context window and a 40% improvement on coding. The Jan desktop app it powers has reached 5M downloads.

4B Jan v3 parameters

Announcement (X) ↗ · Hugging Face ↗ · Hugging Face (GGUF) ↗ · Jan.ai ↗

🎙️ Hear our coverage →

#open-source #on-device #coding

Liquid AI Jan 22, 2026

New Models · Open weights

LFM2.5-1.2B-Thinking

Liquid AI's LFM2.5-1.2B-Thinking: on-device reasoning under 900MB

Liquid AI released LFM2.5-1.2B-Thinking, a 1.2B parameter reasoning model that runs entirely on-device with under 900MB of memory. Its hybrid architecture with gated convolutions delivers 239 tokens/sec on an AMD CPU and 82 tokens/sec on a mobile NPU, making it practical for edge devices, Raspberry Pi, and older iPhones.

1.2B Parameters, under 900MB memory

LFM2.5-1.2B-Thinking announcement (X) ↗ · LFM2.5-1.2B-Thinking on Hugging Face ↗ · LFM2.5-1.2B-Thinking on Liquid LEAP ↗

🎙️ Hear our coverage →

#open-source #reasoning #on-device

Liquid AI Jan 8, 2026

New Models · Open weights

LFM 2.5

Liquid AI LFM 2.5: 1B on-device family with end-to-end audio

Liquid AI released LFM 2.5, a family of ~1.2B parameter on-device models spanning text, vision, and audio, announced at CES alongside AMD's Lisa Su. The models hit 239 tokens/sec on AMD CPU and 100 tokens/sec on iPhone 16 Pro Max, and include a revolutionary end-to-end audio model that skips the traditional ASR-LLM-TTS pipeline entirely, running in as little as 8GB of RAM.

Liquid AI LFM 2.5 on X ↗ · LFM 2.5 on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #on-device #voice-ai

December 2025

NVIDIA Dec 25, 2025

Products & Apps

Project Digits

NVIDIA Project Digits: $3,000 desktop that runs 200B-param models

NVIDIA announced Project Digits in January 2025, a $3,000 desktop supercomputer capable of running 200B parameter models locally. It brought serious local-inference hardware to individual developers and was one of January's standout hardware stories.

Jan 10 Episode ↗

🎙️ Hear our coverage →

#infrastructure #on-device

Google DeepMind Dec 18, 2025

New Models · Open weights

FunctionGemma

FunctionGemma: Google's 270M function-calling model for edge agents

Google released FunctionGemma, a tiny 270M-parameter open model specialized for function calling on-device. With a roughly 500MB RAM footprint and strong gains after fine-tuning for mobile actions, it points toward privacy-first local agents on constrained hardware.

FunctionGemma docs ↗ · FunctionGemma blog ↗ · FunctionGemma announcement on X ↗

🎙️ Hear our coverage →

#on-device #agents #open-source

Mistral AI Dec 4, 2025

New Models · Open weights

Mistral 3 (Large 3 + Ministral 3)

Mistral returns to Apache 2.0 with Mistral Large 3 and Ministral 3

Mistral relaunched its model family under permissive Apache 2.0 licensing with Mistral Large 3 and the small Ministral 3 edge models. Large 3 ships a 256K context window and strong open-model coding positioning. The licensing shift reignited discussion around open model portability and deployability.

256K Mistral Large 3 context window

Mistral 3 blog ↗ · Mistral Large 3 (Hugging Face collection) ↗ · Ministral 3 (Hugging Face collection) ↗ · Mistral announcement on X ↗

🎙️ Hear our coverage →

#open-source #on-device #coding

November 2025

Microsoft Nov 27, 2025

New Models · Open weights

Fara-7B

Microsoft ships Fara-7B, a 7B on-device computer use agent

Microsoft Research released Fara-7B, a best-in-class 7B-parameter vision-language model for computer use that runs on-device. It scores 73.5% on WebVoyager, beating OpenAI's computer-use preview while being small enough to run locally.

73.5% WebVoyager

Fara-7B on HuggingFace ↗ · Fara-7B Blog ↗ · Fara-7B Announcement on X ↗ · Fara on GitHub ↗

🎙️ Hear our coverage →

#open-source #agents #on-device

Tencent (Hunyuan) Nov 27, 2025

New Models · Open weights

HunyuanOCR

Tencent's 1B HunyuanOCR beats 72B models on OCRBench

Tencent released HunyuanOCR, a 1B-parameter OCR model that scores 860 on OCRBench, beating models as large as Qwen3-VL-72B. It is a striking example of task-specialized small models outperforming generalist giants.

1B Parameters · 860 OCRBench score

HunyuanOCR on HuggingFace ↗ · HunyuanOCR on GitHub ↗ · HunyuanOCR Announcement on X ↗ · Hunyuan Vision Blog ↗

🎙️ Hear our coverage →

#vision #open-source #on-device

WeiboAI Nov 13, 2025

New Models · Open weights

VibeThinker-1.5B

WeiboAI releases VibeThinker-1.5B open reasoning model

Weibo's AI team open-sourced VibeThinker-1.5B, a tiny reasoning model that reportedly outperforms much larger models like DeepSeek R1 on select reasoning benchmarks. It was part of a week in which small open-weights models from Chinese labs kept punching above their weight.

WeiboLLM announcement on X ↗ · Hugging Face model page ↗ · Arxiv paper ↗ · VentureBeat coverage ↗

🎙️ Hear our coverage →

#open-source #reasoning #on-device

October 2025

IBM Oct 30, 2025

New Models · Open weights

Granite 4.0 Nano

IBM Granite 4.0 Nano: ultra-efficient tiny models for edge deployment

IBM released Granite 4.0 Nano, a set of ultra-efficient tiny open models aimed at edge deployment. The release continues the trend of capable sub-billion-to-few-billion parameter models that can run locally on constrained hardware.

Artificial Analysis on X ↗ · Artificial Analysis: Granite ↗

🎙️ Hear our coverage →

#open-source #on-device

Liquid AI Oct 23, 2025

New Models · Open weights

LFM2-VL-3B

Liquid AI ships LFM2-VL-3B tiny multilingual vision-language model

Liquid AI released LFM2-VL-3B, a tiny multilingual vision-language model, part of a wave of OCR-and-VLM releases this week. It targets efficient on-device and edge vision-language workloads at the 3B scale.

🎙️ Hear our coverage →

#vision #open-source #on-device

Apple Oct 16, 2025

Products & Apps

M5 chip

Apple announces M5 chip with double the AI performance

Apple unveiled the M5 chip, claiming roughly double the AI performance of the previous generation for Apple Silicon. For local-model enthusiasts on the show, it means more on-device headroom for running and fine-tuning models on Macs.

Apple Newsroom ↗

🎙️ Hear our coverage →

#infrastructure #on-device

NVIDIA Oct 16, 2025

Products & Apps

DGX Spark

NVIDIA DGX Spark: a desktop personal supercomputer for local AI

NVIDIA started shipping DGX Spark, a desktop personal AI supercomputer aimed at prototyping and local inference. The show pointed to the LMSYS deep dive on its real-world performance, and Alex shared his own first impressions of the device.

LMSYS Blog deep dive ↗ · Alex's impressions on X ↗

🎙️ Hear our coverage →

#infrastructure #on-device

September 2025

IBM Sep 25, 2025

New Models · Open weights

Granite Docling 258M

IBM releases Granite Docling 258M compact document-parsing VLM

IBM published Granite Docling 258M, an ultra-compact open-source vision-language model for document understanding that converts documents into structured output. At just 258M parameters it reinforced the show's point that tiny specialized models are becoming genuinely useful workflow tools.

🎙️ Hear our coverage →

#vision #on-device #open-source

Liquid AI Sep 25, 2025

New Models · Open weights

Liquid Nanos

Liquid AI ships Liquid Nanos, tiny task-specific on-device models

Liquid AI released Liquid Nanos, a family of very small task-specific models built for jobs like extraction, translation, RAG, and tool calling that can run on-device. The collection landed on Hugging Face, fitting the episode's theme of small-but-capable models powering real products.

🎙️ Hear our coverage →

#open-source #on-device

Moondream AI Sep 25, 2025

New Models · Open weights

Moondream 3

Moondream 3 preview punches above its weight in the tiny-VLM race

Moondream released a preview of Moondream 3, a small open vision-language model that punches well above its size class. CTO and co-founder Vik Korrapati joined the show to explain why small, capable vision models matter for real product building, framing Moondream 3 as a practical tool rather than a benchmark flex.

🎙️ Hear our coverage →

#vision #on-device #open-source

Apple Sep 4, 2025

New Models · Open weights

FastVLM-7B

Apple's FastVLM-7B lands with a speed-first vision encoder, 85x faster TTFT

Apple released FastVLM-7B, a vision-language model built around a speed-first vision encoder that delivers up to 85x faster time-to-first-token than peer VLMs. Quantized variants (7B-int4, 1.5B-int8) on Hugging Face make it practical for on-device and real-time vision use, anchoring the show's fast-VLM discussion.

X ↗ · HF ↗ · HF (1.5B int8) ↗

🎙️ Hear our coverage →

#vision #on-device #open-source

Google DeepMind Sep 4, 2025

New Models · Open weights

EmbeddingGemma

Google releases EmbeddingGemma, a 300M-param SOTA embedding model for RAG

Google released EmbeddingGemma, a 300M-parameter open embedding model that achieves state-of-the-art results for its size, aimed at RAG and on-device semantic search. It dropped as breaking news during the show, with browser-based demos like Semantic Galaxy showing it running fully client-side.

X ↗ · HF ↗ · Try It ↗

🎙️ Hear our coverage →

#search #open-source #on-device

May 2025

Stability AI May 15, 2025

New Models · Open weights

Stable Audio Open Small

Stability AI and Arm release Stable Audio Open Small for on-device audio

Stability AI, together with Arm, released Stable Audio Open Small, a 341M-parameter open text-to-audio model built for real-world on-device deployment. The show framed it as part of a small comeback for Stability, with weights on Hugging Face and an accompanying paper.

Blog ↗ · Paper ↗ · Hugging Face ↗ · Announcement on X ↗

🎙️ Hear our coverage →

#audio #on-device #open-source

Technology Innovation Institute (TII) May 15, 2025

New Models · Open weights

Falcon-Edge

Falcon-Edge: ternary BitNet LLMs for edge deployment under 1GB VRAM

TII's Falcon-Edge project releases ternary BitNet LLMs (1B and 3B base models) that slash memory and compute requirements, enabling inference on less than 1GB of VRAM. Fine-tuners get pre-quantized checkpoints and a clear path to 1-bit LLMs.

Blog ↗ · Falcon-E-1B on Hugging Face ↗ · Falcon-E-3B on Hugging Face ↗

🎙️ Hear our coverage →

#open-source #on-device #infrastructure

Kyutai May 1, 2025

New Models · Open weights

Helium-1

Kyutai releases Helium-1, a 2B European-language model plus dactory pipeline

Kyutai released Helium-1, a 2B-parameter model distilled from Gemma-2-9B and purpose-built for Europe's 24 official languages, under CC-BY 4.0. It sets a new state of the art for its size class on MMLU-EU, ARC-EU and FLORES translation while fitting in under 2GB VRAM for edge and phone deployment. They also open-sourced 'dactory' (MIT), their full Common Crawl data-processing pipeline that scores, dedups and tags webpages.

Blog post ↗ · Hugging Face: helium-1-2b ↗ · Dactory pipeline (GitHub) ↗

🎙️ Hear our coverage →

#open-source #multilingual #on-device

Microsoft May 1, 2025

New Models · Open weights

Phi-4-reasoning

Microsoft ships Phi-4-reasoning and Phi-4-reasoning-plus (14B, MIT)

Microsoft fine-tuned the 14B Phi-4 on 1.4M curated chain-of-thought traces (SFT) and added a small RL stage (Plus variant) to create two MIT-licensed reasoning models. They punch far above their weight: Phi-4-reasoning-plus outperforms DeepSeek-R1-Distill-70B on AIME 25 (78% vs 51%) and sits within a few points of the full 671B DeepSeek-R1, while running on a single GPU with explicit <think> scaffolding.

ArXiv paper ↗ · Tech report ↗ · Hugging Face: Phi-4-reasoning ↗ · Suriya's thread ↗

🎙️ Hear our coverage →

#open-source #reasoning #on-device

April 2025

Google DeepMind Apr 24, 2025

New Models · Open weights

Gemma 3 QAT

Google ships Quantization-Aware Trained Gemma 3 models for consumer GPUs

Google released Quantization-Aware Training (QAT) versions of the Gemma 3 family, dramatically cutting memory requirements while preserving quality. The 27B model drops from a hefty 54GB to just 14.1GB, and even the 1B model goes from 2GB to about half a gig, making state-of-the-art open models runnable on consumer GPUs. Wolfram took the 4B QAT model for a spin in LM Studio on the show.

27B Gemma 3 27B QAT: 54GB down to 14.1GB · 1B Gemma 3 1B QAT: 2GB down to ~0.5GB · 4B 4B QAT model tested in LM Studio
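The before-and-after sizes follow from simple arithmetic on parameter count and bits per weight. The sketch below assumes the unquantized checkpoints use 16-bit weights and the QAT checkpoints use roughly 4-bit weights; both are assumptions made to reproduce the quoted figures, and the function name is illustrative.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), ignoring the
    KV cache, activations, and any file metadata."""
    return n_params_billion * bits_per_weight / 8


# Sizes mentioned in the entry: 1B, 4B, and 27B.
for size in (1, 4, 27):
    full = weight_memory_gb(size, 16)
    quant = weight_memory_gb(size, 4)
    print(f"{size}B: {full:.1f} GB at 16-bit, {quant:.1f} GB at 4-bit")
```

The 16-bit estimates match the quoted 54GB and 2GB, and the 1B estimate at 4-bit matches the quoted half a gig. The 27B estimate at 4-bit comes out at 13.5 GB against the quoted 14.1GB; the gap is plausibly scales or tensors kept at higher precision, though the entry does not say.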

X Post ↗ · Blog ↗ · Reddit thread ↗

🎙️ Hear our coverage →

#open-source #infrastructure #on-device

Lvmin Zhang (lllyasviel) Apr 24, 2025

New Models · Open weights

FramePack

FramePack generates 120-second videos on just 6GB of VRAM

FramePack, from ControlNet creator Lvmin Zhang (lllyasviel), is an open source next-frame prediction approach for long video generation that runs on consumer hardware. It can generate videos up to 120 seconds long on as little as 6GB of VRAM by packing input frame context into a fixed length.

120s Max video length · 6GB Minimum VRAM
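One way to see how frame context can stay at a fixed length regardless of video length is a geometric compression schedule: each step further back in history gets half as many tokens, so the total is bounded. The sketch below illustrates that idea only; the halving schedule, the 1536-token budget per frame, and the 30 fps figure are all assumptions, and FramePack's actual schedule may differ.

```python
def packed_context_length(n_history_frames, tokens_per_frame=1536):
    """Total context tokens when the frame i steps back in history
    is compressed to tokens_per_frame // 2**i tokens.

    The geometric series keeps the total below 2 * tokens_per_frame
    no matter how many history frames there are.
    """
    return sum(tokens_per_frame >> i for i in range(n_history_frames))


# One second versus 120 seconds of history at an assumed 30 fps.
short = packed_context_length(30)
long = packed_context_length(120 * 30)
print(short, long)
```

Under this schedule the context for 120 seconds of history is no larger than for one second, which is the property that lets memory use stay flat as the video grows.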

Project Page ↗ · GitHub ↗

🎙️ Hear our coverage →

#video-gen #open-source #on-device

Mistral AI Apr 17, 2025

Products & Apps

Classifiers Factory

Mistral releases Classifiers Factory

Mistral announced Classifiers Factory, a service for building and training custom text classifiers on its platform. Covered as a quick item in the Big CO LLMs + APIs section of the show.

🎙️ Hear our coverage →

#on-device #api

March 2025

MLX Community (Prince Canuma) Mar 27, 2025

Dev Tools · Open weights

MLX-Audio v0.0.3

Prince Canuma releases MLX-Audio v0.0.3 for speech on Apple Silicon

Prince Canuma, creator of MLX-VLM, FastMLX, and MLX Embeddings, released MLX-Audio v0.0.3, an open-source library bringing speech and audio models to Apple Silicon via MLX. It makes powerful open-source TTS and audio models accessible locally on Mac hardware.

GitHub repo ↗ · Prince Canuma on X ↗

🎙️ Hear our coverage →

#voice-ai #open-source #on-device

Google DeepMind Mar 13, 2025

New Models · Open weights

Gemma 3

Google open sources Gemma 3, 1B-27B multimodal family with 128K context

Google released Gemma 3, an open-weights model family spanning 1B to 27B parameters with multimodal (text, image, video) capabilities, support for over 140 languages, and a 128K context window. The 27B model runs on a single GPU, with Sundar Pichai claiming competitors need roughly 10x the compute for similar performance. It shipped with day-one open source ecosystem support (Hugging Face, Ollama, Kaggle) plus ShieldGemma 2 for content moderation.

Blog ↗ · AI Studio ↗ · HF Collection ↗ · Hugging Face (27B) ↗

🎙️ Hear our coverage →

#open-source #multimodal #on-device

February 2025

Microsoft Feb 27, 2025

New Models · Open weights

Phi-4-multimodal

Microsoft releases Phi-4-multimodal and Phi-4-mini open weights

Microsoft expanded the Phi family with Phi-4-multimodal-instruct, a small open-weights model that handles text, vision, and audio in a single model, alongside a compact Phi-4-mini. The weights shipped on Hugging Face, continuing Microsoft's push for capable small models that can run on-device.

Blog ↗ · HuggingFace ↗

🎙️ Hear our coverage →

#open-source #on-device #multimodal

January 2025

Mistral AI Jan 30, 2025

New Models · Open weights

Mistral Small 2501

Mistral Small 2501: 24B open-weights model under Apache 2.0

Mistral AI released Mistral Small 2501, a 24B-parameter instruct model under the permissive Apache 2.0 license. Announced as breaking news during the show, it continues Mistral's tradition of strong small open models suitable for fine-tuning and local deployment.

24B Parameters

Hugging Face ↗

🎙️ Hear our coverage →

#open-source #on-device

Hugging Face Jan 23, 2025

New Models · Open weights

SmolVLM (256M)

Hugging Face SmolVLM: tiny vision-language models run on WebGPU

Hugging Face released SmolVLM, a family of tiny vision-language models including a 256M-parameter version small enough to run entirely in the browser via WebGPU. It demonstrates how far efficient multimodal models have shrunk while remaining usable.

256M Parameters (smallest VLM)

SmolVLM-256M WebGPU demo on Hugging Face ↗

🎙️ Hear our coverage →

#vision #open-source #on-device