New ModelsOpen weights
Qwen2.5-Omni-7B
Qwen launches Omni 7B: sees, hears, reads, and talks back
Qwen released Qwen2.5-Omni-7B, an open-weights omni-modal model that perceives text, images, audio, and video, and generates both text and speech. It packs end-to-end multimodal perception and spoken output into a 7B parameter model available on Hugging Face.
7B parameters
Major Features & Updates
GPT-4o Native Image Generation
OpenAI enables native image generation in GPT-4o, internet goes Ghibli
OpenAI finally enabled GPT-4o's native auto-regressive image generation in ChatGPT, sparking the biggest mainstream AI buzz of the week as the internet ghiblified itself. Launched right after Gemini 2.5, it excels at instruction following, text rendering, and multi-turn editing, with viral demos ranging from ad mockups to a full Lord of the Rings trailer.
New ModelsOpen weights
Mistral Small 3.1
Mistral Small 3.1 24B: open-weights multimodal model
Mistral released Mistral Small 3.1, a 24B-parameter open-weights model that adds multimodal (vision) capabilities to the Small line. Both instruct and base checkpoints were published on Hugging Face, making it a strong local multimodal option at the 24B size class.
Major Features & Updates
Google AI Studio YouTube link understanding
Google AI Studio adds native YouTube video understanding via link dropping
Google AI Studio now lets you drop a YouTube link and have Gemini natively understand the video. This unlocks video analysis, summarization, and support use cases without downloading or preprocessing the content.
Major Features & Updates
Gemini 2.0 Flash native image generation
Gemini Flash gains native image generation and conversational editing
Google enabled native image generation in Gemini Flash Experimental, letting users generate and iteratively edit images conversationally inside the same multimodal model. The crew demoed it live on stream, editing photos of themselves with natural-language instructions, and saw it as a preview of how creative tools like Photoshop will work.
New ModelsOpen weights
Gemma 3
Google open sources Gemma 3, 1B-27B multimodal family with 128K context
Google released Gemma 3, an open-weights model family spanning 1B to 27B parameters with multimodal (text, image, video) capabilities, support for over 140 languages, and a 128K context window. The 27B model runs on a single GPU, with Sundar Pichai claiming competitors need roughly 10x the compute for similar performance. It shipped with day-one open source ecosystem support (Hugging Face, Ollama, Kaggle) plus ShieldGemma 2 for content moderation.