Image Generation

Image generation and editing models and creative visual tools. 63 releases covered on the show.

July 2026

Reve Jul 9, 2026

New Models

Reve 2.1

Reve 2.1 takes #2 on the Text-to-Image Arena with layer-based generation

Released a month after Reve 2.0 (and midway through the ThursdAI live show), Reve 2.1 landed at #2 on the Text-to-Image Arena with a score of 1306, 28 points clear of the next-best model, displacing Meta's Muse Image after roughly 30 hours at #2. Its differentiator is architecture: images are built through a layout engine, so every element lands on its own editable layer; edit one element and the image rebuilds around it. It also ranks #8 on single-image editing, on par with Nano Banana Pro, with improved prompt understanding, world knowledge and foreign-text rendering.

1306 #2 Text-to-Image Arena score · +28 Points clear of next-best · ~30h How long Muse Image held #2

Reve announcement ↗ · Arena result ↗ · Design Arena result ↗

🎙️ Hear our coverage →

ByteDance Jul 8, 2026

New Models

Seedream 5.0 Pro

ByteDance releases Seedream 5.0 Pro with precision editing and layer separation

The flagship tier of the Seedream 5 line pitches a shift from image generator to design tool: interactive precision editing (point, lasso, sketch), intelligent layer separation that decomposes an image into editable layers, dense infographic rendering, and native text in 10+ languages. Rollout is enterprise-first via the BytePlus API, Dreamina and Magnific, with Seedance 2.5 video pre-announced for roughly ten days later.

4K Max native resolution · 10+ Languages for native text

X announcement ↗ · Blog ↗

🎙️ Hear our coverage →

Meta AI Jul 7, 2026

New Models

Muse Image & Muse Video

Meta Superintelligence Labs ships Muse Image and previews Muse Video

MSL's first media-generation models: Muse Image is live in the Meta AI app, Instagram Stories (US) and WhatsApp, with agentic generation that calls web search and code execution, multi-reference composition, and Instagram social-context conditioning. Muse Video shares the same pretraining base and adds native audio, debuting at #3 on Arena text-to-video while Muse Image lands #2 on image. There is no public API, and public Instagram accounts are opted in to @-mention remixing by default.

#2 Arena text-to-image debut · #3 Arena text-to-video debut · 1280 Arena image score

X announcement ↗ · Blog ↗

🎙️ Hear our coverage →

#image-gen #video-gen #consumer-ai

Google DeepMind Jul 2, 2026

New Models

NanoBanana 2 Lite

NanoBanana 2 Lite: sub-4-second images at ~3¢ per 1,000

Google's NanoBanana 2 Lite generates images in under four seconds starting at $0.034 per 1,000 images, with quality above the original NanoBanana. The Interactions API hit GA the same week.

3¢ per 1,000 images · <4s generation time

🎙️ Hear our coverage →
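
To put the NanoBanana 2 Lite price quoted above in concrete terms, here is a minimal sketch that scales the $0.034 per 1,000 images figure to an arbitrary batch size. The function name and the flat-rate assumption are illustrative only; actual pricing may vary by tier, resolution or volume.

```python
from decimal import Decimal

# Starting price quoted in the entry above: $0.034 per 1,000 images.
PRICE_PER_1000_USD = Decimal("0.034")


def batch_cost_usd(num_images: int,
                   price_per_1000: Decimal = PRICE_PER_1000_USD) -> Decimal:
    """Cost in USD of num_images at a flat per-1,000 rate (hypothetical helper)."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return price_per_1000 * Decimal(num_images) / Decimal(1000)


# Print the cost of a few batch sizes, formatted to four decimal places.
for n in (1_000, 50_000, 1_000_000):
    print(f"{n:>9,} images -> ${batch_cost_usd(n):.4f}")
```

Decimal is used instead of float so that small per-image amounts do not pick up binary rounding noise.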

June 2026

Ideogram Jun 4, 2026

New Models · Open weights

Ideogram 4.0

Ideogram 4.0 becomes the top open-weight text-to-image model

Ideogram released Ideogram 4.0, a 9.3B-parameter text-to-image model with open weights under a non-commercial license. It leads open-weight image models on typography and layout, with bounding-box/layout-style prompting that trades casual generation ease for precise structured control.

9.3B Ideogram 4 parameters

Blog ↗ · Hugging Face Collection ↗ · Hugging Face (FP8) ↗ · X announcement ↗

🎙️ Hear our coverage →

#image-gen #open-source

Reve Jun 4, 2026

New Models

Reve 2.0

Reve 2.0 hits #2 on Text-to-Image Arena with layout-first editing

Reve 2.0 jumped to second place on Text-to-Image Arena (around 1200 Elo) with native 4K output, code-like layout control, and precise editing. Alex's live tests found inconsistent portrait identity, but the layout-first editor is the real differentiator for graphic and image iteration workflows.

Blog (The Layout Bet) ↗ · Try it ↗ · X announcement ↗

🎙️ Hear our coverage →

May 2026

Microsoft May 28, 2026

New Models

MAI-Image-2.5

Microsoft MAI-Image-2.5 jumps to #3 on Arena text-to-image

MAI-Image-2.5 jumped to number two on Arena's image-to-image leaderboard shortly after launch, with notable strength in image cleanup, backgrounds, documents, and diagrams. Hands-on tests on the show were mixed, and it is publicly accessible through playground.microsoft.ai.

Microsoft MAI Image 2.5: Arena ↗ · Microsoft AI announcement ↗ · MAI-Image-2.5 announcement image ↗ · X announcement ↗

🎙️ Hear our coverage (+1 follow-up) →

#image-gen #benchmarks

PrismML May 28, 2026

New Models · Open weights

Bonsai Image 4B

PrismML's 1-bit Bonsai Image 4B runs local image gen under 1GB

PrismML released 1-bit and ternary versions of Bonsai Image 4B, a sub-1GB diffusion transformer for local image generation. The quantized model even runs in-browser via WebGPU and ships with an iOS app and a Hugging Face demo.

PrismML Bonsai Image 4B: blog ↗ · PrismML Bonsai on Hugging Face ↗ · Bonsai Image demo ↗ · Bonsai Studio iOS app ↗

🎙️ Hear our coverage →

#image-gen #on-device #infrastructure
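
The sub-1GB figure in the Bonsai Image 4B entry above follows from simple arithmetic on bits per weight. The sketch below is a back-of-the-envelope estimate only: it assumes "4B" means roughly 4 billion weights, that every weight is quantized, and that packing is ideal, none of which the entry confirms. Real checkpoints carry extra overhead (scales, unquantized layers, metadata).

```python
import math


def packed_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Ideal packed weight size in gigabytes (10^9 bytes), ignoring all overhead."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 4e9  # assumption: "4B" in the model name means ~4 billion weights

one_bit = packed_size_gb(PARAMS, 1.0)
# A ternary weight has 3 states, so it needs at least log2(3) bits.
ternary = packed_size_gb(PARAMS, math.log2(3))
fp16 = packed_size_gb(PARAMS, 16.0)  # 16-bit baseline for comparison

print(f"1-bit:   {one_bit:.2f} GB")
print(f"ternary: {ternary:.2f} GB")
print(f"fp16:    {fp16:.2f} GB")
```

Under these assumptions both quantized variants fit below 1 GB, while a 16-bit copy of the same weights would need about 8 GB.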

Pruna AI May 28, 2026

New Models

P-Image-Upscale

Pruna AI's P-Image-Upscale hits 128 megapixel outputs

Pruna AI released P-Image-Upscale, an image upscaling model that reaches 128 megapixel outputs with fast generation and predictable pricing. It is available through Pruna's API and on Replicate.

Pruna P-Image-Upscale on Replicate ↗ · P-Image-Upscale docs ↗ · Pruna announcement ↗

🎙️ Hear our coverage →

Runway May 28, 2026

Products & Apps

Project Luxo

Runway launches Project Luxo for solo-creator short films

Runway launched Project Luxo, claiming AI-generated video has crossed the uncanny valley for solo-creator short films. The pitch is that a single creator can now produce watchable short-form films end to end with Runway's stack.

Runway Project Luxo: blog ↗ · Runway announcement ↗

🎙️ Hear our coverage →

#video-gen #image-gen

Google DeepMind May 21, 2026

New Models

Gemini Omni

Gemini Omni: 'create anything from anything' conversational video editor

Google DeepMind launched Gemini Omni, a multimodal 'create anything from anything' model debuting as Google's first conversational video editor. Unlike pure text-to-video systems, Omni is an iterative multi-turn editing model that combines Gemini intelligence, world knowledge, multimodal inputs and generative media, in the same way Nano Banana brought Gemini to interactive image editing. It is available in the Gemini app, Google Flow and YouTube, with API support coming soon.

DeepMind model page ↗ · Google DeepMind on X ↗ · Logan on availability ↗ · Gemini App ↗

🎙️ Hear our coverage (+1 follow-up) →

#video-gen #multimodal #image-gen

Krea AI May 14, 2026

New Models

Krea 2

Krea 2: Krea's first from-scratch foundation image model

Krea released Krea 2, its first foundation image model trained from scratch, built over six to seven months by nearly half the company. It focuses on aesthetic diversity, style control with up to 4 reference images, and moodboard-driven workflows, generating images in roughly 15 seconds. Co-founder and CEO Victor Perez joined the show to walk through it.

X announcement ↗ · Blog ↗

🎙️ Hear our coverage →

#image-gen #architecture

April 2026

Anthropic Apr 23, 2026

Products & Apps

Claude Design

Anthropic ships Claude Design research preview, Figma stock drops 7%

Anthropic released Claude Design as a research preview running on Opus 4.7 at claude.ai/design, and Figma stock dropped 7% on the news. Alex generated a full ThursdAI brand kit including logo, design tokens, and the episode opener videos end-to-end inside Claude Design, then had Codex pick up the kit and produce a GPT-5.5 launch video in 9 minutes. Anthropic also added a new usage meter to Claude Max settings.

Claude Design announcement ↗ · Try Claude Design ↗

🎙️ Hear our coverage →

#image-gen #agents

Baidu Apr 16, 2026

New Models · Open weights

ERNIE-Image

Baidu ERNIE-Image: 8B DiT ranks #1 on GenEval among open models

Baidu released ERNIE-Image, an 8B diffusion transformer that ranks #1 on GenEval among open models and features precise multilingual text rendering. It is part of this week's wave of Chinese open releases in image and 3D generation.

ERNIE-Image on Hugging Face ↗

🎙️ Hear our coverage →

#image-gen #architecture #open-source

OpenAI Apr 9, 2026

New Models

GPT-Image-2

OpenAI's GPT-Image-2 leaks on LM Arena under three codenames

OpenAI's GPT-Image-2 posted the biggest single jump ever recorded on Arena, sitting 200+ Elo points above the previous top image model even on medium reasoning. The thinking/reasoning image model generates functioning QR codes, pixel-perfect infographics, 4K output, multi-image character consistency, and equirectangular 360-degree images that Peter Gostev stitched into a walkable street-view reconstruction of ancient Babylon. It even produces screenshots of IDEs containing SVG code that actually renders, enabling a new design-then-implement meta with Codex.

levelsio on X ↗ · RituWithAI on X ↗ · DataChaz on X ↗ · GPT-Image-2 announcement ↗

🎙️ Hear our coverage (+1 follow-up) →

#image-gen #reasoning
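
To give a feel for what a 200-point rating gap like the one in the GPT-Image-2 entry above means, the sketch below applies the standard Elo expected-score curve (base 10, scale 400). This is an assumption for illustration: the entry does not say how Arena computes its ratings, so treat the result as an approximation. On this curve a 200-point gap corresponds to roughly a 76% expected head-to-head win rate.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Expected win probability for the higher-rated model.

    Uses the standard Elo logistic curve (base 10, scale 400); whether the
    leaderboard uses exactly this curve is an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# 28 is the lead quoted for Reve 2.1; 200 is the GPT-Image-2 gap.
for gap in (0, 28, 200):
    print(f"{gap:>3}-point gap -> {expected_win_rate(gap):.1%} expected win rate")
```

The same function also shows why small leads, such as the 28-point margin quoted for Reve 2.1, translate into only a modest edge in pairwise votes.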

Alibaba (Wan) Apr 2, 2026

New Models

Wan2.7-Image

Alibaba Wan2.7-Image unifies generation, editing, and text rendering

Alibaba's Wan team released Wan2.7-Image, a unified image model covering generation, editing, text rendering, and multi-image consistency. The panel covered it in the open ecosystem round-up alongside the Qwen updates.

Announcement (X) ↗ · Wan site ↗

🎙️ Hear our coverage →

Microsoft Apr 2, 2026

New Models

MAI-Image-2

Microsoft MAI releases MAI-Image-2 image generation model

MAI-Image-2 is Microsoft's new in-house image generation model, debuting at #3 in image-gen rankings as part of the MAI three-model release. The panel compared its positioning against specialist image products and foundation-model APIs.

Mustafa Suleyman announcement (X) ↗ · MAI-Image-2 blog ↗

🎙️ Hear our coverage →

March 2026

Luma AI Mar 26, 2026

New Models

Uni-1

Luma Labs Uni-1 thinks and generates pixels simultaneously, #1 preference Elo

Luma Labs released Uni-1, an LLM-based image model that thinks and generates pixels simultaneously and claims the number-one human preference Elo. Unlike traditional diffusion workflows, you converse with it and iterate together toward results, and it can also generate infographics. It is a surprising pivot from Luma's video focus.

Luma Labs announcement (X) ↗ · Uni-1 announcement page ↗ · Try Uni-1 in the Luma app ↗

🎙️ Hear our coverage →

#image-gen #multimodal

Modular Mar 26, 2026

Products & Apps

Modular 26.2

Modular 26.2 runs FLUX.2 in under a second, 99% cheaper than Nano Banana

Modular shipped its 26.2 release with state-of-the-art image generation, running FLUX.2 in under one second (with claims of sub-300ms) at 99% lower cost than Nano Banana, plus upgraded AI coding with Mojo. Alex noted the surprise of an inference platform releasing model-level optimization and hoped the approach spreads to all image generation.

Modular announcement (X) ↗ · Modular 26.2 blog post ↗ · Modular FLUX.2 speed demo (X) ↗

🎙️ Hear our coverage →

#image-gen #infrastructure #coding

Phota Labs Mar 26, 2026

Products & Apps

Phota Studio + API

Phota Labs launches Phota Studio + API with identity-preserving personalization

Phota Labs launched Phota Studio and an API around a photography-focused image model with identity-preserving personalization: upload a batch of your photos, it trains a personal model, and the generated images actually resemble you. Alex flagged the personalization as a real capability jump over the crowd of photo startups, for professional shots, photo fixes, and adding people to photos.

Phota Labs announcement (X) ↗ · Try Phota Studio ↗

🎙️ Hear our coverage →

#image-gen #consumer-ai

NVIDIA Mar 19, 2026

Major Features & Updates

DLSS 5

NVIDIA DLSS 5 adds a generative AI filter for photo-realistic lighting

Announced at GTC, NVIDIA's DLSS 5 introduces a new generative AI filter bringing photo-realistic lighting to RTX 50-series GPUs. It applies generative models to real-time game rendering, extending DLSS beyond upscaling and frame generation.

Digital Foundry coverage ↗

🎙️ Hear our coverage →

#image-gen #world-models #infrastructure

Black Forest Labs Mar 5, 2026

Papers & Research

Self-Flow

Black Forest Labs introduces Self-Flow

Black Forest Labs published Self-Flow, new research from the FLUX makers in the AI art and diffusion space. It was included in the week's AI Art & Diffusion roundup.

BFL Self-Flow announcement ↗ · Self-Flow research page ↗

🎙️ Hear our coverage →

#image-gen #architecture #research

February 2026

Google DeepMind Feb 26, 2026

New Models

Nano Banana 2

Google DeepMind launches Nano Banana 2 image model mid-show

Google DeepMind announced Nano Banana 2 during the show, a Flash-quality tier of its image model line. Alex broke in mid-TLDR to describe near-Pro image quality at roughly half the price, plus a new image search capability.

Google DeepMind announcement on X ↗ · Nano Banana page ↗

🎙️ Hear our coverage →

#image-gen #multimodal

Quiver Feb 26, 2026

New Models

Arrow 1.0

Quiver tackles SVG generation with Arrow 1.0

Quiver released Arrow 1.0, pitched as solving SVG generation. It was included in the week's AI art and diffusion roundup as a notable niche release for vector graphics.

Arrow 1.0 demo on X ↗

🎙️ Hear our coverage →

Alibaba (Qwen) Feb 12, 2026

New Models

Qwen-Image-2.0

Alibaba launches Qwen-Image-2.0 with native 2K resolution

Alibaba's Qwen team launched Qwen-Image-2.0, a 7B-parameter image generation model with native 2K resolution output and superior text rendering. It is available to try on chat.qwen.ai.

Alibaba Qwen announcement on X ↗ · Try it on Qwen Chat ↗

🎙️ Hear our coverage →

January 2026

Alibaba (Tongyi Lab) Jan 29, 2026

New Models · Open weights

Z-Image

Tongyi Lab releases Z-Image generation model

Alibaba's Tongyi Lab released Z-Image, a new image generation model, with support landing in the open-source DiffSynth-Studio toolkit on GitHub. It was covered in the AI Art segment alongside HunyuanImage 3.0.

Announcement (X) ↗ · GitHub (DiffSynth-Studio) ↗

🎙️ Hear our coverage →

#image-gen #open-source

Tencent (Hunyuan) Jan 29, 2026

New Models

HunyuanImage 3.0-Instruct

Tencent launches HunyuanImage 3.0-Instruct image model

Tencent's Hunyuan team launched HunyuanImage 3.0-Instruct, an instruction-tuned version of its image generation model. It was covered briefly in the AI Art segment alongside other new image models this week.

Announcement (X) ↗ · Follow-up (X) ↗

🎙️ Hear our coverage →

xAI Jan 29, 2026

APIs & Platforms

Grok Imagine API

xAI launches Grok Imagine API with video generation

xAI released the Grok Imagine API, exposing its image and video generation capabilities to developers through the xAI console. The show subtitle notes Grok Imagine ranking #1 among generation models this week.

Announcement (X) ↗ · xAI Console ↗

🎙️ Hear our coverage →

#video-gen #image-gen #api

Black Forest Labs Jan 15, 2026

New Models · Open weights

Flux 2 Klein

Black Forest Labs drops Flux 2 Klein, fast open-weights image model

Wolfram broke the news mid-show: Black Forest Labs released Flux 2 Klein, a fast 4B/9B image generation model with open weights under Apache 2.0. It is designed for near-real-time editing and style iteration, and Alex used it minutes later in his live Claude Cowork demo.

🎙️ Hear our coverage →

#image-gen #open-source

Pruna AI Jan 8, 2026

New Models

Qwen Edit 2512

Qwen Edit 2512 optimized by Pruna AI: high-res images in under 7s

Pruna AI released an optimized version of Qwen Edit 2512 that generates high-resolution realistic images in under 7 seconds. The optimized model is available to run on Replicate.

Qwen Edit 2512 on Replicate ↗

🎙️ Hear our coverage →

December 2025

Black Forest Labs Dec 25, 2025

New Models

Flux 3

Flux 3 becomes the new gold standard for image generation

Flux 3 dropped in August and immediately became the gold standard for image generation, landing three years almost to the day after Stable Diffusion first went public. Wolfram used it as the yardstick for how far image AI traveled in those three years.

🎙️ Hear our coverage →

OpenAI Dec 25, 2025

Major Features & Updates

GPT-4o native image generation

GPT-4o native image generation sparks Ghibli-mania

OpenAI shipped native image generation in GPT-4o, producing the viral Ghibli-style image wave and bringing AI image creation to the ChatGPT mainstream. Wolfram cited the 2025 paradigm shift in image generation as his release of the year.

Apr 24 Episode ↗

🎙️ Hear our coverage →

Reve Dec 25, 2025

Products & Apps

Reve image platform

Reve ships a 4-in-1 image creation and editing platform

Reve (rendered as 'RevA' in the episode) emerged in September as a four-in-one image creation and editing platform. Alex said he still uses it daily, making it one of the year's sleeper product hits.

🎙️ Hear our coverage →

#image-gen #consumer-ai

OpenAI Dec 18, 2025

New Models

GPT Image 1.5

OpenAI GPT Image 1.5: 4x faster, 20% cheaper, #1 on LMSYS Image Arena

OpenAI released GPT Image 1.5, an upgraded image generation model that is 4x faster and 20% cheaper than its predecessor. It debuted at #1 on the LMSYS Image Arena leaderboard, part of OpenAI's rapid-fire release week.

OpenAI GPT Image 1.5 announcement ↗

🎙️ Hear our coverage →

ByteDance Dec 4, 2025

New Models

SeeDream 4.5

SeeDream 4.5 adds multi-reference fusion and stronger text rendering

ByteDance's SeeDream 4.5 image model shipped with emphasis on multi-reference fusion and improved text rendering, an area the panel noted remains a key differentiator among image generators.

BytePlus announcement on X ↗

🎙️ Hear our coverage →

Kling AI Dec 4, 2025

New Models

Kling O1 Image

Kling O1 Image expands Kling into image generation

Alongside its video update, Kling shipped O1 Image, expanding the company's generation stack into still images. The release rounds out Kling's multimodal offering beyond its core video models.

Kling O1 Image announcement on X ↗

🎙️ Hear our coverage →

Pruna AI Dec 4, 2025

New Models

P-Image

Pruna P-Image promises sub-second image generation at $0.005

Pruna AI promoted P-Image, an image generation offering with sub-second generation times at roughly $0.005 per image. The release fit the week's diffusion theme of competing on speed and cost efficiency rather than just quality.

$0.005 Per image

Pruna P-Image ↗ · Pruna demo ↗ · Pruna announcement on X ↗

🎙️ Hear our coverage →

#image-gen #infrastructure

November 2025

Alibaba (Tongyi) Nov 27, 2025

New Models · Open weights

Z-Image Turbo

Tongyi's Z-Image Turbo brings sub-second open image generation

Alibaba's Tongyi lab released Z-Image Turbo, a 6B-parameter open image generation model that produces images in under a second. It pushes open-source image generation toward real-time speeds at a fraction of the size of competing models.

6B Parameters

Z-Image Turbo on HuggingFace ↗ · Z-Image on GitHub ↗

🎙️ Hear our coverage →

#image-gen #open-source #architecture

Black Forest Labs Nov 27, 2025

New Models · Open weights

FLUX.2

Black Forest Labs releases FLUX.2, a 32B multi-reference image model

Black Forest Labs released FLUX.2, a 32B-parameter image model with open weights (FLUX.2-dev) that supports multi-reference image editing. It lets users combine multiple reference images and prompt edits with variables, a step up in controllable image editing.

32B Parameters

FLUX.2 on HuggingFace ↗ · FLUX.2 Blog ↗ · FLUX.2 Announcement on X ↗

🎙️ Hear our coverage →

#image-gen #open-source

Google DeepMind Nov 20, 2025

New Models

Nano Banana Pro

Nano Banana Pro generates 4K images with perfect text

Google's upgraded image model dropped as breaking news mid-show, adding visible thinking traces, 4K resolution output, and SynthID watermarking with C2PA metadata. Alex demoed it live by one-shotting an 8MB AI-news infographic with flawless text and pixel-accurate logos across the entire image. It also powers generative UIs in Gemini, building interactive dashboards with real data on the fly.

4K First image model with flawless 4K output and perfect text

AI Studio (Nano Banana Pro) ↗

🎙️ Hear our coverage →

Alibaba (Qwen) Nov 13, 2025

New Models · Open weights

Qwen Image Edit Multi-Angle LoRA

Qwen Image Edit gains Multi-Angle LoRA for camera control

A Multi-Angle LoRA for Qwen Image Edit landed, enabling camera-control style edits that re-render a scene from new angles. Available as a Hugging Face space and on fal, it shows the fast-moving open ecosystem building on Qwen's image editing models.

Linoy Tsaban demo on X ↗ · Qwen-Image-Edit-Angles space on Hugging Face ↗ · fal on X ↗

🎙️ Hear our coverage →

#image-gen #architecture

NVIDIA Nov 13, 2025

New Models · Open weights

ChronoEdit-14B Upscaler LoRA

NVIDIA releases ChronoEdit-14B Upscaler LoRA

NVIDIA released an Upscaler LoRA for its ChronoEdit-14B image editing model, available on Hugging Face with Diffusers pipeline support. It adds high-quality upscaling to the ChronoEdit physics-aware editing stack.

Announcement on X ↗ · Hugging Face model page ↗ · Diffusers ChronoEdit docs ↗

🎙️ Hear our coverage →

#image-gen #architecture

October 2025

Insta360 Research Oct 16, 2025

Papers & Research · Open weights

DiT360

DiT360: SOTA panoramic image generation with hybrid training

DiT360 is a diffusion-transformer approach to panoramic image generation that uses hybrid training across perspective and panoramic data to reach state-of-the-art quality. The project page and GitHub release make the work reproducible.

Project page ↗ · GitHub ↗

🎙️ Hear our coverage →

#image-gen #research

Sourceful Oct 16, 2025

New Models

Riverflow 1

Riverflow 1 tops the image-editing leaderboard

Sourceful's Riverflow 1 image-editing model took the top spot on the image-editing leaderboard. It is a notable result from a smaller lab in a category dominated by big-name image models.

Sourceful blog ↗

🎙️ Hear our coverage →

September 2025

Reve Sep 18, 2025

Products & Apps

Reve

Reve launches 4-in-1 AI visual platform taking on Nano Banana and Seedream

Reve launched a 4-in-1 AI visual creation platform combining image generation, editing, and related visual workflows in one app. The panel spent real time on it as a serious challenger to Nano Banana and Seedream in the AI image tooling race.

X ↗ · Reve ↗ · Blog ↗

🎙️ Hear our coverage →

Tencent Hunyuan Sep 18, 2025

Papers & Research · Open weights

Hunyuan SRPO

Hunyuan SRPO: preference optimization that supercharges diffusion models

Tencent Hunyuan published SRPO (Semantic Relative Preference Optimization), a post-training technique that significantly improves the output quality of diffusion image models. The team released weights on Hugging Face along with a project page and striking before/after comparisons.

X ↗ · HF ↗ · Project ↗ · Comparison X ↗

🎙️ Hear our coverage →

#image-gen #architecture

May 2025

Black Forest Labs May 29, 2025

New Models

FLUX.1 Kontext

Black Forest Labs drops FLUX.1 Kontext, SOTA image editing

Black Forest Labs, creators of Flux, released Kontext: three models (Pro, Max, and a 12B open-weights Dev in private preview) for consistent, context-aware text and image editing. Unlike GPT-image or VEO-style regeneration, Kontext keeps identity consistent across edits, adding what you ask for without changing your face every generation. The news broke during the show.

Tweet ↗ · Announcement ↗ · Flux Playground ↗

🎙️ Hear our coverage →

HiDream May 1, 2025

New Models · Open weights

HiDream E1

HiDream E1: open-weights image model with standout Ghibli style

HiDream released E1, an open-weights image editing/generation model (Apache 2.0-style licensing) noted for beautiful Ghibli-style outputs. It ranks #4 on the Artificial Analysis image arena leaderboard, sitting among top contenders like Google Imagen and ReCraft.

Hugging Face: HiDream-E1-Full ↗

🎙️ Hear our coverage →

#image-gen #open-source

Runway May 1, 2025

Major Features & Updates

Gen-4 References

Runway References brings character and scene consistency to Gen-4

Runway launched References for Gen-4 on all paid plans, letting creators supply reference images (characters, outfits, locations, even selfies) and use tags in prompts to keep those elements consistent across generations. It tackles AI video's biggest pain point, frame-to-frame identity drift, at no extra credit cost per run.

Runway References examples (X search) ↗

🎙️ Hear our coverage →

#video-gen #image-gen

April 2025

OpenAI Apr 24, 2025

APIs & Platforms

gpt-image-1

OpenAI's GPT Image generation lands in the API as gpt-image-1

OpenAI's powerful image generation capabilities, previously locked inside ChatGPT, are now available to developers via API under the official name gpt-image-1. This was the big one many developers were waiting for, opening up the viral image generation and editing capabilities for building AI art and image editing applications.

X Post ↗ · Docs ↗ · API Reference ↗

🎙️ Hear our coverage →

#image-gen #api
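
For readers who want to try the gpt-image-1 endpoint described above, here is a minimal sketch using the official `openai` Python SDK. It assumes the SDK is installed and that `OPENAI_API_KEY` is set in the environment; the helper names, prompt, output path and image size are illustrative choices and do not come from the show.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data: str, out_path: str) -> int:
    """Decode a base64 image payload, write it to out_path, return the byte count."""
    raw = base64.b64decode(b64_data)
    Path(out_path).write_bytes(raw)
    return len(raw)


def generate_image(prompt: str, out_path: str = "output.png") -> int:
    """Request one image from gpt-image-1 and save it locally (hypothetical helper)."""
    # Imported lazily so save_b64_image can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1024x1024",
    )
    # The image is read from the base64 payload in the response.
    return save_b64_image(result.data[0].b64_json, out_path)
```

Calling `generate_image("a watercolor fox reading a newspaper")` would write `output.png` to the working directory, billed at whatever rate applies to the account.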

Tencent Apr 24, 2025

New Models

Hunyuan 3D 2.5

Tencent's Hunyuan 3D 2.5 jumps to 10B params with PBR textures and rigging

Tencent updated its 3D generation model to Hunyuan 3D 2.5, now boasting 10 billion parameters, up from 1B. They highlight massive leaps in precision with 1024-resolution geometry, high-quality textures with PBR support, and improved skeletal rigging for animation.

10B Parameters (up from 1B) · 1024 Geometry resolution

🎙️ Hear our coverage →

#world-models #image-gen

ByteDance Apr 17, 2025

New Models

Seedream 3.0

ByteDance Seedream 3.0: bilingual 2K text-to-image model

ByteDance's Seed team announced Seedream 3.0, a powerful bilingual (Chinese/English) text-to-image model that generates native 2048x2048 images with fast inference of around 3 seconds for a 1K image on an A100. It challenges the top closed image generation models.

Tech post ↗ · arXiv ↗ · AIbase news ↗

🎙️ Hear our coverage →

#image-gen #architecture

HiDream AI Apr 10, 2025

New Models · Open weights

HiDream-I1-Dev

HiDream-I1-Dev: 17B MIT-licensed image model surpasses Flux 1.1 [pro]

HiDream released HiDream-I1-Dev, a 17B parameter open-weights image generation model under an MIT license. It became the new leading open-weights image generator, surpassing Flux 1.1 [pro] on quality benchmarks.

17B Parameters, MIT license

Hugging Face collection: HiDream-I1 ↗

🎙️ Hear our coverage →

#image-gen #open-source

Runway Apr 3, 2025

New Models

Runway Gen-4

Runway Gen-4 announced with major gains in video consistency

Runway announced Gen-4, its next-generation video model focused on character and world consistency across shots. Example videos showed notably coherent characters and scenes, pushing AI video further toward usable filmmaking.

Introducing Runway Gen-4 ↗

🎙️ Hear our coverage →

#video-gen #image-gen

March 2025

Ideogram Mar 27, 2025

New Models

Ideogram 3.0

Ideogram 3.0 launches with strong text, logos, and style references

Ideogram launched version 3.0 of its image generation model with another SOTA claim. It is particularly strong on text and logo rendering, photorealism, and style references, continuing Ideogram's edge in typography-heavy image generation.

Ideogram 3.0 announcement ↗

🎙️ Hear our coverage →

OpenAI Mar 27, 2025

Major Features & Updates

GPT-4o Native Image Generation

OpenAI enables native image generation in GPT-4o, internet goes Ghibli

OpenAI finally enabled GPT-4o's native auto-regressive image generation in ChatGPT, sparking the biggest mainstream AI buzz of the week as the internet ghiblified itself. Launched right after Gemini 2.5, it excels at instruction following, text rendering, and multi-turn editing, with viral demos ranging from ad mockups to a full Lord of the Rings trailer.

X thread with examples ↗ · Ad threads ↗ · Full Lord of the Rings trailer ↗ · Native Image Generation System Card ↗

🎙️ Hear our coverage →

#image-gen #multimodal

Reve Mar 27, 2025

New Models

Reve Image

Reve emerges with SOTA diffusion image generation claims

Reve launched a new diffusion image generation model claiming state-of-the-art quality, reportedly beating heavyweights like Midjourney and Flux at roughly a penny per image. The previously low-profile lab made a splash with strong prompt adherence and image quality.

X announcement (Taesung) ↗ · Decrypt coverage ↗

🎙️ Hear our coverage →

#image-gen #architecture

Google Mar 20, 2025

Dev Tools

Gemini Co-Drawing

Gemini Co-Drawing demo uses native image output to help you draw

A Hugging Face space demo, Gemini Co-Drawing, uses Gemini's native image generation output to collaboratively complete and enhance your sketches as you draw. It showcases the new native image-output capability of Gemini 2.0 Flash in an interactive tool.

🎙️ Hear our coverage →

#image-gen #agents

ByteDance Mar 13, 2025

New Models

Seedream 2.0

ByteDance unveils Seedream 2.0 bilingual image generation foundation model

ByteDance released Seedream 2.0, a native Chinese-English bilingual image generation foundation model, alongside a technical paper. It emphasizes excellent text rendering (especially Chinese), cultural nuance, and human preference alignment, generating high-quality, culturally relevant images from prompts in either language.

Blog ↗ · Paper ↗

🎙️ Hear our coverage →

Google DeepMind Mar 13, 2025

Major Features & Updates

Gemini 2.0 Flash native image generation

Gemini Flash gains native image generation and conversational editing

Google enabled native image generation in Gemini Flash Experimental, letting users generate and iteratively edit images conversationally inside the same multimodal model. The crew demoed it live on stream, editing photos of themselves with natural-language instructions, and saw it as a preview of how creative tools like Photoshop will work.

X announcement ↗ · AI Studio demo ↗

🎙️ Hear our coverage →

#image-gen #multimodal

MiniMax Mar 6, 2025

New Models

Image-01

MiniMax launches Image-01 text-to-image model at 1/10 the cost

MiniMax released Image-01, a versatile text-to-image model the company positions at roughly one tenth the cost of competing image generation offerings. It is available through MiniMax's hosted platform.

Announcement (X) ↗ · Try It ↗

🎙️ Hear our coverage →

Zhipu AI (GLM) Mar 6, 2025

New Models · Open weights

CogView 4 (6B)

Zhipu AI open-sources CogView 4, a 6B text-to-image model

Zhipu AI released CogView 4, a 6B-parameter open text-to-image model in the CogView family, with code available on GitHub. It is notable as an open-weights image generation option with strong Chinese and English prompt support.

Announcement (X) ↗ · GitHub ↗

🎙️ Hear our coverage →

#image-gen #open-source

January 2025

DeepSeek Jan 30, 2025

New Models · Open weights

Janus Pro

DeepSeek Janus Pro: open multimodal models in 1.5B and 7B

Amid the R1 frenzy, DeepSeek also released Janus Pro, unified multimodal models at 1.5B and 7B parameters that handle both image understanding and image generation. The open release added to DeepSeek's week of dominating AI news headlines.

1.5B / 7B Model sizes

GitHub ↗ · Try it (HF Space) ↗

🎙️ Hear our coverage →

#open-source #image-gen #multimodal