DiffusionGemma - DeveloPassion

# DiffusionGemma

DiffusionGemma is an experimental open model from [[Google DeepMind]] in the [[Gemma]] family that generates text by **diffusion** instead of autoregression. It builds on Gemma 4 and Gemini Diffusion research. Rather than predicting one token at a time, it denoises whole spans in parallel. The payoff is speed: up to ~4× faster output, exceeding 1,000 tokens/second on a single NVIDIA H100.

## How diffusion text generation differs

Standard [[Large Language Models (LLMs)]] are autoregressive: they generate strictly sequentially, token by token, and are bottlenecked on memory bandwidth. DiffusionGemma uses **discrete text diffusion** with bi-directional attention to generate many tokens per forward pass (a "canvas" of 256), then iteratively refines them. This enables self-correction and better global consistency, and shifts the bottleneck from memory bandwidth to raw compute.

## Architecture

- ~26B total parameters (25.2B), **3.8B active**: [[AI Mixture of Experts (MoE)]] (8 of 128 experts active + 1 shared)
- Encoder-decoder: an autoregressive encoder caches the prompt context, paired with a diffusion decoder
- Up to 256K token [[Context Window]]; 262K vocabulary; sliding window 1024
- 15–20 tokens generated per forward pass (>1100 tok/s on H100 in FP8)
- Multimodal input (text, image, video → text); ~550M vision params; built-in thinking mode
- Supports NVIDIA NVFP4 (4-bit float) on Blackwell GPUs

## Performance (instruction-tuned)

- MMLU Pro: 77.6%
- GPQA Diamond: 73.2%
- LiveCodeBench v6: 69.1%
- MATH-Vision: 70.5%

## Availability

- [[Apache 2.0 License]]; an [[AI Open Weight Models|open-weight]] release
- On HuggingFace (`google/diffusiongemma-26B-A4B-it`), Kaggle, and Google Vertex AI Model Garden

## References

- https://deepmind.google/models/gemma/diffusiongemma/
- https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/
- https://huggingface.co/google/diffusiongemma-26B-A4B-it
- https://ai.google.dev/gemma/docs/diffusiongemma
- https://developers.googleblog.com/diffusiongemma-the-developer-guide/
- https://www.xda-developers.com/tried-google-diffusiongemma-generate-text-like-image-local-llm/
- https://vllm.ai/blog/2026-06-10-diffusion-gemma

## Related

- [[Gemma]]
- [[Google DeepMind]]
- [[Gemini]]
- [[Diffusion Models]]
- [[Large Language Models (LLMs)]]
- [[AI Mixture of Experts (MoE)]]
- [[AI Open Weight Models]]
- [[Context Window]]
- [[Apache 2.0 License]]
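
## Sketch: parallel unmasking vs. token-by-token decoding

To make the contrast in "How diffusion text generation differs" concrete, here is a minimal toy sketch. It is not DiffusionGemma's actual algorithm. It assumes a generic confidence-based schedule in which every pass proposes tokens for all still-masked positions of the canvas and only the most confident proposals are committed. The names `toy_denoiser`, `diffusion_decode`, `autoregressive_passes`, and `tokens_per_pass` are hypothetical. The "model" simply reads from a known target with random confidences, and the sketch never revises a committed token, so it leaves out the self-correction described above.

```python
import random

MASK = None  # marks a canvas position that has not been committed yet


def toy_denoiser(canvas, target, rng):
    """Hypothetical stand-in for the model: propose a token and a confidence
    for every masked position. A real model would run one forward pass with
    bi-directional attention over the whole canvas here."""
    proposals = {}
    for i, tok in enumerate(canvas):
        if tok is MASK:
            proposals[i] = (target[i], rng.random())
    return proposals


def diffusion_decode(target, tokens_per_pass, seed=0):
    """Fill a fully masked canvas, committing the most confident proposals
    after each pass. Returns (canvas, number_of_passes)."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    passes = 0
    while any(tok is MASK for tok in canvas):
        proposals = toy_denoiser(canvas, target, rng)
        # Rank masked positions by confidence and commit only the top few.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:tokens_per_pass]:
            canvas[i] = proposals[i][0]
        passes += 1
    return canvas, passes


def autoregressive_passes(target):
    """Baseline: one forward pass per generated token."""
    return len(target)


if __name__ == "__main__":
    target = [f"tok{i}" for i in range(256)]  # canvas of 256, as in the note
    canvas, passes = diffusion_decode(target, tokens_per_pass=16)
    print(passes, autoregressive_passes(target))  # prints: 16 256
```

With 16 tokens committed per pass (inside the 15–20 range quoted above), a 256-token canvas takes 16 passes instead of 256. Pass counts are not wall-clock time: each diffusion pass processes the whole canvas and does more compute, which is in keeping with the note's point that the bottleneck moves from memory bandwidth to raw compute.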