# DiffusionGemma
DiffusionGemma is an experimental open model from [[Google DeepMind]] in the [[Gemma]] family that generates text by **diffusion** instead of autoregression. Built on Gemma 4 and Gemini Diffusion research, it denoises whole spans of tokens in parallel rather than predicting one token at a time. The payoff is speed: up to about 4× faster output, exceeding 1,000 tokens/second on a single NVIDIA H100.
## How diffusion text generation differs
Standard [[Large Language Models (LLMs)]] are autoregressive: they generate strictly sequentially, one token at a time, and are bottlenecked on memory bandwidth. DiffusionGemma instead uses **discrete text diffusion** with bi-directional attention: each forward pass works on many tokens at once (a "canvas" of 256), and the model then refines them iteratively. This enables self-correction and better global consistency, and it shifts the bottleneck from memory bandwidth to raw compute.
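The parallel-then-refine loop can be illustrated with a toy sketch of confidence-based parallel unmasking, a common decoding scheme for masked discrete diffusion. This is not confirmed to be DiffusionGemma's actual algorithm: `toy_denoiser`, `diffusion_decode`, and `tokens_per_step` are hypothetical names, the "model" is a stand-in that reveals a known target with random confidences, and the sketch omits the re-masking that real self-correction would need.

```python
import random

MASK = None  # marks a canvas position that has not been committed yet


def toy_denoiser(canvas, target, rng):
    """Stand-in for the model (hypothetical).

    A real denoiser would be a network attending bi-directionally over the
    whole canvas. Here every masked slot gets its target token paired with a
    random confidence, which is enough to make the loop runnable.
    """
    return {
        i: (target[i], rng.random())
        for i, tok in enumerate(canvas)
        if tok is MASK
    }


def diffusion_decode(target, tokens_per_step=4, seed=0):
    """Fill a fully masked canvas over several parallel passes."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    steps = 0
    while any(tok is MASK for tok in canvas):
        # One "forward pass": proposals for every masked position at once.
        proposals = toy_denoiser(canvas, target, rng)
        # Commit only the most confident proposals; the rest stay masked
        # and are revisited in later passes.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            canvas[i] = proposals[i][0]
        steps += 1
    return canvas, steps


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    text, n_steps = diffusion_decode(words, tokens_per_step=4)
    print(" ".join(text), f"({n_steps} passes)")
```

The contrast with autoregressive decoding is the pass count: nine tokens are committed in three passes here, in an order set by confidence rather than left to right.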
## Architecture
- 25.2B total parameters (~26B), **3.8B active**, via [[AI Mixture of Experts (MoE)]] (8 of 128 experts active, plus 1 shared)
- Encoder-decoder: an autoregressive encoder caches the prompt context, paired with a diffusion decoder
- Up to 256K-token [[Context Window]]; 262K-token vocabulary; sliding-window attention over 1,024 tokens
- 15–20 tokens generated per forward pass (>1,100 tokens/second on an H100 in FP8)
- Multimodal input (text, image, video → text); ~550M vision parameters; built-in thinking mode
- Supports NVIDIA NVFP4 (4-bit float) on Blackwell GPUs
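A rough back-of-the-envelope sketch shows how the figures in this list relate. The assumptions are mine, not the source's: that 15–20 tokens per pass is an average rate over a full 256-token canvas, and that weight memory is simply parameters × bits. That last estimate ignores quantization scale factors, activations, and any cached context. All function names are hypothetical.

```python
import math

# Figures quoted in this note
TOTAL_PARAMS = 25.2e9
ACTIVE_PARAMS = 3.8e9
CANVAS_TOKENS = 256
THROUGHPUT_TOK_S = 1100  # quoted lower bound on an H100 in FP8


def active_fraction():
    """Share of the weights that are active (MoE sparsity)."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def passes_per_canvas(tokens_per_pass):
    """Forward passes to fill one canvas, assuming a steady average rate."""
    return math.ceil(CANVAS_TOKENS / tokens_per_pass)


def passes_per_second(tokens_per_pass):
    """Forward passes per second implied by the quoted throughput."""
    return THROUGHPUT_TOK_S / tokens_per_pass


def weight_gigabytes(bits_per_param):
    """Naive weight footprint in GB; ignores scale factors and overhead."""
    return TOTAL_PARAMS * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.1%}")
    for rate in (15, 20):
        print(f"{rate} tok/pass: {passes_per_canvas(rate)} passes per canvas, "
              f"{passes_per_second(rate):.1f} passes/s")
    for name, bits in (("FP8", 8), ("NVFP4", 4)):
        print(f"{name}: ~{weight_gigabytes(bits):.1f} GB of weights")
```

Under these assumptions, about 15% of the weights are active, a canvas takes 13 to 18 passes, and the naive weight footprint roughly halves when going from FP8 to a 4-bit format.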
## Performance (instruction-tuned)
- MMLU Pro: 77.6% · GPQA Diamond: 73.2% · LiveCodeBench v6: 69.1% · MATH-Vision: 70.5%
## Availability
- [[Apache 2.0 License]]; an [[AI Open Weight Models|open-weight]] release
- On Hugging Face (`google/diffusiongemma-26B-A4B-it`), Kaggle, and Google Vertex AI Model Garden
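As a minimal sketch of fetching the open weights, the generic `snapshot_download` function from the `huggingface_hub` package can pull a repository by ID. The repository ID is the one listed above. Whether the repository requires accepting terms or logging in first is not stated here, so treat that as unknown. `download_weights` is a hypothetical helper name, and this only downloads files: it says nothing about how to run the model.

```python
REPO_ID = "google/diffusiongemma-26B-A4B-it"  # repository ID as listed in this note


def download_weights(local_dir=None):
    """Download the model repository and return the local path.

    Requires `pip install huggingface_hub` and network access. The import is
    deferred so that this module loads even without the package installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print("Downloaded to:", download_weights())
```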
## References
- https://deepmind.google/models/gemma/diffusiongemma/
- https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/
- https://huggingface.co/google/diffusiongemma-26B-A4B-it
- https://ai.google.dev/gemma/docs/diffusiongemma
- https://developers.googleblog.com/diffusiongemma-the-developer-guide/
- https://www.xda-developers.com/tried-google-diffusiongemma-generate-text-like-image-local-llm/
- https://vllm.ai/blog/2026-06-10-diffusion-gemma
## Related
- [[Gemma]]
- [[Google DeepMind]]
- [[Gemini]]
- [[Diffusion Models]]
- [[Large Language Models (LLMs)]]
- [[AI Mixture of Experts (MoE)]]
- [[AI Open Weight Models]]
- [[Context Window]]
- [[Apache 2.0 License]]