# GPT-Generated Unified Format (GGUF)
GGUF is the binary file format used to store [[Large Language Models (LLMs)]] for local inference, introduced by the [[llama.cpp]] project in 2023 as the successor to the older [[Georgi Gerganov Machine Learning (GGML)|GGML]] format. A single `.gguf` file bundles everything needed to run a model: the (usually quantized) weights, the tensor layout, typed key-value metadata (architecture, hyperparameters, tokenizer vocabulary), and usually the chat template. That self-contained design is why it became the de facto distribution format for locally run [[AI Open Weight Models|open-weight]] models.
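
To make the "single self-describing file" idea concrete, here is a minimal sketch that reads only the fixed-size header at the start of a `.gguf` file: the 4-byte magic `GGUF`, a 32-bit version, then 64-bit counts of tensors and metadata key-value pairs. It assumes the little-endian version 2/3 layout; the function names `parse_gguf_header` and `read_gguf_header` are illustrative, not part of any library, and the sketch does not decode the metadata or tensor descriptors that follow.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header from raw bytes (assumes little-endian v2/v3)."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={data[:4]!r}")
    # "<IQQ" = little-endian uint32, uint64, uint64 with no padding.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    if version < 2:
        # Version 1 used 32-bit counts, which this sketch does not handle.
        raise ValueError(f"unsupported GGUF version: {version}")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def read_gguf_header(path: str) -> dict:
    """Read and parse the header of the GGUF file at `path`."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

Calling `read_gguf_header("model.gguf")` (a hypothetical path) returns the version and the two counts, which is enough to confirm a file is GGUF before handing it to a runtime.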
## Why it matters
- **One file, no surprises**: metadata and template travel with the weights, so a runtime can load a model with minimal configuration
- **Quantization-first**: GGUF is where low-bit quants live (e.g. Q4_K_M, and the 2-bit dynamic quants used to run big models like [[GLM-5.2]] on a Mac)
- **Ecosystem default**: produced by tools like [[Unsloth]], consumed by [[llama.cpp]], [[Ollama]], [[Mistral.rs]], and [[Docker Model Runner]]
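
The size savings behind the quantization point can be sketched with a little arithmetic. The example below uses two of the simpler block formats, `Q8_0` and `Q4_0`, which each store 32 weights per block plus one 16-bit scale; K-quants such as Q4_K_M use more elaborate layouts that are not modeled here. The 7-billion-weight figure is a hypothetical model size, and the result is a rough estimate only, since real files mix tensor types and also carry metadata.

```python
# (bytes per block, weights per block) for a few GGML tensor types.
BLOCK_LAYOUTS = {
    "F16": (2, 1),     # one 16-bit float per weight
    "Q8_0": (34, 32),  # 32 int8 values + one fp16 scale
    "Q4_0": (18, 32),  # 32 4-bit values (16 bytes) + one fp16 scale
}


def bits_per_weight(quant: str) -> float:
    """Average storage cost per weight, in bits."""
    block_bytes, block_weights = BLOCK_LAYOUTS[quant]
    return block_bytes * 8 / block_weights


def estimate_tensor_bytes(n_weights: int, quant: str) -> int:
    """Approximate bytes needed to store n_weights in the given type."""
    block_bytes, block_weights = BLOCK_LAYOUTS[quant]
    n_blocks = -(-n_weights // block_weights)  # ceiling division
    return n_blocks * block_bytes


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-weight model
    for quant in BLOCK_LAYOUTS:
        gb = estimate_tensor_bytes(n, quant) / 1e9
        print(f"{quant}: {bits_per_weight(quant):.1f} bits/weight, ~{gb:.2f} GB")
```

Under these assumptions, `Q4_0` costs 4.5 bits per weight against 16 for `F16`, which is the kind of reduction that lets large models fit in consumer memory.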
## Related
- [[Georgi Gerganov Machine Learning (GGML)]]
- [[Safetensors]]
- [[ONNX]]
- [[Georgi Gerganov]]
- [[llama.cpp]]
- [[Large Language Models (LLMs)]]
- [[AI Open Weight Models]]
- [[Ollama]]
- [[Unsloth]]
- [[Mistral.rs]]
- [[Docker Model Runner]]
- [[GLM-5.2]]