# Granite
Granite is IBM's family of open-weight [[Large Language Models (LLMs)]] for enterprise use, released under the [[Apache 2.0 License]]. The line was launched in 2023 with decoder-only [[Transformers|transformer]] models for code and language, and has since grown to include embedding models, vision-capable variants, and multiple architectural generations.
Granite's defining bet is **enterprise predictability over benchmark headlines**: permissive licensing, transparent training data, FP8 quantization, long [[Context Window|context windows]], and architectural choices that favor predictable inference cost.
## Generations
- **Granite 1–3** (2023–2025) — decoder-only dense [[Transformers|transformers]] for code, language, and instruction-following; the foundation for IBM watsonx model offerings
- **Granite 4.0** (late 2025) — hybrid Mamba-2 / Transformer with [[AI Mixture of Experts (MoE)|MoE]] routing; IBM's experiment with sparse + state-space architecture
- **[[Granite 4.1]]** (April 2026) — deliberate retreat from MoE; dense decoder-only at 3B / 8B / 30B with 128K–512K context. Headline claim: the 8B dense model matches its 32B MoE predecessor
## Companion models
- **Granite embedding models** — small [[Embeddings|embedding]] models in the `ibm-granite` Hugging Face collection (e.g., 311M and 97M variants released alongside Granite 4.1)
- **Granite Code** — code-specialized checkpoints
- **Granite Vision** — multimodal variants
## Positioning
- Apache 2.0 across the board — no research-only or non-commercial carve-outs
- Targets regulated and enterprise workloads where IBM watsonx is already deployed
- Counter-trend in 2026: while [[NVIDIA Nemotron]], [[GLM-5.1]], and most frontier open-weight families lean into MoE plus reasoning modes, Granite 4.1 doubles down on a dense architecture, no built-in reasoning mode, and very long context
- Distribution: weights on [[HuggingFace]] (`ibm-granite` collection) and [[Ollama]]; runnable with vLLM and Hugging Face Transformers; also offered through IBM watsonx and IBM's proprietary API
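
As a rough illustration of the Ollama route, the sketch below builds a non-streaming request against a local Ollama server's `/api/generate` endpoint using only the Python standard library. The model tag `granite` is a placeholder assumption, not a confirmed tag; check the Ollama library page for the tags that are actually published. The default local port is also assumed.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical tag for illustration; use whichever Granite tag you have pulled.
MODEL_TAG = "granite"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("Summarize the Apache 2.0 license in one sentence."))
```

Keeping request construction separate from the network call makes the payload easy to inspect without a server running.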
## Why it matters
- One of the few US-headquartered open-weight families competing in the [[Small Language Models (SLMs)|small/mid-size]] tier alongside Phi (Microsoft), Gemma (Google), and Llama (Meta)
- IBM is one of the few major enterprise software vendors shipping its own open-weight LLMs as a first-class product, not as a research artifact
- The Granite 4.0 → 4.1 architectural reversal is a useful data point on the MoE-vs-dense debate at the 8B–30B scale, where MoE's compute-efficiency advantage is least pronounced
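
One way to make the dense-versus-MoE trade-off concrete is a back-of-the-envelope comparison of weight memory and per-token compute. The sketch below is illustrative only: it assumes FP8 weights at one byte per parameter, uses the common rule of thumb of about 2 FLOPs per active parameter per token, and ignores KV cache and activations. The MoE active-parameter count is a placeholder assumption, not a figure from this note.

```python
from dataclasses import dataclass

BYTES_PER_PARAM_FP8 = 1  # FP8 stores each weight in one byte


@dataclass
class ModelShape:
    name: str
    total_params: float   # parameters that must be resident in memory
    active_params: float  # parameters actually used for each token

    def weight_gb(self, bytes_per_param: float = BYTES_PER_PARAM_FP8) -> float:
        """Approximate weight memory in GB (weights only)."""
        return self.total_params * bytes_per_param / 1e9

    def flops_per_token(self) -> float:
        """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter."""
        return 2 * self.active_params


# A dense model uses every parameter for every token.
dense_8b = ModelShape("dense 8B", total_params=8e9, active_params=8e9)
# Assumption: 9e9 active parameters is an illustrative placeholder.
moe_32b = ModelShape("MoE 32B", total_params=32e9, active_params=9e9)

for m in (dense_8b, moe_32b):
    print(f"{m.name}: {m.weight_gb():.0f} GB weights, "
          f"{m.flops_per_token():.1e} FLOPs/token")
```

Under these assumptions the two models cost about the same to run per token, but the MoE has to keep four times as many weights in memory, which is the kind of gap that matters for fixed-size enterprise deployments.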
## References
- Hugging Face collection: https://huggingface.co/ibm-granite
- Ollama library: https://ollama.com/library/granite
## Related
- [[Granite 4.1]]
- [[Large Language Models (LLMs)]]
- [[AI Open Weight Models]]
- [[Apache 2.0 License]]
- [[Dense AI Models]]
- [[AI Mixture of Experts (MoE)]]
- [[Context Window]]
- [[Small Language Models (SLMs)]]
- [[Embeddings]]
- [[Transformers]]
- [[NVIDIA Nemotron]]
- [[GLM-5.1]]
- [[HuggingFace]]
- [[Ollama]]
- [[AI Tool Use]]
- [[AI Frontier Model]]