# Granite

Granite is IBM's family of open-weight [[Large Language Models (LLMs)]] for enterprise use, released under the [[Apache 2.0 License]]. The line launched in 2023 with decoder-only [[Transformers|transformer]] models for code and language, and has since grown to include embedding models, vision-capable variants, and multiple architectural generations.

Granite's defining bet is **enterprise predictability over benchmark headlines**: permissive licensing, transparent training data, FP8 quantization, long [[Context Window|context windows]], and architectural choices that favor deterministic inference cost.

## Generations

- **Granite 1–3** (2023–2025): decoder-only dense [[Transformers|transformers]] for code, language, and instruction-following; the foundation for IBM watsonx model offerings
- **Granite 4.0** (late 2025): hybrid Mamba-2 / Transformer with [[AI Mixture of Experts (MoE)|MoE]] routing; IBM's experiment with a sparse, state-space architecture
- **[[Granite 4.1]]** (April 2026): deliberate retreat from MoE; dense decoder-only at 3B / 8B / 30B with 128K–512K context. Headline claim: the 8B dense model matches the previous-generation 32B MoE model

## Companion models

- **Granite embedding models**: small [[Embeddings|embedding]] models in the `ibm-granite` Hugging Face collection (e.g., 311M and 97M variants released alongside Granite 4.1)
- **Granite Code**: code-specialized checkpoints
- **Granite Vision**: multimodal variants

## Positioning

- Apache 2.0 across the board, with no research-only or non-commercial carve-outs
- Targets regulated and enterprise workloads where IBM watsonx is already deployed
- Counter-trend in 2026: while [[NVIDIA Nemotron]], [[GLM-5.1]], and most frontier open-weight families lean into MoE plus reasoning modes, Granite 4.1 doubles down on dense architecture, no built-in reasoning mode, and extreme long context
- Distribution: [[HuggingFace]] (`ibm-granite` collection), [[Ollama]], vLLM, Hugging Face Transformers, IBM watsonx, IBM proprietary API

## Why it matters

- One of the few US-headquartered open-weight families competing in the [[Small Language Models (SLMs)|small/mid-size]] tier alongside Phi (Microsoft), Gemma (Google), and Llama (Meta)
- IBM is the only major enterprise software vendor shipping its own open-weight LLMs as a first-class product, not as a research artifact
- The Granite 4.0 → 4.1 architectural reversal is a useful data point on the MoE-vs-dense debate at the 8B–30B scale, where MoE's parameter-efficiency advantage is least pronounced

## References

- Hugging Face collection: https://huggingface.co/ibm-granite
- Ollama library: https://ollama.com/library/granite

## Related

- [[Granite 4.1]]
- [[Large Language Models (LLMs)]]
- [[AI Open Weight Models]]
- [[Apache 2.0 License]]
- [[Dense AI Models]]
- [[AI Mixture of Experts (MoE)]]
- [[Context Window]]
- [[Small Language Models (SLMs)]]
- [[Embeddings]]
- [[Transformers]]
- [[NVIDIA Nemotron]]
- [[GLM-5.1]]
- [[HuggingFace]]
- [[Ollama]]
- [[AI Tool Use]]
- [[AI Frontier Model]]
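
## Usage sketch

Of the distribution channels listed under Positioning, [[Ollama]] is probably the quickest to try locally. The following is a minimal sketch, not an official IBM or Ollama example. It assumes an Ollama server running at its default local address and uses Ollama's `/api/generate` endpoint. The model tag `granite` is a hypothetical placeholder borrowed from the library URL in References; check that page for the tags that actually exist. The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Assumption: hypothetical placeholder tag; look up real tags in the Ollama library.
MODEL_TAG = "granite"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running and a Granite model pulled, `generate("Summarize the Apache 2.0 license in one sentence.")` would return the model's reply as a string. Splitting request construction from sending keeps the payload easy to inspect without network access.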