# Mistral Small 4
First [[Mistral AI]] model to unify instruct, reasoning, multimodal, and agentic coding capabilities into a single model. Consolidates the strengths of Magistral (reasoning), Pixtral (vision), and Devstral (coding) into one package. Released under the [[Apache 2.0 License]].
## Architecture
- Mixture of Experts (MoE): 128 experts, 4 active per token
- 119B total parameters, 6B active per token (8B including embedding and output layers)
- 256k context window
- Native multimodality: text and image inputs
- Configurable reasoning effort via `reasoning_effort` parameter (`none` for fast chat, `high` for deep step-by-step reasoning)
## Performance
- 40% reduction in end-to-end completion time vs Mistral Small 3 (latency-optimized)
- 3x more requests per second (throughput-optimized) vs Mistral Small 3
- Competitive with GPT-OSS 120B on benchmarks while generating significantly shorter outputs (lower latency, reduced cost)
## Infrastructure requirements
- Minimum: 4x NVIDIA HGX H100, 2x HGX H200, or 1x DGX B200
- Recommended: 4x HGX H100, 4x HGX H200, or 2x DGX B200
## Intended use cases
- Coding automation and agentic coding workflows
- General chat assistants and document understanding
- Multimodal analysis (text + images)
- Math, research, and complex reasoning tasks
## Availability
- Mistral API and AI Studio
- Hugging Face
- NVIDIA NIM (optimized containerized inference)
- Community frameworks: vLLM, llama.cpp, SGLang, Transformers
## References
- https://mistral.ai/news/mistral-small-4
- Hugging Face: https://huggingface.co/collections/mistralai/mistral-small-4
## Related
- [[Mistral AI]]
- [[Mistral Small 3.1]]
- [[Mistral Large 3]]
- [[Large Language Models (LLMs)]]
- [[Mistral OCR]]
- [[Apache 2.0 License]]