# Mistral Small 4 First [[Mistral AI]] model to unify instruct, reasoning, multimodal, and agentic coding capabilities into a single model. Consolidates the strengths of Magistral (reasoning), Pixtral (vision), and Devstral (coding) into one package. Released under the [[Apache 2.0 License]]. ## Architecture - Mixture of Experts (MoE): 128 experts, 4 active per token - 119B total parameters, 6B active per token (8B including embedding and output layers) - 256k context window - Native multimodality: text and image inputs - Configurable reasoning effort via `reasoning_effort` parameter (`none` for fast chat, `high` for deep step-by-step reasoning) ## Performance - 40% reduction in end-to-end completion time vs Mistral Small 3 (latency-optimized) - 3x more requests per second (throughput-optimized) vs Mistral Small 3 - Competitive with GPT-OSS 120B on benchmarks while generating significantly shorter outputs (lower latency, reduced cost) ## Infrastructure requirements - Minimum: 4x NVIDIA HGX H100, 2x HGX H200, or 1x DGX B200 - Recommended: 4x HGX H100, 4x HGX H200, or 2x DGX B200 ## Intended use cases - Coding automation and agentic coding workflows - General chat assistants and document understanding - Multimodal analysis (text + images) - Math, research, and complex reasoning tasks ## Availability - Mistral API and AI Studio - Hugging Face - NVIDIA NIM (optimized containerized inference) - Community frameworks: vLLM, llama.cpp, SGLang, Transformers ## References - https://mistral.ai/news/mistral-small-4 - Hugging Face: https://huggingface.co/collections/mistralai/mistral-small-4 ## Related - [[Mistral AI]] - [[Mistral Small 3.1]] - [[Mistral Large 3]] - [[Large Language Models (LLMs)]] - [[Mistral OCR]] - [[Apache 2.0 License]]