# MLX

MLX is Apple's open-source array and machine-learning framework for Apple Silicon: Macs (M-series) and, increasingly, iPhones / iPads / Vision Pro. Released in late 2023, it matured fast through 2024–2026 into the default way to run [[Large Language Models (LLMs)]] locally on Apple hardware with full GPU acceleration via Metal.

The core bet is **unified memory**: on Apple Silicon, CPU and GPU share the same physical RAM. MLX's API exposes that directly: arrays live in one place, with no host↔device copy and no `.to(device)` dance. For inference and training on a Mac, that single design choice often lets MLX outperform ports of CUDA-shaped frameworks.

MIT license. Python and Swift bindings. Companion projects: **mlx-lm** (LLM inference + LoRA fine-tuning), **mlx-vlm** (vision-language models), **mlx-data** (data loading), **mlx-examples** (reference implementations).

## What it is

A NumPy-like array library plus an autograd / neural-network layer, designed end-to-end for Apple Silicon. Lazy execution and graph compilation; multi-device (CPU + GPU); composable function transforms (`grad`, `vmap`, `compile`). Loosely shaped after JAX and PyTorch in API ergonomics, but with a Metal backend instead of CUDA / ROCm.

## Why it matters for local LLMs

By 2026, MLX is the default high-performance runtime for running open-weight LLMs on Macs:

- **Quantized weight support**: 2/3/4/6/8-bit quantization out of the box; `mlx_lm.convert --quantize` does the conversion in one command.
- **Day-one open-weight model support**: major releases ([[Gemma 4]], Llama 4, Qwen, Mistral, DeepSeek) ship with MLX-format weights on Hugging Face within hours of release.
- **Speculative decoding** support for paired draft/target models (see [[Speculative Decoding]] and [[AI Multi-Token Prediction Drafters]]).
- **Distributed inference** across multiple Macs over Thunderbolt or local network: scale by adding Macs rather than discrete GPUs.
- **Fine-tuning** via LoRA / QLoRA on consumer hardware.
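
To see how the quantization levels above translate into memory, weight size is roughly parameter count times bits per weight, divided by 8. The sketch below is back-of-the-envelope arithmetic only. The function name `weight_gib` is hypothetical, and the default 0.5 bits per weight of overhead is an assumption (a 16-bit scale and bias stored for every group of 64 weights). It also ignores the KV cache, activations, and whatever the OS needs.

```python
def weight_gib(params_billions: float, bits: int, overhead_bits: float = 0.5) -> float:
    """Approximate size of model weights in GiB.

    overhead_bits is an ASSUMED per-weight cost for quantization metadata
    (scales and biases); pass 0 for unquantized 16-bit weights.
    """
    total_bits = params_billions * 1e9 * (bits + overhead_bits)
    return total_bits / 8 / 2**30


if __name__ == "__main__":
    for size in (7, 31, 70):
        quantized = ", ".join(
            f"{bits}-bit: {weight_gib(size, bits):6.1f} GiB" for bits in (4, 8)
        )
        fp16 = weight_gib(size, 16, overhead_bits=0)
        print(f"{size:>3}B  {quantized}, fp16: {fp16:6.1f} GiB")
```

Under these assumptions, a 70B model at 4 bits needs roughly 37 GiB for weights alone, against about 130 GiB at 16 bits.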

Models in the 7–70B range are realistic on a maxed-out M-series with unified memory above 64 GB.

## Quick examples

```python
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = mx.array([10.0, 20.0, 30.0])
print(a + b)  # array([11, 22, 33], dtype=float32)
```

```sh
# Run an LLM locally with mlx-lm
mlx_lm.generate --model mlx-community/gemma-4-31b-it-4bit --prompt "Explain MTP drafters."

# Convert HF weights to MLX format with 4-bit quantization
mlx_lm.convert --hf-path google/gemma-4-31b-it -q
```

## Where it fits

| Runtime | Best for | Apple Silicon native? |
|---|---|---|
| **MLX** / mlx-lm | Mac local inference + fine-tuning | Yes (Metal-first) |
| [[Ollama]] | Cross-platform local inference, simple UX | Yes (uses llama.cpp under the hood) |
| [[vLLM]] | Datacenter-scale inference, batched serving | No (CUDA-first) |
| [[SGLang]] | Structured LLM programs, batching | No (CUDA-first) |
| Transformers + PyTorch (MPS) | Research, broad model support | Partial (MPS backend, not Metal-first) |

For *running* models on a Mac, MLX usually wins on tokens/second per watt. For *deploying* across heterogeneous hardware or large servers, vLLM/SGLang are the right call.

## Trade-offs

- **Apple-only.** No CUDA, no ROCm, no Vulkan. Locks workloads to Apple hardware. Not suitable for production servers.
- **Younger ecosystem.** PyTorch and JAX have years of accumulated kernels, debugging tools, and community recipes. MLX is catching up but not at parity for esoteric architectures.
- **Documentation depth.** Strong for the core path (LLM inference, basic training). Thinner for advanced custom kernels.
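
The guidance in the table and trade-offs above can be condensed into a small dispatch helper. This is only an illustrative sketch: `pick_runtime` and its `serving_at_scale` flag are hypothetical names, and the returned strings simply echo the table rather than calling any runtime's API. It assumes a native arm64 Python on macOS, where `platform.machine()` reports `arm64` (under Rosetta it would report `x86_64`).

```python
import platform


def pick_runtime(system: str, machine: str, serving_at_scale: bool = False) -> str:
    """Suggest a runtime following the comparison table (illustrative only)."""
    if serving_at_scale:
        # Batched, datacenter-style serving is CUDA-first territory.
        return "vLLM or SGLang"
    if system == "Darwin" and machine == "arm64":
        # Apple Silicon Mac: Metal-first local inference and fine-tuning.
        return "MLX / mlx-lm"
    # Any other local machine: the cross-platform option from the table.
    return "Ollama"


if __name__ == "__main__":
    # Report the suggestion for the machine this script runs on.
    print(pick_runtime(platform.system(), platform.machine()))
```

Passing the platform values as arguments, instead of reading them inside the function, keeps the rule easy to check for machines other than the current one.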

## References

- Project: <https://github.com/ml-explore/mlx>
- mlx-lm (LLMs): <https://github.com/ml-explore/mlx-lm>
- mlx-vlm: <https://github.com/Blaizzy/mlx-vlm>
- mlx-examples: <https://github.com/ml-explore/mlx-examples>
- Documentation: <https://ml-explore.github.io/mlx/>
- License: MIT

## Related

- [[Large Language Models (LLMs)]]
- [[AI Inference]]
- [[AI Open Weight Models]]
- [[Speculative Decoding]]
- [[AI Multi-Token Prediction Drafters]]
- [[Gemma 4]]
- [[Ollama]]
- [[vLLM]]
- [[SGLang]]
- [[Transformers]]