# MLX
MLX is Apple's open-source array and machine-learning framework for Apple Silicon — Macs (M-series) and, increasingly, iPhones / iPads / Vision Pro. Released in late 2023; matured fast through 2024–2026 into the default way to run [[Large Language Models (LLMs)]] locally on Apple hardware with full GPU acceleration via Metal.
The core bet is **unified memory**: on Apple Silicon, CPU and GPU share the same physical RAM. MLX's API exposes that directly; arrays live in one place, with no host↔device copy and no `.to(device)` dance. For inference and training on a Mac, that single design choice often lets MLX outperform ports of CUDA-shaped frameworks.
MIT license. Python, Swift, C++, and C APIs. Companion projects: **mlx-lm** (LLM inference + LoRA fine-tuning), **mlx-vlm** (vision-language models), **mlx-data** (data loading), **mlx-examples** (reference implementations).
## What it is
A NumPy-like array library plus an autograd / neural-network layer, designed end-to-end for Apple Silicon. Lazy execution and graph compilation; multi-device (CPU + GPU); composable function transforms (`grad`, `vmap`, `compile`). Loosely shaped after JAX and PyTorch in API ergonomics, but with a Metal backend instead of CUDA / ROCm.
## Why it matters for local LLMs
By 2026, MLX is the default high-performance runtime for open-weight LLMs on Macs:
- **Quantized weight support** — 2/3/4/6/8-bit quantization out of the box; `mlx_lm.convert --quantize` does the conversion in one command.
- **Day-one open-weight model support** — major releases ([[Gemma 4]], Llama 4, Qwen, Mistral, DeepSeek) ship with MLX-format weights on Hugging Face within hours of release.
- **Speculative decoding** support for paired draft/target models (see [[Speculative Decoding]] and [[AI Multi-Token Prediction Drafters]]).
- **Distributed inference** across multiple Macs over Thunderbolt or a local network: scale out by adding Macs rather than discrete GPUs.
- **Fine-tuning** via LoRA / QLoRA on consumer hardware. Models in the 7–70B range are realistic on a maxed-out M-series with unified memory above 64 GB.
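A rough way to sanity-check which model sizes fit in a given amount of unified memory is to estimate weight storage from the bit width. The helper below is a hypothetical back-of-the-envelope sketch: it assumes group-wise quantization with one 16-bit scale and one 16-bit bias per group of 64 weights, and it ignores the KV cache, activations, and any layers left unquantized.

```python
def quantized_weight_gb(params_billion, bits, group_size=64,
                        scale_bits=16, bias_bits=16):
    """Estimate weight storage in GB for group-wise quantized weights."""
    # Each group of `group_size` weights carries one scale and one bias,
    # which adds a fractional per-weight overhead.
    overhead_bits = (scale_bits + bias_bits) / group_size
    bits_per_weight = bits + overhead_bits
    # params_billion * 1e9 weights * bits / 8 bytes, expressed in GB (1e9 bytes).
    return params_billion * bits_per_weight / 8

for size in (7, 31, 70):
    four_bit = quantized_weight_gb(size, 4)
    half_precision = size * 16 / 8
    print(f"{size}B: ~{four_bit:.1f} GB at 4-bit, ~{half_precision:.0f} GB at 16-bit")
```

Under these assumptions a 70B model needs roughly 39 GB for weights at 4-bit, which is why machines with more than 64 GB of unified memory are the practical floor for that size once cache and activations are added.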
## Quick examples
```python
import mlx.core as mx
a = mx.array([1.0, 2.0, 3.0])
b = mx.array([10.0, 20.0, 30.0])
print(a + b) # array([11, 22, 33], dtype=float32)
```
```sh
# Run an LLM locally with mlx-lm
mlx_lm.generate --model mlx-community/gemma-4-31b-it-4bit --prompt "Explain MTP drafters."
# Convert HF weights to MLX format with 4-bit quantization
mlx_lm.convert --hf-path google/gemma-4-31b-it -q
```
## Where it fits
| Runtime | Best for | Apple Silicon native? |
|---|---|---|
| **MLX** / mlx-lm | Mac local inference + fine-tuning | Yes — Metal-first |
| [[Ollama]] | Cross-platform local inference, simple UX | Yes (uses llama.cpp under the hood) |
| [[vLLM]] | Datacenter-scale inference, batched serving | No (CUDA-first) |
| [[SGLang]] | Structured LLM programs, batching | No (CUDA-first) |
| Transformers + PyTorch (MPS) | Research, broad model support | Partial (MPS backend runs on Metal, but operator coverage is incomplete) |
For *running* models on a Mac, MLX usually wins on tokens/second per watt. For *deploying* across heterogeneous hardware or large servers, vLLM/SGLang are the right call.
## Trade-offs
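- **Apple-first.** Metal on Apple Silicon is the primary and most mature target. No ROCm or Vulkan backend; the CUDA backend for Linux is newer. In practice this ties most workloads to Apple hardware, which is rarely what production servers run.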
- **Younger ecosystem.** PyTorch and JAX have years of accumulated kernels, debugging tools, and community recipes. MLX is catching up but is not yet at parity for esoteric architectures.
- **Documentation depth.** Strong for the core path (LLM inference, basic training). Thinner for advanced custom kernels.
## References
- Project: <https://github.com/ml-explore/mlx>
- mlx-lm (LLMs): <https://github.com/ml-explore/mlx-lm>
- mlx-vlm: <https://github.com/Blaizzy/mlx-vlm>
- mlx-examples: <https://github.com/ml-explore/mlx-examples>
- Documentation: <https://ml-explore.github.io/mlx/>
- License: MIT
## Related
- [[Large Language Models (LLMs)]]
- [[AI Inference]]
- [[AI Open Weight Models]]
- [[Speculative Decoding]]
- [[AI Multi-Token Prediction Drafters]]
- [[Gemma 4]]
- [[Ollama]]
- [[vLLM]]
- [[SGLang]]
- [[Transformers]]