# Neural Processing Unit (NPU)
A specialized processor designed to accelerate machine learning workloads, particularly the matrix multiplications and activation functions at the heart of [[Neural Networks (NNs)]]. NPUs sit alongside CPUs and GPUs in modern devices and typically deliver better performance per watt on ML inference.
Also called: AI accelerator, ML accelerator, AI Engine, Neural Engine.
## Why NPUs Exist
GPUs accelerate ML well but were designed for graphics. CPUs are general-purpose. NPUs are purpose-built for ML:
- Optimized for low-precision math (INT8, INT4, BF16)
- Massively parallel matrix operations
- Often cited as 5-10× more efficient (perf/watt) than GPUs for ML inference, though the figure varies by model and chip
- Critical for sustained ML on battery-powered devices
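
To make the low-precision point concrete, here is a minimal sketch of symmetric INT8 quantization followed by an integer dot product. It assumes per-tensor scaling and a wide accumulator; the function names `quantize` and `int8Dot` are illustrative, and real NPU toolchains may use other schemes (per-channel scales, zero points).

```typescript
// Symmetric per-tensor INT8 quantization: real value ≈ scale * q, q in [-127, 127].
function quantize(values: number[]): { q: Int8Array; scale: number } {
  // Largest magnitude sets the scale; the tiny floor avoids dividing by zero.
  const maxAbs = Math.max(...values.map(Math.abs), 1e-12);
  const scale = maxAbs / 127;
  const q = Int8Array.from(values, (v) => Math.round(v / scale));
  return { q, scale };
}

// Dot product done in integers, rescaled to a real number once at the end.
function int8Dot(a: number[], b: number[]): number {
  const qa = quantize(a);
  const qb = quantize(b);
  let acc = 0; // stands in for the wide (e.g. INT32) hardware accumulator
  for (let i = 0; i < qa.q.length; i++) {
    acc += qa.q[i] * qb.q[i];
  }
  return acc * qa.scale * qb.scale;
}
```

The inner loop touches only small integers, which is the kind of arithmetic that dense multiply-accumulate arrays handle cheaply. The result is an approximation of the floating-point dot product, not an exact match.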
## Where They Show Up
| Vendor | NPU |
|---|---|
| Apple | Neural Engine (every iPhone since the A11 Bionic, every M-series Mac) |
| Qualcomm | Hexagon NPU (Snapdragon mobile chips and Snapdragon X Copilot+ PCs) |
| Intel | NPU (Core Ultra "Meteor Lake" and newer) |
| AMD | XDNA / Ryzen AI |
| Google | TPU in Google Tensor SoCs (Pixel 6 and later) |
| Microsoft | Co-designed with Qualcomm/Intel for Copilot+ PCs |
| Huawei | Da Vinci NPU (Kirin chips) |
## Microsoft's Copilot+ PC Threshold
Microsoft requires an NPU rated at 40 TOPS (trillions of operations per second) or more for the "Copilot+ PC" label, which marks laptops capable of running on-device AI features smoothly. This drove NPU adoption across the PC industry in 2024-2025.
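
The TOPS figure is a peak rating. A common back-of-envelope estimate, assumed here rather than taken from Microsoft's definition, counts each multiply-accumulate (MAC) unit as two operations per clock cycle. The MAC counts and clock speed in the usage note are hypothetical, not any real chip's specification.

```typescript
// Peak TOPS estimate: each MAC counts as two operations (multiply + add).
function peakTops(macUnits: number, clockGHz: number): number {
  const opsPerSecond = 2 * macUnits * clockGHz * 1e9;
  return opsPerSecond / 1e12; // convert to trillions of operations per second
}

// Threshold cited in the text above.
const COPILOT_PLUS_MIN_TOPS = 40;

// True when the estimated peak rating reaches the threshold.
function meetsCopilotPlus(macUnits: number, clockGHz: number): boolean {
  return peakTops(macUnits, clockGHz) >= COPILOT_PLUS_MIN_TOPS;
}
```

Under this estimate, a hypothetical NPU with 16,384 MAC units at 1.4 GHz rates at roughly 45.9 TOPS and clears the bar, while one with 8,192 units at the same clock (about 22.9 TOPS) does not. Sustained throughput on real models is usually lower than the peak figure.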
## Why NPUs Matter for Browser ML
[[WebNN API]] is designed to expose NPU acceleration to web apps. When a model runs through WebNN on an NPU, the potential gains (which vary by model and hardware) are:
- Inference roughly 5-10× faster than on the CPU
- Roughly 5-10× better power efficiency than on the GPU
- A GPU left free for graphics
This kind of on-device acceleration is part of what makes [[Browser-Provided Language Models]] like [[Gemini Nano]] practical on consumer hardware.
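
A web app would typically ask for an NPU first and fall back to other devices. The sketch below assumes the `createContext({ deviceType })` option described in drafts of the WebNN spec, which has been under revision; the `MLLike` type and the `pickContext` helper are illustrative names, not part of any API.

```typescript
// Minimal structural types. WebNN typings are not in the standard DOM lib,
// so these are assumptions modelled on WebNN spec drafts.
type DeviceType = "npu" | "gpu" | "cpu";
interface MLLike {
  createContext(options: { deviceType: DeviceType }): Promise<unknown>;
}

// Try each device in order of preference and return the first that succeeds.
async function pickContext(
  ml: MLLike | undefined,
  order: DeviceType[] = ["npu", "gpu", "cpu"],
): Promise<{ device: DeviceType; context: unknown } | null> {
  if (!ml) return null; // WebNN is not available in this environment
  for (const device of order) {
    try {
      const context = await ml.createContext({ deviceType: device });
      return { device, context };
    } catch {
      // This device type was rejected; try the next one.
    }
  }
  return null;
}
```

In a browser this might be called as `pickContext((navigator as any).ml)`. A browser may treat the device type as a hint and fall back silently, so a successful call does not prove that the NPU is actually in use.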
## Programming Model
Developers rarely program NPUs directly. Access is through:
- OS-level frameworks (Apple Core ML, Windows DirectML, Android NNAPI, the last deprecated since Android 15)
- Runtimes ([[ONNX Runtime Web]], TensorFlow Lite (now LiteRT), PyTorch Mobile (succeeded by ExecuTorch))
- Browser APIs ([[WebNN API]])
The runtime translates the high-level model graph into NPU-specific kernels.
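
Because an NPU usually supports only a subset of operators, a runtime also has to decide which parts of the graph run where. The following is a simplified sketch of that idea, assuming a linear list of ops and a made-up supported-op set (`NPU_OPS`); real runtimes partition full graphs and query the hardware for its capabilities.

```typescript
type Backend = "npu" | "cpu";
interface Segment {
  backend: Backend;
  ops: string[];
}

// Hypothetical supported-op set; real NPUs and runtimes publish their own.
const NPU_OPS = new Set(["conv2d", "matmul", "relu", "add"]);

// Group consecutive ops into segments according to where they can run.
function partition(ops: string[], supported: Set<string> = NPU_OPS): Segment[] {
  const segments: Segment[] = [];
  for (const op of ops) {
    const backend: Backend = supported.has(op) ? "npu" : "cpu";
    const last = segments[segments.length - 1];
    if (last && last.backend === backend) {
      last.ops.push(op); // extend the current segment
    } else {
      segments.push({ backend, ops: [op] }); // start a new segment
    }
  }
  return segments;
}
```

For example, `partition(["conv2d", "relu", "topk", "matmul"])` yields three segments: two ops on the NPU, `topk` on the CPU, then `matmul` back on the NPU. Each boundary implies a data transfer, which is one reason unsupported ops can erode the NPU's advantage.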
## Trade-offs
**Strengths:** efficiency, latency, battery life, privacy-by-locality

**Weaknesses:** vendor-specific toolchains (a programmability gap), support for only a subset of ops, often limited to quantized models
## References
- https://en.wikipedia.org/wiki/AI_accelerator
## Related
- [[Neural Networks (NNs)]]
- [[Machine Learning (ML)]]
- [[AI Inference]]
- [[WebNN API]]
- [[On-Device Machine Learning]]
- [[Edge AI]]
- [[Browser-Provided Language Models]]
- [[AI Quantization]]
- [[Apple Neural Engine]]
- [[Apple Intelligence]]
- [[Apple Core ML]]
- [[Windows Copilot Runtime]]