# On-Device Machine Learning
On-device machine learning runs models locally on the user's device rather than sending data to a remote server for inference. It is also called edge ML or client-side ML, and it is the core value proposition of the [[WebMachineLearning]] initiative and the [[Prompt API]].
## Why It Matters
| Dimension | Cloud Inference | On-Device Inference |
|---|---|---|
| Privacy | Data sent to server | Data never leaves device |
| Latency | Network round trip plus server time | No network round trip; bounded by local compute |
| Offline | Not available | Works without internet |
| Cost | Per-query API fees | No per-query fee; compute and battery cost shift to the user's device |
| Throughput | Rate-limited by API | Limited by device hardware |
## Key Enablers
- **Hardware acceleration**: NPUs, GPUs, and specialized ML chips in modern devices
- **Model compression**: quantization, pruning, and distillation make large models fit on-device
- **Browser APIs**: [[WebNN API]] gives web apps access to hardware-accelerated inference; [[Prompt API]] exposes a browser-provided language model
- **OS-level models**: browsers can surface models and runtimes provided by the operating system (e.g., Google's Gemini Nano on Android, or models run through Apple's Core ML framework)
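To make the model compression point concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, one common form of quantization. The names `QuantizedTensor`, `quantizeInt8`, and `dequantizeInt8` are hypothetical and chosen for illustration; production toolchains typically use more elaborate schemes (per-channel scales, calibration data, 4-bit formats).

```typescript
// Symmetric per-tensor int8 quantization: w ≈ scale * q, with q in [-127, 127].
// Illustrative only; names and scheme are assumptions, not a specific library's API.
interface QuantizedTensor {
  values: Int8Array; // 1 byte per weight instead of 4
  scale: number;     // single float shared by the whole tensor
}

function quantizeInt8(weights: Float32Array): QuantizedTensor {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  // Avoid division by zero for an all-zero tensor.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.round(weights[i] / scale);
    values[i] = Math.max(-127, Math.min(127, q));
  }
  return { values, scale };
}

function dequantizeInt8(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.values[i] * q.scale;
  }
  return out;
}
```

Storing one byte per weight instead of four cuts the tensor's size to roughly a quarter, at the price of a rounding error of at most half a quantization step (`scale / 2`) per weight.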
## Trade-offs
**Advantages:**
- Privacy by default — no data transmitted
- Works offline
- No API costs
- Low latency for real-time use cases
**Limitations:**
- Model capability bounded by device compute
- Large model download on first use
- Output quality and speed vary across devices and hardware
- Smaller context windows than cloud models
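The trade-offs above suggest a routing decision at request time. The following is a sketch of one possible policy, not a prescribed one: prefer the local model when it is ready and the prompt fits, and refuse to fall back to a server for sensitive input. `RequestContext`, `chooseBackend`, and all field names are hypothetical.

```typescript
// One possible routing policy between on-device and cloud inference.
// All names and thresholds here are illustrative assumptions.
type Backend = "on-device" | "cloud" | "unavailable";

interface RequestContext {
  online: boolean;                // is a network connection available?
  modelReady: boolean;            // local model already downloaded and loadable
  promptTokens: number;           // size of the prompt in tokens
  localContextWindow: number;     // assumed token limit of the local model
  containsSensitiveData: boolean; // input that should not leave the device
}

function chooseBackend(ctx: RequestContext): Backend {
  const fitsLocally =
    ctx.modelReady && ctx.promptTokens <= ctx.localContextWindow;
  if (fitsLocally) return "on-device";
  // Sensitive input never silently falls back to a server.
  if (ctx.containsSensitiveData) return "unavailable";
  return ctx.online ? "cloud" : "unavailable";
}
```

An application could call `chooseBackend` before each request and show a download prompt or an error message when the result is `"unavailable"`.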
## Web Platform Connection
[[WebMachineLearning]] standardizes browser access to on-device ML. [[WebNN API]] provides the low-level hardware interface; [[Prompt API]] and [[Writing Assistance APIs]] expose higher-level LLM capabilities.
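Because these APIs are still evolving and not available in every browser, a page would normally feature-detect before using them. The sketch below assumes WebNN is exposed as `navigator.ml` and the Prompt API as a global `LanguageModel`. These entry points have changed across drafts and browser versions, so treat both names as assumptions to verify against current documentation. `MlSupport` and `detectMlSupport` are hypothetical helpers.

```typescript
// Feature detection sketch. The entry-point names checked here
// (navigator.ml, LanguageModel) are assumptions based on current drafts.
interface MlSupport {
  webnn: boolean;     // low-level hardware interface
  promptApi: boolean; // higher-level language model access
}

function detectMlSupport(scope: unknown = globalThis): MlSupport {
  const g = scope as { navigator?: object | null; LanguageModel?: unknown };
  const nav = g.navigator;
  return {
    webnn: typeof nav === "object" && nav !== null && "ml" in nav,
    promptApi: typeof g.LanguageModel !== "undefined",
  };
}
```

Passing the global scope as a parameter keeps the function testable outside a browser; in a page, calling `detectMlSupport()` with no arguments inspects `globalThis`.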
## References
- https://github.com/webmachinelearning
- https://www.w3.org/groups/wg/webmachinelearning/
## Related
- [[Machine Learning (ML)]]
- [[AI Inference]]
- [[AI Privacy]]
- [[WebMachineLearning]]
- [[Prompt API]]
- [[WebNN API]]
- [[Browser-Provided Language Models]]
- [[Large Language Models (LLMs)]]
- [[Edge AI]]
- [[Edge Computing]]
- [[Neural Processing Unit (NPU)]]
- [[Gemini Nano]]
- [[Transformers.js]]
- [[ONNX Runtime Web]]
- [[Web Assembly (WASM)]]
- [[WebGPU]]