# On-Device Machine Learning

On-device machine learning means running models locally on the user's device rather than sending data to a remote server for inference. It is also called edge ML or client-side ML, and it is the core value proposition of the [[WebMachineLearning]] initiative and the [[Prompt API]].

## Why It Matters

| Dimension | Cloud Inference | On-Device Inference |
|---|---|---|
| Privacy | Data sent to server | Data never leaves device |
| Latency | Network round trip on every request | No network round trip; bounded by local compute |
| Offline | Not available | Works without internet once the model is on the device |
| Cost | Per-query API fees | No per-query fees |
| Throughput | Rate-limited by API | Limited by device hardware |

## Key Enablers

- **Hardware acceleration**: NPUs, GPUs, and specialized ML chips in modern devices
- **Model compression**: quantization, pruning, and distillation make large models fit on-device
- **Browser APIs**: [[WebNN API]] and [[Prompt API]] give web apps access to device hardware and built-in models
- **OS-level models**: browsers can surface OS-provided models and runtimes (e.g., models served through Apple's Core ML, Google's Gemini Nano on Android)

## Trade-offs

**Advantages:**

- Privacy by default: no data transmitted
- Works offline
- No API costs
- Low latency for real-time use cases

**Limitations:**

- Model capability bounded by device compute
- Large model downloads for first run
- Behavior and performance vary across devices and hardware
- Typically smaller context windows than cloud models

## Web Platform Connection

[[WebMachineLearning]] standardizes browser access to on-device ML. [[WebNN API]] provides the low-level hardware interface; [[Prompt API]] and [[Writing Assistance APIs]] expose higher-level LLM capabilities.
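
The layering described above (a high-level LLM API on top, a low-level hardware interface underneath, and the cloud as a last resort) can be sketched as a small feature-detection routine. This is a minimal sketch, not a definitive implementation. `Backend`, `GlobalLike`, and `pickBackend` are hypothetical names introduced for illustration. The Prompt API surface is experimental, so the `LanguageModel.availability()` call and the `"available"` status string are assumptions based on the shape exposed in recent Chrome builds; `navigator.ml` is the WebNN entry point.

```typescript
// Hypothetical names for illustration: Backend, GlobalLike, pickBackend.
type Backend = "prompt-api" | "webnn" | "remote";

// Minimal structural view of the globals being probed, so the function can
// also be exercised with plain objects outside a browser.
interface GlobalLike {
  // ASSUMPTION: shape of the experimental Prompt API entry point.
  LanguageModel?: { availability(): Promise<string> };
  // WebNN exposes its entry point as navigator.ml.
  navigator?: { ml?: unknown };
}

async function pickBackend(env: GlobalLike): Promise<Backend> {
  // Prefer the high-level LLM API when a model is ready on the device.
  if (env.LanguageModel) {
    try {
      const status = await env.LanguageModel.availability();
      // ASSUMPTION: "available" means the model can be used immediately.
      if (status === "available") return "prompt-api";
    } catch {
      // Treat a failing probe as "not usable" and fall through.
    }
  }
  // Otherwise fall back to the low-level hardware interface, if exposed.
  if (env.navigator?.ml !== undefined) return "webnn";
  // No on-device path: the caller must use a server or disable the feature.
  return "remote";
}
```

In a page, this might be called as `pickBackend(globalThis as unknown as GlobalLike)`. Any status other than `"available"` (for example, a model that would first have to be downloaded) falls through to the next option, which keeps the first-run download cost listed under Limitations an explicit decision for the caller rather than a side effect of probing.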

## References

- https://github.com/webmachinelearning
- https://www.w3.org/groups/wg/webmachinelearning/

## Related

- [[Machine Learning (ML)]]
- [[AI Inference]]
- [[AI Privacy]]
- [[WebMachineLearning]]
- [[Prompt API]]
- [[WebNN API]]
- [[Browser-Provided Language Models]]
- [[Large Language Models (LLMs)]]
- [[Edge AI]]
- [[Edge Computing]]
- [[Neural Processing Unit (NPU)]]
- [[Gemini Nano]]
- [[Transformers.js]]
- [[ONNX Runtime Web]]
- [[Web Assembly (WASM)]]
- [[WebGPU]]