Edge AI - DeveloPassion

# Edge AI

The application of [[Machine Learning (ML)]] within an [[Edge Computing]] architecture. Models run on user devices, IoT gateways, or edge points of presence (PoPs) rather than on centralized cloud infrastructure.

[[On-Device Machine Learning]] is a subset of Edge AI that focuses specifically on the user's own device.

## Spectrum of "Edge"

| Tier | Where the Model Runs | Examples |
|---|---|---|
| On-device | Phone, laptop, watch, browser | [[Gemini Nano]] in Chrome, Apple Intelligence |
| On-prem | Factory-floor server, home hub | Industrial defect detection, smart home AI |
| Edge PoP | Regional CDN node | Cloudflare Workers AI, Fastly AI |
| Hybrid | Inference at the edge, training in the cloud | Most production systems |

## Why It's Growing

- **Hardware**: NPUs ([[Neural Processing Unit (NPU)]]) now ship in most recent phones and many new laptops
- **Model efficiency**: [[Knowledge Distillation]] and [[AI Quantization]] shrink capable models to device-friendly sizes
- **Privacy regulation**: GDPR and similar laws favor keeping personal data local
- **Cost**: cloud inference at scale is expensive; edge inference runs on compute that users already own
- **Standards**: [[WebMachineLearning]] makes edge AI accessible to web developers

## Key Enablers

- **Compressed models**: distilled, quantized, and pruned models that fit in 1-8 GB
- **Hardware acceleration**: NPUs deliver order-of-magnitude better performance per watt than CPUs on neural-network workloads
- **Standardized runtimes**: [[ONNX Runtime Web]], [[Transformers.js]], [[WebNN API]]
- **Browser-provided models**: [[Browser-Provided Language Models]] spare developers from shipping and setting up a model themselves

## Common Use Cases

- On-device speech recognition (no audio leaves the device)
- Real-time image processing (filters, AR, accessibility tools)
- Personalized recommendations without sending behavior data to a server
- Offline assistants (commutes, flights, low-connectivity environments)
- Privacy-preserving health and fitness analytics

## Trade-offs vs Cloud AI

**Edge AI strengths:**

- Privacy: in the on-device tier, data never leaves the device
- Latency: no network round trip, so response time depends only on local compute (cloud calls typically take 100-500 ms)
- Cost: no per-query cost for the provider
- Offline: works without an internet connection

**Edge AI limitations:**

- Smaller models, shorter context windows, and less capability
- Hardware variability across devices
- Update lag (model versions are not synchronized across devices)
- Higher first-run cost (the model must be downloaded)

## Hybrid Patterns

Most real systems combine edge and cloud:

- Edge for triage and common cases; cloud for hard ones
- On-device pre-processing; cloud aggregation
- Cloud training on aggregated signals; edge inference on raw data

## References

- https://github.com/webmachinelearning

## Related

- [[Edge Computing]]
- [[On-Device Machine Learning]]
- [[Machine Learning (ML)]]
- [[AI Inference]]
- [[AI Privacy]]
- [[Neural Processing Unit (NPU)]]
- [[WebMachineLearning]]
- [[Browser-Provided Language Models]]
- [[Gemini Nano]]
- [[Knowledge Distillation]]
- [[AI Quantization]]
- [[Small Language Models (SLMs)]]
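## Example: Estimating Model Memory

The 1-8 GB figure for compressed models follows from simple arithmetic: weight memory is roughly the parameter count times the bytes per weight. The minimal sketch below assumes decimal gigabytes (1 GB = 1e9 bytes) and counts weights only. The function name `weight_memory_gb` and the parameter counts are illustrative, and real runtimes need extra memory for activations and caches.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and runtime overhead, so the real
    footprint on a device is larger than this number.
    """
    bytes_per_weight = bits_per_weight / 8
    return num_params * bytes_per_weight / 1e9


if __name__ == "__main__":
    # Illustrative parameter counts, not specific real models.
    for params in (1e9, 3e9, 7e9):
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            print(f"{params / 1e9:.0f}B params @ {bits}-bit: {size:.1f} GB")
```

By this estimate a 7-billion-parameter model needs about 14 GB at 16-bit but about 3.5 GB at 4-bit. That reduction is what moves such a model into the on-device range.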
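## Example: Int8 Quantization

[[AI Quantization]] stores weights at lower precision. The minimal sketch below shows the quantize/dequantize round trip in plain Python, assuming the simple symmetric per-tensor scheme: one scale for the whole tensor, with integer codes in [-127, 127]. Production toolchains typically use finer-grained schemes, such as per-channel or group-wise scales, 4-bit formats, and calibration data. Treat this as an illustration of the idea rather than any particular library's method. The function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximately scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.03, 1.0]
    q, scale = quantize_int8(w)
    restored = dequantize_int8(q, scale)
    print(q, scale)
    # Per-weight rounding error is at most scale / 2.
    print([abs(a - b) for a, b in zip(w, restored)])
```

Each weight shrinks from 4 bytes (float32) to 1 byte. The cost is a rounding error of at most half the scale per weight.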
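## Example: Choosing a Model Variant per Device

One way to cope with hardware variability across devices is to ship several compressed variants and pick the most capable one that fits each device. The minimal sketch below is built on hypothetical assumptions: the catalogue, the variant names, and the 1.2x overhead factor are all illustrative. How the memory budget is measured on a real device is left out.

```python
from typing import Optional

# Hypothetical catalogue of model variants, ordered from most to least capable:
# (name, parameter count, bits per weight). Names and sizes are illustrative.
VARIANTS: list[tuple[str, float, int]] = [
    ("large-q4", 7e9, 4),
    ("medium-q4", 3e9, 4),
    ("small-q8", 1e9, 8),
]

# Assumed multiplier for runtime overhead (activations, caches, buffers) on top of raw weights.
OVERHEAD = 1.2


def pick_variant(
    memory_budget_gb: float,
    variants: list[tuple[str, float, int]] = VARIANTS,
    overhead: float = OVERHEAD,
) -> Optional[str]:
    """Return the most capable variant that fits the memory budget, or None if none fits."""
    for name, params, bits in variants:
        # Weight bytes in decimal GB, scaled by the assumed overhead factor.
        needed_gb = params * bits / 8 / 1e9 * overhead
        if needed_gb <= memory_budget_gb:
            return name
    # Nothing fits: the caller should fall back to cloud inference.
    return None


if __name__ == "__main__":
    for budget in (8.0, 2.0, 1.0):
        print(budget, pick_variant(budget))
```

A `None` result fits naturally into the hybrid patterns: a device too small for any variant can send its requests to the cloud instead.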
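## Example: Edge-First Routing

The "edge for triage and common cases; cloud for hard ones" pattern can be sketched as a confidence threshold. When the local model is confident, or when the device is offline, the on-device answer is used; otherwise the request escalates to the cloud. `Prediction`, `route`, the 0.8 threshold, and the two fake models are hypothetical placeholders, not a specific framework's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    label: str
    confidence: float  # model's score in [0, 1]
    source: str        # "edge" or "cloud"


def route(
    x: str,
    edge_model: Callable[[str], Prediction],
    cloud_model: Callable[[str], Prediction],
    threshold: float = 0.8,
    online: bool = True,
) -> Prediction:
    """Edge-first triage: keep confident local answers, escalate the rest to the cloud."""
    local = edge_model(x)
    if local.confidence >= threshold or not online:
        # Confident enough, or offline: use the on-device answer; x never leaves the device.
        return local
    # Hard case while online: send the input to the larger cloud model.
    return cloud_model(x)


# Hypothetical stand-ins for a small on-device model and a large cloud model.
def fake_edge_model(x: str) -> Prediction:
    return Prediction("positive", 0.95 if "great" in x else 0.55, "edge")


def fake_cloud_model(x: str) -> Prediction:
    return Prediction("negative", 0.90, "cloud")


if __name__ == "__main__":
    print(route("great battery life", fake_edge_model, fake_cloud_model).source)
    print(route("it is fine I guess", fake_edge_model, fake_cloud_model).source)
    print(route("it is fine I guess", fake_edge_model, fake_cloud_model, online=False).source)
```

With these stand-in models, the first call stays on the edge and the second escalates to the cloud. The third also stays on the edge, because the device is offline. Raising `threshold` sends more inputs to the cloud, which gains capability but gives up some of the privacy and cost benefits.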