# Edge AI
The application of [[Machine Learning (ML)]] within an [[Edge Computing]] architecture. Models run on user devices, IoT gateways, or edge PoPs rather than on centralized cloud infrastructure. [[On-Device Machine Learning]] is a subset of Edge AI focused specifically on the user's own device.
## Spectrum of "Edge"
| Tier | Where the Model Runs | Examples |
|---|---|---|
| On-device | Phone, laptop, watch, browser | [[Gemini Nano]] in Chrome, Apple Intelligence |
| On-prem | Factory floor server, home hub | Industrial defect detection, smart home AI |
| Edge PoP | Regional CDN node | Cloudflare Workers AI, Fastly AI |
| Hybrid | Inference at edge, training in cloud | Most production systems |
## Why It's Growing
- **Hardware**: NPUs ([[Neural Processing Unit (NPU)]]) ship in every recent phone and laptop
- **Model efficiency**: [[Knowledge Distillation]] and [[AI Quantization]] make capable models small
- **Privacy regulation**: GDPR and similar laws push toward local processing
- **Cost**: cloud inference at scale is expensive; edge amortizes free capacity
- **Standards**: [[WebMachineLearning]] makes edge AI accessible to web developers
## Key Enablers
- **Compressed models**: distilled, quantized, pruned models that fit in 1-8 GB
- **Hardware acceleration**: NPUs deliver order-of-magnitude better perf/watt than CPUs
- **Standardized runtimes**: [[ONNX Runtime Web]], [[Transformers.js]], [[WebNN API]]
- **Browser-provided models**: [[Browser-Provided Language Models]] eliminate developer setup
## Common Use Cases
- On-device speech recognition (no audio leaves the device)
- Real-time image processing (filters, AR, accessibility tools)
- Personalized recommendations without sending behavior data
- Offline assistants (commute, flights, low-connectivity environments)
- Privacy-preserving health and fitness analytics
## Trade-offs vs Cloud AI
**Edge AI strengths:**
- Privacy: data never leaves the device
- Latency: sub-millisecond instead of 100-500ms
- Cost: zero per-query cost
- Offline: works without internet
**Edge AI limitations:**
- Smaller models, smaller context, less capability
- Hardware variability across devices
- Update lag (model versions not synchronized)
- Higher first-run cost (model download)
## Hybrid Patterns
Most real systems combine edge and cloud:
- Edge for triage and common cases; cloud for hard ones
- On-device pre-processing; cloud aggregation
- Cloud training on aggregated signals; edge inference on raw data
## References
- https://github.com/webmachinelearning
## Related
- [[Edge Computing]]
- [[On-Device Machine Learning]]
- [[Machine Learning (ML)]]
- [[AI Inference]]
- [[AI Privacy]]
- [[Neural Processing Unit (NPU)]]
- [[WebMachineLearning]]
- [[Browser-Provided Language Models]]
- [[Gemini Nano]]
- [[Knowledge Distillation]]
- [[AI Quantization]]
- [[Small Language Models (SLMs)]]