# ONNX Runtime Web
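The web/JavaScript build of Microsoft's ONNX Runtime, a high-performance inference engine for [[ONNX]] models. It is published on npm as `onnxruntime-web` and is aimed mainly at browsers; Node.js apps usually use the native `onnxruntime-node` package, which shares the same JavaScript API. It provides multiple execution backends (WASM, WebGPU, WebNN). The application chooses among them in order of preference, and WASM serves as the fallback.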
GitHub: https://github.com/microsoft/onnxruntime
## Execution Providers
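ONNX Runtime Web abstracts hardware via "execution providers" (EPs). The application lists the EPs it wants, in order of preference, in the `executionProviders` session option. The runtime uses the first one that is available and falls back to the next. WASM is the default when none is given: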
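| EP | When to Use | Performance |
|---|---|---|
| [[Web Assembly (WASM)]] (`wasm`) | Always available; CPU fallback | Good (CPU) |
| WASM + SIMD | Not a separate EP: used by `wasm` when the browser supports WebAssembly SIMD (all current major browsers) | Better |
| WASM + threads | Not a separate EP: `wasm` running multiple threads; needs a cross-origin isolated page so `SharedArrayBuffer` is available | Even better (multi-core CPU) |
| [[WebGPU]] (`webgpu`) | Browsers with WebGPU enabled (e.g. Chrome, Edge) | High (GPU) |
| [[WebNN API]] (`webnn`) | Browsers with WebNN enabled (still experimental) | Potentially highest (delegates to NPU/GPU/CPU via platform ML APIs) |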
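The ordered EP list and the WASM thread count can be derived from feature detection. The following minimal sketch uses helper names (`detectCapabilities`, `chooseExecutionProviders`, `chooseWasmThreads`) and a `WebCapabilities` type that are hypothetical, not part of onnxruntime-web. The checks rely only on standard browser globals: `navigator.ml`, `navigator.gpu`, `crossOriginIsolated`, and `navigator.hardwareConcurrency`. Note that the presence of `navigator.gpu` does not guarantee a usable GPU adapter. For that reason, `wasm` always stays last as the fallback.

```typescript
// Hypothetical capability snapshot; not part of the onnxruntime-web API.
interface WebCapabilities {
  webnn: boolean; // navigator.ml exists (WebNN exposed)
  webgpu: boolean; // navigator.gpu exists (an adapter may still be unavailable)
  crossOriginIsolated: boolean; // SharedArrayBuffer usable, so WASM threads can run
  hardwareConcurrency: number; // logical CPU cores reported by the environment
}

// Read feature flags from the global scope without requiring DOM typings.
function detectCapabilities(): WebCapabilities {
  const g = globalThis as any;
  const nav = g.navigator ?? {};
  return {
    webnn: "ml" in nav,
    webgpu: "gpu" in nav,
    crossOriginIsolated: g.crossOriginIsolated === true,
    hardwareConcurrency:
      typeof nav.hardwareConcurrency === "number" ? nav.hardwareConcurrency : 1,
  };
}

// Ordered preference list following the table's ranking:
// WebNN, then WebGPU, then the WASM CPU fallback, which is always included.
function chooseExecutionProviders(caps: WebCapabilities): string[] {
  const eps: string[] = [];
  if (caps.webnn) eps.push("webnn");
  if (caps.webgpu) eps.push("webgpu");
  eps.push("wasm");
  return eps;
}

// WASM threads need cross-origin isolation; otherwise stay single-threaded.
// The cap of 4 threads is an arbitrary illustrative choice.
function chooseWasmThreads(caps: WebCapabilities): number {
  if (!caps.crossOriginIsolated) return 1;
  return Math.max(1, Math.min(4, Math.floor(caps.hardwareConcurrency)));
}
```

Both values then go to the real onnxruntime-web API. First, set `ort.env.wasm.numThreads = chooseWasmThreads(caps)` before the first session is created. Then call `await ort.InferenceSession.create(modelUrl, { executionProviders: chooseExecutionProviders(caps) })`, where `modelUrl` is a placeholder. WebNN is still experimental, so an application might reasonably put `webgpu` ahead of `webnn`.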
## Why It Matters
Underpins many browser-based ML libraries:
- [[Transformers.js]] is built on top of ONNX Runtime Web
- Used for image-processing, speech-recognition, and text-embedding workloads in the browser
- The usual choice when you need to run a PyTorch- or TensorFlow-trained model in a browser, after exporting it to ONNX
## Workflow
```
PyTorch / TF / scikit-learn model
↓ export (torch.onnx.export, tf2onnx, skl2onnx)
ONNX file (.onnx)
↓ load in browser
ONNX Runtime Web (WASM/WebGPU/WebNN)
↓ inference
Predictions
```
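Between loading the model and running inference, input data usually has to be converted into the tensor layout the model was exported with. This minimal sketch assumes a vision model whose input is float32 in NCHW order `[1, 3, H, W]`, scaled to [0, 1] with no mean/std normalization. That layout is an assumption, so check what your model expects. The function name `rgbaToNchwFloat32` is hypothetical. Its RGBA input matches the interleaved layout of a canvas `ImageData.data` buffer.

```typescript
// Convert interleaved RGBA bytes (R,G,B,A per pixel) into planar float32
// data in NCHW order (all R values, then all G, then all B), scaled to [0, 1].
function rgbaToNchwFloat32(
  rgba: Uint8ClampedArray | Uint8Array,
  width: number,
  height: number,
): Float32Array {
  const pixels = width * height;
  if (rgba.length !== pixels * 4) {
    throw new RangeError(`expected ${pixels * 4} RGBA bytes, got ${rgba.length}`);
  }
  const out = new Float32Array(3 * pixels);
  for (let i = 0; i < pixels; i++) {
    out[i] = rgba[i * 4] / 255; // R plane
    out[pixels + i] = rgba[i * 4 + 1] / 255; // G plane
    out[2 * pixels + i] = rgba[i * 4 + 2] / 255; // B plane
    // Alpha (rgba[i * 4 + 3]) is dropped.
  }
  return out;
}
```

The result can then be wrapped and run with the onnxruntime-web API: `new ort.Tensor("float32", data, [1, 3, height, width])`, then `await session.run({ [session.inputNames[0]]: tensor })`. The call resolves to a map from output names to tensors.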
## Quantization Support
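Can load quantized ONNX models (e.g. INT8 or INT4 weights). Storing weights in 8 bits instead of 32-bit floats cuts their size roughly 4x (8x for 4-bit), and can also speed up inference depending on the EP and the ops involved. This is critical for fitting LLMs and vision models into browser memory budgets. See [[AI Quantization]].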
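The size reduction is simple arithmetic: weight storage equals the parameter count times bits per weight, divided by 8. This minimal sketch ignores the per-block scale and zero-point data that quantized formats also store, along with activations and runtime buffers. Real files and memory use therefore come out somewhat larger. The `weightBytes` helper and the format table are illustrative, not part of any library.

```typescript
// Bits used to store each weight in a few formats.
const BITS_PER_WEIGHT = { fp32: 32, int8: 8, int4: 4 } as const;
type WeightFormat = keyof typeof BITS_PER_WEIGHT;

// Approximate bytes for the weights alone (no quantization metadata,
// activations, or runtime buffers).
function weightBytes(paramCount: number, format: WeightFormat): number {
  return (paramCount * BITS_PER_WEIGHT[format]) / 8;
}

// Print a model's weight footprint in each format, in decimal gigabytes.
function printFootprint(paramCount: number): void {
  for (const format of Object.keys(BITS_PER_WEIGHT) as WeightFormat[]) {
    const gb = weightBytes(paramCount, format) / 1e9;
    console.log(`${format}: ${gb} GB`);
  }
}

printFootprint(1e9); // prints "fp32: 4 GB", "int8: 1 GB", "int4: 0.5 GB"
```

For a 1-billion-parameter model, this gives 4 GB of FP32 weights, 1 GB at INT8, and 0.5 GB at INT4. These are the 4x and 8x reductions mentioned above.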
## References
- https://github.com/microsoft/onnxruntime
- https://onnxruntime.ai/docs/tutorials/web/
## Related
- [[ONNX]]
- [[Transformers.js]]
- [[Web Assembly (WASM)]]
- [[WebGPU]]
- [[WebNN API]]
- [[AI Inference]]
- [[AI Quantization]]
- [[On-Device Machine Learning]]
- [[WebMachineLearning]]