ONNX Runtime Web - DeveloPassion

# ONNX Runtime Web

The web/JavaScript port of Microsoft's ONNX Runtime: a high-performance inference engine for [[ONNX]] models that runs in browsers and Node.js. It provides multiple execution backends (WASM, WebGPU, WebNN). The application lists the backends it prefers, and the runtime uses the first one that the current browser and hardware support.

GitHub: https://github.com/microsoft/onnxruntime

## Execution Providers

ONNX Runtime Web abstracts hardware behind "execution providers" (EPs). You pass an ordered list of EPs when you create an inference session, and the runtime uses the first one it can initialize. If you pass none, it defaults to the WASM (CPU) provider. The typical options, from slowest to fastest:

| EP | When to Use | Performance |
|---|---|---|
| [[Web Assembly (WASM)]] | Always available; the fallback | Good (CPU) |
| WASM + SIMD | Modern browsers | Better (vectorized CPU) |
| WASM + threads | Cross-origin isolated contexts | Even better (multicore CPU) |
| [[WebGPU]] | Chrome, Edge | High (GPU) |
| [[WebNN API]] | Browsers with WebNN | Potentially highest (can target NPU, GPU, or CPU) |

WASM + SIMD and WASM + threads are configurations of the WASM provider, not separate EPs. Multithreading depends on `SharedArrayBuffer`, which browsers expose only to cross-origin isolated pages (pages served with COOP/COEP headers).

## Why It Matters

Underpins many browser-based ML libraries:

- [[Transformers.js]] is built on top of ONNX Runtime Web
- Used by image-processing libraries, speech models, and embedding generators
- The de facto runtime when you need to run a PyTorch/TF-trained model in a browser

## Workflow

```
PyTorch / TF / scikit-learn model
        ↓ export
ONNX file (.onnx)
        ↓ load in browser
ONNX Runtime Web (WASM/WebGPU/WebNN)
        ↓ inference
Predictions
```

## Quantization Support

Loads quantized ONNX models (INT8, INT4). These are substantially smaller than FP32 models and often faster: INT8 weights take a quarter of the space of FP32 weights, and INT4 weights take an eighth. This is critical for fitting LLMs and vision models into browser memory budgets. See [[AI Quantization]].

## References

- https://github.com/microsoft/onnxruntime
- https://onnxruntime.ai/docs/tutorials/web/

## Related

- [[ONNX]]
- [[Transformers.js]]
- [[Web Assembly (WASM)]]
- [[WebGPU]]
- [[WebNN API]]
- [[AI Inference]]
- [[AI Quantization]]
- [[On-Device Machine Learning]]
- [[WebMachineLearning]]
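## Example: Choosing Execution Providers

A minimal TypeScript sketch of the execution-provider selection described in this note. It uses feature detection to build the ordered `executionProviders` list that ONNX Runtime Web accepts, with WASM always last as the fallback. The helper names `detectCapabilities` and `chooseExecutionProviders` and the `Capabilities`/`EpPlan` types are hypothetical, not part of the onnxruntime-web API. The ordering (WebNN, then WebGPU, then WASM) is an assumption that mirrors the Performance column of the table above. The call to `ort.InferenceSession.create` is left in a comment so the sketch runs without the package installed.

```typescript
// Hypothetical helpers: names and types are illustrative assumptions,
// not part of the onnxruntime-web API.
type EpName = "webnn" | "webgpu" | "wasm";

interface Capabilities {
  webnn: boolean;               // navigator.ml is exposed (WebNN)
  webgpu: boolean;              // navigator.gpu is exposed (WebGPU)
  crossOriginIsolated: boolean; // needed for SharedArrayBuffer / WASM threads
  hardwareConcurrency: number;  // logical CPU cores reported by the runtime
}

interface EpPlan {
  executionProviders: EpName[];
  wasmThreads: number; // suggested value for ort.env.wasm.numThreads
}

// Reads globals defensively so this also runs in Node.js, where
// navigator.gpu / navigator.ml are normally absent. Note that navigator.gpu
// existing does not guarantee a usable GPU adapter.
function detectCapabilities(): Capabilities {
  const nav: any = (globalThis as any).navigator;
  return {
    webnn: nav !== undefined && nav !== null && "ml" in nav,
    webgpu: nav !== undefined && nav !== null && "gpu" in nav,
    crossOriginIsolated: (globalThis as any).crossOriginIsolated === true,
    hardwareConcurrency: nav?.hardwareConcurrency ?? 1,
  };
}

// Orders EPs by assumed speed; real-world ranking depends on device and model.
// WASM is always appended last as the universal fallback.
function chooseExecutionProviders(caps: Capabilities): EpPlan {
  const eps: EpName[] = [];
  if (caps.webnn) eps.push("webnn");
  if (caps.webgpu) eps.push("webgpu");
  eps.push("wasm");
  // Multithreaded WASM needs SharedArrayBuffer, which browsers expose only
  // to cross-origin isolated pages; otherwise stay single-threaded.
  const wasmThreads = caps.crossOriginIsolated
    ? Math.max(1, Math.floor(caps.hardwareConcurrency))
    : 1;
  return { executionProviders: eps, wasmThreads };
}

// Usage with onnxruntime-web (commented out; "model.onnx" is a placeholder):
//   import * as ort from "onnxruntime-web";
//   const plan = chooseExecutionProviders(detectCapabilities());
//   ort.env.wasm.numThreads = plan.wasmThreads;
//   const session = await ort.InferenceSession.create("model.onnx", {
//     executionProviders: plan.executionProviders,
//   });

const plan = chooseExecutionProviders(detectCapabilities());
console.log(plan.executionProviders, plan.wasmThreads);
```

Because the runtime already tries the listed EPs in order, the detection step mainly keeps the list explicit and avoids requesting backends the page clearly lacks. A stale entry should simply be skipped at session creation.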