# ML Deployment Patterns
ML deployment patterns are the canonical ways to put a trained model in front of consumers:
- **Online (REST/gRPC)** — synchronous, low-latency, one request per prediction.
- **Batch** — score a large dataset on a schedule; output to a table or file.
- **Streaming** — score events as they arrive on a queue/stream.
- **Edge / embedded** — model runs in the consumer process (browser, mobile, IoT).
- **Shadow / canary / A-B** — deployment strategies layered on top of the above to validate a new model against production traffic safely.
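The online, batch, and streaming patterns above differ mainly in how the same model is invoked. A minimal sketch, assuming a model is any callable from a feature list to a score; `toy_model` and the three `predict_*` helpers are hypothetical names for illustration, not part of any library:

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for a trained model: any callable from features to a score.
Model = Callable[[List[float]], float]


def toy_model(features: List[float]) -> float:
    """Illustrative model only: returns the mean of the features."""
    return sum(features) / len(features)


def predict_online(model: Model, features: List[float]) -> float:
    """Online: one request in, one prediction out, synchronously."""
    return model(features)


def predict_batch(model: Model, rows: List[List[float]]) -> List[float]:
    """Batch: score an entire dataset in one job and return all results."""
    return [model(row) for row in rows]


def predict_stream(model: Model, events: Iterable[List[float]]) -> Iterator[float]:
    """Streaming: score each event lazily as it is pulled from the source."""
    for event in events:
        yield model(event)
```

In a real system the online function would likely sit behind a REST or gRPC handler, the batch function inside a scheduled job, and the streaming function inside a queue consumer; the model call itself stays the same.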
The choice of pattern drives the infrastructure: for example, GPU-backed servers for online LLM serving, Spark for batch scoring, and ONNX Runtime for edge inference.
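The shadow and canary strategies from the list above can be sketched as thin routing wrappers around two models. This is an illustrative sketch under stated assumptions, not a production router: `canary_route`, `shadow_route`, the `canary_fraction` parameter, and the in-memory `log` list are all hypothetical names chosen for the example.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical model type: any callable from features to a score.
Model = Callable[[List[float]], float]


def canary_route(
    stable: Model,
    candidate: Model,
    features: List[float],
    canary_fraction: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float]:
    """Canary: serve a small random fraction of requests from the candidate."""
    rng = rng or random.Random()
    if rng.random() < canary_fraction:
        return "candidate", candidate(features)
    return "stable", stable(features)


def shadow_route(
    stable: Model,
    candidate: Model,
    features: List[float],
    log: List[Tuple[float, float]],
) -> float:
    """Shadow: always serve the stable model; record the candidate's output."""
    served = stable(features)
    try:
        log.append((served, candidate(features)))
    except Exception:
        # A failing shadow model must never affect the served response.
        pass
    return served
```

The difference in risk is visible in the return values: a canary exposes some consumers to the candidate's predictions, while a shadow only records them for offline comparison. A real shadow setup would usually call the candidate asynchronously so that it adds no latency to the served request.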
## Related
- [[MLflow]]
- [[Model Registry]]
- [[MLOps]]