# ML Deployment Patterns

ML deployment patterns are the canonical ways to put a trained model in front of consumers:

- **Online (REST/gRPC)**: synchronous, low-latency, one request per prediction.
- **Batch**: score a large dataset on a schedule; output to a table or file.
- **Streaming**: score events as they arrive on a queue/stream.
- **Edge / embedded**: model runs in the consumer process (browser, mobile, IoT).
- **Shadow / canary / A/B**: deployment strategies layered on top of the above to validate a new model against production traffic safely.

Pattern choice drives infrastructure: GPUs for online LLM serving vs. Spark for batch vs. ONNX Runtime for edge.

## References

## Related

- [[MLflow]]
- [[Model Registry]]
- [[MLOps]]