# Fireworks AI

**Fireworks AI** is a generative AI inference platform built by ex-PyTorch engineers. It hosts open-source LLMs, image models, and audio models behind OpenAI-compatible APIs, with a custom inference engine optimized for low latency and high throughput.

## Why it matters

Three reasons Fireworks shows up in serious AI stacks:

1. **Speed**: a custom inference engine (built on patterns carried over from the original PyTorch team) consistently beats naive serving on throughput and tail latency.
2. **Open model coverage**: DeepSeek V3 and R1, Kimi K2, MiniMax M2, GLM, [[Qwen]], Gemma, FLUX.1 (image), [[Whisper]] (audio), plus more.
3. **Enterprise-grade privacy**: SOC 2, HIPAA, and GDPR options, plus a **zero-data-retention** mode, meaning prompts and completions are not logged or used for training.

## Core capabilities

- **Build**: serverless deployment of open-source models, no GPU setup, no cold starts
- **Tune**: fine-tuning with reinforcement learning and quantization-aware tuning
- **Scale**: automatic infrastructure provisioning across deployment types

## Privacy and compliance

- SOC 2 certified
- HIPAA available
- GDPR options
- Zero-data-retention available (key differentiator vs default OpenAI/Anthropic free tiers)
- No training on customer data

## Pricing model

Pricing varies widely by model, from ~$0.30 per million input tokens on smaller language models to per-image pricing on FLUX.1 (~$0.04/image). It is generally cheaper than direct OpenAI/Anthropic API access for open models of comparable capability.

## Use cases

Code assistance, conversational AI, agentic systems, semantic search, multimodal workflows, enterprise RAG.

## My take

For workloads that can use open-source models, Fireworks is one of the cleanest privacy-respecting hosted-inference options. The zero-data-retention mode makes it usable for sensitive prompts where sending the same prompt to [[OpenAI]] or [[Anthropic]] free tiers would be risky.
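
As a rough way to size a workload against the approximate prices quoted under Pricing model, here is a minimal sketch in Python. The function names and the example workload are hypothetical, the prices are this note's approximate figures rather than a current price list, and output-token costs are left out because the note does not quote them.

```python
# Approximate prices quoted in this note (assumptions, not a current price list).
INPUT_PRICE_PER_MILLION_TOKENS_USD = 0.30  # "~$0.30 per million input tokens", smaller models
FLUX_PRICE_PER_IMAGE_USD = 0.04            # "~$0.04/image" on FLUX.1


def estimate_input_token_cost(
    input_tokens: int,
    price_per_million: float = INPUT_PRICE_PER_MILLION_TOKENS_USD,
) -> float:
    """Approximate USD cost of `input_tokens` input tokens (output tokens ignored)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


def estimate_image_cost(
    images: int,
    price_per_image: float = FLUX_PRICE_PER_IMAGE_USD,
) -> float:
    """Approximate USD cost of generating `images` images at a flat per-image price."""
    if images < 0:
        raise ValueError("images must be non-negative")
    return images * price_per_image


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens and 1,000 generated images.
    token_cost = estimate_input_token_cost(50_000_000)
    image_cost = estimate_image_cost(1_000)
    print(f"Input tokens: ${token_cost:.2f}")   # Input tokens: $15.00
    print(f"Images:       ${image_cost:.2f}")   # Images:       $40.00
    print(f"Total:        ${token_cost + image_cost:.2f}")  # Total:        $55.00
```

Replace the two constants with the current per-model prices before using figures like these for budgeting.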

Fireworks sits in the middle of the inference-provider landscape: faster than self-hosting, more private than the big frontier labs, and broader in model selection than most providers. The tradeoff is that you're betting on open-source models being good enough, which has become increasingly true throughout 2025-2026.

## References

- https://fireworks.ai

## Related

- [[AI Privacy]]
- [[OpenAI]]
- [[Anthropic]]
- [[Qwen]]