# Fireworks AI
**Fireworks AI** is a generative AI inference platform built by ex-PyTorch engineers. It hosts open-source LLMs, image, and audio models behind OpenAI-compatible APIs, with a custom inference engine optimized for low latency and high throughput.
## Why it matters
Three reasons Fireworks shows up in serious AI stacks:
1. **Speed**; a custom inference engine (forked patterns from the original PyTorch team) consistently beats naive serving on throughput and tail latency.
2. **Open model coverage**; DeepSeek V3 and R1, Kimi K2, MiniMax M2, GLM, [[Qwen]], Gemma, FLUX.1 (image), [[Whisper]] (audio), plus more.
3. **Enterprise-grade privacy**; SOC 2, HIPAA, GDPR options, and **zero-data-retention** mode — meaning prompts and completions are not logged or used for training.
## Core capabilities
- **Build**: serverless deployment of open-source models, no GPU setup, no cold starts
- **Tune**: fine-tuning with reinforcement learning and quantization-aware tuning
- **Scale**: automatic infrastructure provisioning across deployment types
## Privacy and compliance
- SOC 2 certified
- HIPAA available
- GDPR options
- Zero-data-retention available (key differentiator vs default OpenAI/Anthropic free tiers)
- No training on customer data
## Pricing model
Per-token pricing varies widely by model — from ~$0.30 per million input tokens on smaller models to per-image pricing on FLUX.1 (~$0.04/image). Generally cheaper than direct OpenAI/Anthropic API access for comparable-capability open models.
## Use cases
Code assistance, conversational AI, agentic systems, semantic search, multimodal workflows, enterprise RAG.
## My take
For workloads that can use open-source models, Fireworks is one of the cleanest privacy-respecting hosted-inference options. The zero-data-retention default makes it usable for sensitive prompts where sending the same prompt to [[OpenAI]] or [[Anthropic]] free tiers would be risky.
It sits in the middle of the inference-provider landscape — faster than self-hosting, more private than the big frontier labs, broader model selection than most. The tradeoff is you're betting on open-source models being good enough, which has gotten increasingly true throughout 2025-2026.
## References
- https://fireworks.ai
## Related
- [[AI Privacy]]
- [[OpenAI]]
- [[Anthropic]]
- [[Qwen]]