# TurboQuant
**TurboQuant is a vector quantization algorithm from Google Research that compresses embeddings to 2–4 bits per coordinate while keeping recall close to uncompressed search.** It's the engine behind [[Turbovec]].
The idea: most vector indexes waste memory storing 32-bit floats when far fewer bits carry the signal that matters for nearest-neighbour ranking. TurboQuant gets the bit count down without the usual recall collapse by making the data well-behaved before it quantizes.
## The recipe
1. **Normalize** each vector to a unit direction.
2. **Random orthogonal rotation** so that each coordinate follows a predictable Beta distribution (close to Gaussian with variance 1/d in high dimensions) and no single coordinate hoards the variance.
3. **Per-coordinate calibration**, mapping empirical quantiles onto canonical distributions.
4. **Lloyd-Max scalar quantization** with precomputed optimal buckets (4 buckets for 2-bit, 16 for 4-bit).
5. **Length-renormalized scoring** at query time to correct the bias quantization introduces.
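
The steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the Google Research or Turbovec implementation. All function names are hypothetical. Step 3 (per-coordinate calibration) is skipped. The codebook is fitted with Lloyd's algorithm on pooled sample coordinates instead of being precomputed from the analytic distribution. "Length-renormalized scoring" is interpreted here as dividing the dot product by the norm of the reconstructed vector.

```python
import math
import random


def normalize(v):
    """Step 1: scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def random_rotation(dim, rng):
    """Step 2: random orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < dim:
        r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in rows:  # remove the components along rows already chosen
            dot = sum(a * b for a, b in zip(r, q))
            r = [a - dot * b for a, b in zip(r, q)]
        rows.append(normalize(r))
    return rows


def rotate(rotation, v):
    """Apply the rotation matrix (list of rows) to a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in rotation]


def nearest(centroids, x):
    """Index of the centroid closest to scalar x."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - x))


def lloyd_max(samples, levels, iters=20):
    """Step 4: 1-D Lloyd-Max, fitted empirically (an assumption of this sketch)."""
    s = sorted(samples)
    # Start from evenly spaced empirical quantiles.
    centroids = [s[(2 * i + 1) * len(s) // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in range(levels)]
        for x in s:
            buckets[nearest(centroids, x)].append(x)
        # Move each centroid to the mean of its bucket; keep it if the bucket is empty.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return centroids


def encode(rotation, centroids, v):
    """Steps 1, 2 and 4: normalize, rotate, then store one code per coordinate."""
    return [nearest(centroids, x) for x in rotate(rotation, normalize(v))]


def score(rotation, centroids, codes, query):
    """Step 5 (assumed form): dot product divided by the reconstruction's norm."""
    q = rotate(rotation, normalize(query))
    recon = [centroids[c] for c in codes]
    recon_norm = math.sqrt(sum(x * x for x in recon))
    return sum(a * b for a, b in zip(q, recon)) / recon_norm


# Toy usage: 100 random 32-dimensional vectors at 2 bits per coordinate.
rng = random.Random(0)
dim, bits = 32, 2
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
rotation = random_rotation(dim, rng)
pooled = [x for v in data for x in rotate(rotation, normalize(v))]
centroids = lloyd_max(pooled, levels=2 ** bits)
codes = [encode(rotation, centroids, v) for v in data]

# Query with the first vector and rank every stored code against it.
best = max(range(len(data)),
           key=lambda i: score(rotation, centroids, codes[i], data[0]))
print("best match index:", best)
```

Because the rotation spreads variance evenly, a single shared codebook serves every coordinate in this sketch. A production system would pack the codes into bytes and score them with SIMD lookups instead of Python lists.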
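The payoff: roughly **16x compression** over 32-bit floats at 2 bits per coordinate (about 8x at 4 bits), so a 1536-dimensional vector shrinks from 6,144 bytes to roughly 384; recall within ~2 points of [[FAISS]] IndexPQ; and scoring that stays fast because quantized codes are SIMD-friendly.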
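
The compression figure comes from simple bit counting, which a few lines of Python make concrete. The helper name `bytes_per_vector` is hypothetical, and the sketch assumes a 32-bit float baseline and ignores any per-vector side information (such as a stored norm) that a real index might keep.

```python
def bytes_per_vector(dims: int, bits_per_coord: int) -> int:
    """Storage for one vector, rounded up to whole bytes."""
    return (dims * bits_per_coord + 7) // 8


dims = 1536
raw = bytes_per_vector(dims, 32)  # float32 baseline: 6144 bytes
for bits in (2, 4):
    packed = bytes_per_vector(dims, bits)
    print(f"{bits}-bit: {packed} bytes, {raw / packed:.0f}x smaller")
# prints:
# 2-bit: 384 bytes, 16x smaller
# 4-bit: 768 bytes, 8x smaller
```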
## Related
- [[Turbovec]]
- [[Embeddings]]
- [[FAISS]]
- [[Semantic Search]]
- [[Vector Store]]