TurboQuant - DeveloPassion

# TurboQuant

**TurboQuant is a vector quantization algorithm from Google Research that compresses embeddings to 2–4 bits per coordinate while keeping recall close to uncompressed search.** It's the engine behind [[Turbovec]].

The idea: most vector indexes waste memory storing 32-bit floats when far fewer bits carry the signal that matters for nearest-neighbour ranking. TurboQuant gets the bit count down without the usual recall collapse by making the data well-behaved before it quantizes.

## The recipe

1. **Normalize** each vector to a unit direction.
2. **Random orthogonal rotation** so the coordinates follow predictable Beta distributions; no single coordinate hoards the variance.
3. **Per-coordinate calibration**, mapping empirical quantiles onto canonical distributions.
4. **Lloyd-Max scalar quantization** with precomputed optimal buckets (4 buckets for 2-bit, 16 for 4-bit).
5. **Length-renormalized scoring** at query time to correct the bias quantization introduces.

The payoff: roughly **16x compression** for 1536-dimensional vectors (32-bit floats down to 2-bit codes; 4-bit codes give about 8x), recall within ~2 points of [[FAISS]] IndexPQ, and scoring that stays fast because quantized codes are SIMD-friendly.

## Related

- [[Turbovec]]
- [[Embeddings]]
- [[FAISS]]
- [[Semantic Search]]
- [[Vector Store]]
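
## A toy sketch of the recipe

To make the steps in the recipe concrete, here is a minimal, unoptimized sketch in plain Python. It is not the TurboQuant or Turbovec implementation, and every function name in it is hypothetical. It rests on several simplifying assumptions: the rotation is built with Gram-Schmidt on Gaussian rows, step 3 (per-coordinate calibration) is left out, the codebook is fitted with Lloyd's algorithm on the pooled rotated coordinates of the data instead of being precomputed, and "length-renormalized scoring" is read as rescaling the decoded vector back to unit length before taking the dot product.

```python
import math
import random


def unit(v):
    """Step 1: scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_rotation(dim, rng):
    """Step 2: random orthogonal matrix (Gram-Schmidt on Gaussian rows)."""
    rows = []
    for _ in range(dim):
        r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in rows:  # remove the components along earlier rows
            proj = dot(r, q)
            r = [a - proj * b for a, b in zip(r, q)]
        rows.append(unit(r))
    return rows


def rotate(rotation, v):
    return [dot(row, v) for row in rotation]


def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda j: abs(centroids[j] - x))


def lloyd_max(samples, bits, iters=20):
    """Step 4: fit 2**bits scalar centroids with Lloyd's algorithm."""
    ordered = sorted(samples)
    k = 2 ** bits
    # Initialise at evenly spaced quantiles of the samples.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for x in ordered:
            j = nearest(centroids, x)
            sums[j] += x
            counts[j] += 1
        # Move each centroid to the mean of its bucket (empty buckets stay put).
        centroids = [s / c if c else old for s, c, old in zip(sums, counts, centroids)]
    return centroids


def encode(rotation, centroids, v):
    """Steps 1, 2 and 4: normalize, rotate, keep one small code per coordinate."""
    return [nearest(centroids, x) for x in rotate(rotation, unit(v))]


def score(rotation, centroids, query, codes):
    """Step 5 (assumed reading): rescale the decoded vector to unit length."""
    decoded = [centroids[c] for c in codes]
    return dot(rotate(rotation, query), unit(decoded))


rng = random.Random(0)
dim, bits = 32, 4
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(40)]
rotation = random_rotation(dim, rng)
# Fit the codebook on pooled rotated coordinates (stand-in for precomputed buckets).
pooled = [x for v in vectors for x in rotate(rotation, unit(v))]
centroids = lloyd_max(pooled, bits)
index = [encode(rotation, centroids, v) for v in vectors]

query = unit(vectors[7])
approx = [score(rotation, centroids, query, codes) for codes in index]
exact = [dot(query, unit(v)) for v in vectors]
best = max(range(len(approx)), key=approx.__getitem__)
print("best match:", best)
print("largest score error:", max(abs(a - e) for a, e in zip(approx, exact)))
```

Because every coordinate shares one small codebook after the rotation, each stored vector is just a list of 4-bit codes here. A real implementation would pack those codes into bytes and score them with SIMD lookups, which this sketch does not attempt.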