Benchmark Compare

Side-by-side comparison of KV cache quantization methods

AlgorithmBitsPPL (↓)MemorySpeedOnlineNo Training
TurboQuant3-bit~baseline~4x~2x
TurboQuant2.5-bit+0.1~6x~3x
GPTQ4-bit+0.3~4x~1.5x
AWQ4-bit+0.2~4x~1.5x
KVQuant4-bit+0.15~4x~1x
FP88-bit+0.05~2x~1.2x

Sources: arXiv:2504.19874 (Google Research) • 0xSero/turboquant implementation • Public benchmarks

Algorithm Comparison Radar

TurboQuant consistently outperforms GPTQ and KIVI across all evaluated configurations. At 3.5-bit, TurboQuant achieves near-lossless quality (PPL within 1% of FP16) while reducing KV cache memory by over 80%. At 3-bit, the 8x attention speedup on H100 GPUs comes from reduced HBM bandwidth pressure — the primary bottleneck in long-context inference. Unlike GPTQ which requires offline calibration data, TurboQuant operates online with zero training overhead.

© 2026 TurboQuant Guide — Community resource. Not affiliated with Google LLC.