Benchmark Comparison

Performance comparison across quantization methods

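Model       | Context | Method             | Bits | VRAM (vs. FP16) | PPL  | Speed (vs. FP16)
------------|---------|--------------------|------|-----------------|------|-----------------
Llama-3-70B | 8K      | FP16               | 16   | 100%            | 3.12 | 1.0x
Llama-3-70B | 8K      | GPTQ-4bit          | 4    | 25%             | 3.89 | 2.1x
Llama-3-70B | 8K      | KIVI               | 3    | 18.75%          | 3.45 | 3.2x
Llama-3-70B | 8K      | TurboQuant 3.5-bit | 3.5  | 18.75%          | 3.15 | 7.8x
Llama-3-70B | 8K      | TurboQuant 3-bit   | 3    | 18.75%          | 3.21 | 8.0x
Llama-3-8B  | 32K     | FP16               | 16   | 100%            | 5.21 | 1.0x
Llama-3-8B  | 32K     | INT8               | 8    | 50%             | 5.34 | 2.8x
Llama-3-8B  | 32K     | TurboQuant 3-bit   | 3    | 18.75%          | 5.29 | 7.5x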

Sources: arXiv:2504.19874 (Google Research) • 0xSero/turboquant implementation • Public benchmarks

6x VRAM reduction at 3-bit

8x attention speedup on H100

<1% PPL degradation at 3.5-bit
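
The relative figures in this section can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: it assumes storage scales linearly with bit width against a 16-bit baseline, and the helper names (`relative_vram`, `reduction_factor`, `ppl_degradation`) are hypothetical, not part of any TurboQuant implementation. The input values are copied from the Llama-3-70B, 8K-context rows.

```python
# Hypothetical helpers for sanity-checking the benchmark figures.
# Assumption: memory use scales linearly with bits per value,
# measured against a 16-bit (FP16) baseline.

FP16_BITS = 16.0
FP16_PPL_70B = 3.12  # FP16 perplexity, Llama-3-70B at 8K context


def relative_vram(bits: float, baseline_bits: float = FP16_BITS) -> float:
    """Memory as a percentage of the baseline format."""
    return 100.0 * bits / baseline_bits


def reduction_factor(bits: float, baseline_bits: float = FP16_BITS) -> float:
    """How many times smaller the quantized format is than the baseline."""
    return baseline_bits / bits


def ppl_degradation(ppl: float, baseline_ppl: float = FP16_PPL_70B) -> float:
    """Perplexity increase as a percentage of the baseline perplexity."""
    return 100.0 * (ppl - baseline_ppl) / baseline_ppl


if __name__ == "__main__":
    print(f"3-bit VRAM vs. FP16:      {relative_vram(3):.2f}%")
    print(f"3-bit reduction factor:   {reduction_factor(3):.2f}x")
    print(f"3.5-bit PPL degradation:  {ppl_degradation(3.15):.2f}%")
    print(f"3-bit PPL degradation:    {ppl_degradation(3.21):.2f}%")
```

Under the linear assumption, 3 bits works out to 18.75% of FP16 memory, or a reduction of roughly 5.3x, and the 3.5-bit perplexity of 3.15 sits a little under 1% above the FP16 baseline of 3.12.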