Benchmark Comparison

Performance comparison across quantization methods

Methods compared: FP16, INT8, GPTQ, KIVI, TurboQuant

Model	Context	Method	Bits	VRAM (% of FP16)	PPL (lower is better)	Speed (vs. FP16)
Llama-3-70B	8K	FP16	16	100%	3.12	1.0x
Llama-3-70B	8K	GPTQ-4bit	4	25%	3.89	2.1x
Llama-3-70B	8K	KIVI	3	18.75%	3.45	3.2x
Llama-3-70B	8K	TurboQuant 3.5-bit	3.5	21.875%	3.15	7.8x
Llama-3-70B	8K	TurboQuant 3-bit	3	18.75%	3.21	8.0x
Llama-3-8B	32K	FP16	16	100%	5.21	1.0x
Llama-3-8B	32K	INT8	8	50%	5.34	2.8x
Llama-3-8B	32K	TurboQuant 3-bit	3	18.75%	5.29	7.5x
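The VRAM column appears to follow a simple rule: bit width divided by the 16 bits of the FP16 baseline. The sketch below reproduces that arithmetic under this assumption. The function names `vram_percent` and `reduction_factor` are illustrative and not taken from any TurboQuant implementation, and the calculation ignores per-group metadata such as scales and zero points, which would add overhead in practice.

```python
# Minimal sketch: storage relative to FP16, assuming cost scales
# linearly with bits per value and metadata overhead is ignored.
FP16_BITS = 16.0


def vram_percent(bits: float, baseline_bits: float = FP16_BITS) -> float:
    """Return storage as a percentage of the baseline bit width."""
    return 100.0 * bits / baseline_bits


def reduction_factor(bits: float, baseline_bits: float = FP16_BITS) -> float:
    """Return how many times smaller the quantized storage is."""
    return baseline_bits / bits


if __name__ == "__main__":
    # Bit widths that appear in the table above.
    for label, bits in [("FP16", 16), ("INT8", 8), ("GPTQ-4bit", 4),
                        ("TurboQuant 3.5-bit", 3.5), ("3-bit methods", 3)]:
        print(f"{label:20s} {vram_percent(bits):7.3f}%  "
              f"{reduction_factor(bits):.2f}x smaller")
```

Under this linear model a 3-bit format is roughly 5.3x smaller than FP16, so any larger headline reduction would have to come from a different baseline or from rounding.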

Sources: arXiv:2504.19874 (Google Research) • 0xSero/turboquant implementation • Public benchmarks

6x VRAM reduction at 3-bit

8x attention speedup on H100

<1% PPL degradation at 3.5-bit
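The perplexity claim can be checked directly against the Llama-3-70B rows of the table. The snippet below is a small sketch that computes the relative PPL increase over the FP16 baseline; the helper name `ppl_degradation_percent` is hypothetical, and the values are copied from the table rather than measured independently.

```python
# Minimal sketch: relative perplexity increase over the FP16 baseline,
# using the Llama-3-70B (8K context) values listed in the table.
def ppl_degradation_percent(ppl_quantized: float, ppl_baseline: float) -> float:
    """Return the PPL increase as a percentage of the baseline PPL."""
    return 100.0 * (ppl_quantized - ppl_baseline) / ppl_baseline


FP16_PPL = 3.12
LLAMA_70B_PPL = {
    "GPTQ-4bit": 3.89,
    "KIVI": 3.45,
    "TurboQuant 3.5-bit": 3.15,
    "TurboQuant 3-bit": 3.21,
}

if __name__ == "__main__":
    for method, ppl in LLAMA_70B_PPL.items():
        print(f"{method:20s} +{ppl_degradation_percent(ppl, FP16_PPL):.2f}%")
```

With these numbers the 3.5-bit row stays just under a 1% increase, while the 3-bit row comes out closer to 3%.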
