Technical deep dives, tutorials, and analysis on KV cache quantization
A deep dive into the mathematical intuition behind TurboQuant's data-oblivious approach and why it outperforms product quantization (PQ), KIVI, and INT8 baselines.
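For a quick taste of what "data-oblivious" means before the full post: the quantizer's transform is fixed up front, independent of the data distribution, so no calibration pass is needed. The sketch below is a generic illustration of that idea (a fixed random rotation followed by uniform per-coordinate quantization), not TurboQuant's actual algorithm; all function names here are ours.

```python
import numpy as np

def random_rotation(d: int, seed: int = 0) -> np.ndarray:
    """A fixed orthogonal matrix, sampled once and independent of the data."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize(x: np.ndarray, rot: np.ndarray, bits: int = 4):
    """Rotate to spread energy across coordinates, then round uniformly.
    Only a single per-vector scale is read from the data."""
    z = rot @ x
    scale = np.abs(z).max() / (2 ** (bits - 1) - 1)
    return np.round(z / scale).astype(np.int8), scale

def dequantize(codes: np.ndarray, scale: float, rot: np.ndarray) -> np.ndarray:
    """Rescale the integer codes, then undo the rotation."""
    return rot.T @ (codes.astype(np.float64) * scale)

# Round-trip a random 128-dim vector and report the relative error.
rot = random_rotation(128)
x = np.random.default_rng(1).standard_normal(128)
codes, scale = quantize(x, rot, bits=4)
err = np.linalg.norm(x - dequantize(codes, scale, rot)) / np.linalg.norm(x)
print(f"relative L2 error at 4 bits: {err:.4f}")
```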
We built an interactive calculator that shows exactly how much memory you save with TurboQuant for popular models like Llama-3-70B.
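The arithmetic behind the calculator is simple enough to preview here: KV cache bytes = 2 (keys and values) × layers × KV heads × head dimension × sequence length × bytes per element. A minimal sketch, assuming Llama-3-70B's published shape (80 layers, 8 KV heads via grouped-query attention, head dimension 128); the function name is ours:

```python
def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 seq_len: int, bits_per_value: float) -> float:
    """KV cache size in GiB for one sequence: keys + values across all layers."""
    total_bits = 2 * num_layers * num_kv_heads * head_dim * seq_len * bits_per_value
    return total_bits / 8 / 2**30

# Llama-3-70B: 80 layers, 8 KV heads (GQA), head_dim 128, at a 128K context.
for label, bits in [("fp16 baseline", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label:>13}: {kv_cache_gib(80, 8, 128, 128_000, bits):5.1f} GiB")
```

At fp16 that works out to roughly 39 GiB per 128K-token sequence, which is why dropping to a few bits per value changes what fits on a single GPU.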
Understanding the theoretical lower bound on quantization distortion and why TurboQuant gets within 1.45x of it.
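As a pointer to the math involved: the classical benchmark for claims like this is Shannon's rate-distortion bound, which for a Gaussian source with variance $\sigma^2$ says a quantizer spending $b$ bits per coordinate cannot push mean-squared distortion below an exponentially decaying floor. Whether the post's lower bound is exactly this one is our assumption; "within 1.45x" would then read as:

$$
D^{*}(b) = \sigma^{2}\, 2^{-2b},
\qquad
D_{\text{TurboQuant}}(b) \;\le\; 1.45 \cdot D^{*}(b).
$$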
How to enable KV cache quantization in your vLLM deployment with just a few lines of code.
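For the impatient, here is the shape of it using vLLM's built-in option (this enables vLLM's own fp8 KV cache quantization, not TurboQuant; the model name is just an example):

```python
from vllm import LLM, SamplingParams

# Store keys and values in FP8 instead of FP16, halving KV cache memory.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example; any supported model
    kv_cache_dtype="fp8",
)

params = SamplingParams(temperature=0.7, max_tokens=64)
out = llm.generate(["Summarize KV cache quantization in one sentence."], params)
print(out[0].outputs[0].text)
```

The server path is equivalent: pass `--kv-cache-dtype fp8` when launching `vllm serve`.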
As context windows grow to 128K+, the KV cache becomes the dominant memory bottleneck. Here's why quantization is the answer.
This is a placeholder. Blog posts will be added based on community contributions.