Blog & Articles

Technical deep dives, tutorials, and analysis on KV cache quantization

Deep Dive · 8 min read · 2026-04-01

TurboQuant Explained: How Random Rotation Beats 20 Years of VQ

A deep dive into the mathematical intuition behind TurboQuant's data-oblivious approach and why it outperforms PQ, KIVI, and INT8.
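
For a taste of the core idea before the full post, here is a minimal sketch, assuming the "random rotation" is a randomized Hadamard transform followed by plain uniform scalar quantization (the dimensions, sign trick, and 4-bit grid below are illustrative choices, not TurboQuant's actual pipeline):

    import numpy as np
    from scipy.linalg import hadamard

    rng = np.random.default_rng(0)
    d = 128                                   # must be a power of two for hadamard()
    signs = rng.choice([-1.0, 1.0], size=d)   # random diagonal sign flips
    R = (hadamard(d) / np.sqrt(d)) * signs    # orthonormal, data-oblivious "rotation"

    x = rng.standard_exponential(d)           # deliberately skewed, outlier-heavy input
    y = R @ x                                 # rotated coordinates look near-Gaussian

    scale = np.abs(y).max() / 7               # 4-bit uniform grid: integer levels -7..7
    x_hat = R.T @ (np.round(y / scale) * scale)  # dequantize, rotate back

    print(f"relative L2 error: {np.linalg.norm(x - x_hat) / np.linalg.norm(x):.3f}")

The rotation spreads energy evenly across coordinates, so a simple per-coordinate quantizer behaves far better than it would on the raw vector.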

Tutorial · 5 min read · 2026-03-25

How Much VRAM Can You Save? A Practical Calculator

We built an interactive calculator that shows exactly how much memory you save with TurboQuant for popular models like Llama-3-70B.
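
As a preview of the arithmetic behind it, here is a back-of-the-envelope sketch, assuming Llama-3-70B's published shape (80 layers, 8 grouped-query KV heads, head dimension 128) and FP16 storage; the calculator's exact figures may differ:

    def kv_cache_gib(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem):
        # Two tensors (K and V) per layer, one [num_kv_heads, head_dim] slice per token.
        return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem / 2**30

    # Llama-3-70B at a 128K-token context:
    args = dict(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=128 * 1024)
    print(f"FP16 : {kv_cache_gib(**args, bytes_per_elem=2):.0f} GiB")    # 40 GiB
    print(f"4-bit: {kv_cache_gib(**args, bytes_per_elem=0.5):.0f} GiB")  # 10 GiB

At four bits per element, the same context fits in a quarter of the memory, which is the kind of headline number the calculator reports per model.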

Theory · 12 min read · 2026-03-15

The Shannon Limit for Vector Quantization — And How to Reach It

Understanding the theoretical lower bound on quantization distortion and why TurboQuant gets within 1.45x of it.
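
For readers who want the formula up front: for a memoryless Gaussian source with variance σ², rate-distortion theory puts the floor on mean-squared distortion at R bits per dimension at

    D(R) = \sigma^2 \, 2^{-2R}

so the 1.45x claim reads as achieved distortion of at most 1.45 · σ²2^{-2R} (our paraphrase, assuming the post's bound is stated against a Gaussian source model).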

Implementation · 6 min read · 2026-03-10

Integrating TurboQuant with vLLM: A Step-by-Step Guide

How to enable KV cache quantization in your vLLM deployment with just a few lines of code.
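
Until that guide lands, vLLM's existing KV-cache dtype switch shows the shape of the change; this sketch uses vLLM's stock FP8 option, not a TurboQuant-specific flag (the guide itself would document the latter):

    from vllm import LLM, SamplingParams

    # Stock vLLM knob for KV cache quantization; a TurboQuant backend
    # would plug in at the same point, per the guide above.
    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        kv_cache_dtype="fp8",  # store K/V tensors in 8-bit floating point
    )

    out = llm.generate(["Why compress the KV cache?"], SamplingParams(max_tokens=64))
    print(out[0].outputs[0].text)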

Analysis · 7 min read · 2026-03-01

Why Long-Context LLMs Need KV Cache Compression

As context windows grow to 128K+, the KV cache becomes the dominant memory bottleneck. Here's why quantization is the answer.

This is a placeholder. Blog posts will be added based on community contributions.
