Integrating TurboQuant with vLLM: A Step-by-Step Guide

2026-03-25

TurboQuant integrates natively with vLLM's PagedAttention mechanism. Here's how to get started.

Installation

bash

pip install turboquant-vllm
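
After installing, it can help to confirm which versions ended up in the environment. The sketch below uses only the standard library's `importlib.metadata`. The distribution name `turboquant-vllm` is taken from the pip command above; checking `vllm` alongside it assumes vLLM is installed in the same environment, and the helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report both packages; a missing one is shown as "not installed"
for name in ("vllm", "turboquant-vllm"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```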

Basic Usage

python

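from vllm import LLM
from turboquant import TurboQuantConfig

# Quantize the KV cache to 3 bits using the 'tq' method
config = TurboQuantConfig(bits=3, method='tq')

# Pass the quantization config to the engine along with a 32K context window
llm = LLM(
    model='meta-llama/Meta-Llama-3-70B',
    kv_cache_config=config,
    max_model_len=32768,
)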
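
Once the engine is constructed, generation would normally go through vLLM's standard `LLM.generate` call. The helper below is a minimal sketch under that assumption, namely that enabling TurboQuant does not change the generation API. The name `complete` is hypothetical, and the function accepts any object exposing a vLLM-style `generate` method.

```python
from typing import Any


def complete(llm: Any, prompt: str, sampling_params: Any = None) -> str:
    """Run one prompt through a vLLM-style engine and return the generated text."""
    # LLM.generate takes a list of prompts and returns one RequestOutput per prompt
    outputs = llm.generate([prompt], sampling_params)
    # Each RequestOutput carries one or more completions; take the first one
    return outputs[0].outputs[0].text
```

With the `llm` object from the snippet above, a call might look like `complete(llm, "Explain PagedAttention in one sentence.", SamplingParams(temperature=0.0, max_tokens=64))`, where `SamplingParams` is imported from `vllm`. Leaving `sampling_params` as `None` falls back to vLLM's default sampling settings.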

Benchmark Results

With TurboQuant enabled, Llama 3 70B at a 32K context requires only about 6 GB of VRAM for the KV cache on an H100, compared to 76 GB with FP16: a 12.7x reduction.
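
For a sense of where KV cache memory goes, the idealized per-sequence footprint can be estimated from the model architecture alone. The sketch below assumes Llama 3 70B's published configuration (80 layers, 8 KV heads under grouped-query attention, head dimension 128) and counts only the raw key and value entries. It does not attempt to reproduce the measured figures above, which depend on batch size, block allocation, and quantization metadata that this post does not specify. The function name `kv_cache_bytes` is illustrative.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value, batch_size=1):
    """Idealized KV cache size in bytes, ignoring scales and allocator overhead."""
    # Factor of 2: one key vector and one value vector per token, layer, and KV head
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value / 8


# Assumed architecture: Llama 3 70B (80 layers, 8 KV heads, head dimension 128)
LLAMA3_70B = dict(num_layers=80, num_kv_heads=8, head_dim=128)
GIB = 1024 ** 3

fp16 = kv_cache_bytes(**LLAMA3_70B, seq_len=32768, bits_per_value=16)
tq3 = kv_cache_bytes(**LLAMA3_70B, seq_len=32768, bits_per_value=3)

print(f"FP16 : {fp16 / GIB:.2f} GiB per 32K-token sequence")
print(f"3-bit: {tq3 / GIB:.2f} GiB per 32K-token sequence")
print(f"ideal ratio from bit width alone: {fp16 / tq3:.2f}x")
```

Passing a larger `batch_size` scales both numbers linearly, which gives a rough estimate of how many concurrent 32K-token sequences fit in a given VRAM budget.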
