
RoPE-Aware Bit Allocation for KV-Cache Quantization


Hugging Face · Daily Papers · Fengfeng Liang, Yuechen Zhang · January 23, 2026 · ▲ 4 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Fengfeng Liang, Yuechen Zhang, Jiaya Jia

4 community upvotes
Topics: RoPE, KV-cache quantization, bit-allocation, TurboQuant-MSE, TQ-MSE, attention logit

Summary

Original summary, as extracted from the paper:

Block-GTQ introduces a RoPE-aware bit allocation method for key-cache quantization that improves attention accuracy and downstream performance through adaptive bit distribution and packed cache serving.

