Blog
LLMs & Texto
Quantization Inflates Reasoning: Token Inflation as a Hidden Cost of Low-Bit Reasoning Models
arXiv:2606.25519v1 Announce Type: new Abstract: Quantization is widely used to reduce the inference cost of large language models, but its effect on reasoning models is not fully captured by final-answer accuracy or per-token latency. We show that low-bit post-training quantization can introduce a hidden test-time compute cost: quantized reasoning models often generate longer chains of thought even when they still answer correctly. Across mathematical reasoning, code generation, scientific quest...
arXiv cs.AI
·Xinyu Lian, Walid Krichene, Beichen Huang, Masahiro Tanaka, Olatunji Ruwase, Li Zhang, Minjia Zhang
·
// relacionados
Leia também
Blog
Amazon ups India bet with fresh $13B AI infrastructure investment
Editorial
Jalapeño: a OpenAI projeta seu primeiro chip de inferência — e usou IA para fazer isso em 9 meses
Editorial
SkillOpt: como ensinar agentes de IA a melhorar suas próprias habilidades — +23 pontos em GPT-5.5
Blog