Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe


Hugging Face · Daily Papers · Qian Zhao, Kunlong Chen · ▲ 7 upvotes

This paper is featured in the Hugging Face Daily Papers selection, curated by the AI research community.

Authors: Qian Zhao, Kunlong Chen, Changxin Tian, Zhonghui Jiang, Haitao Zhang, Chaofan Yu

  • 7 community upvotes
  • Topics: FP4 training, Shrinkage Bias, Random Hadamard Transform, uniform grids, E1M2, INT4

Summary

Original summary, extracted from the paper:

Uniform 4-bit training with RHT-based quantization outperforms E2M1-based methods by eliminating shrinkage bias and improving training stability across large language model architectures.
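The summary contrasts a uniform 4-bit grid with the E2M1 floating-point grid and mentions the Random Hadamard Transform (RHT). The sketch below illustrates these ingredients only; it is not the paper's UFP4 recipe. The E2M1 magnitudes are the standard FP4 values, while the function names, the per-vector absmax scaling, and the round-to-nearest rule are assumptions made for illustration.

```python
import math
import random

# E2M1 (FP4) representable magnitudes: 2 exponent bits, 1 mantissa bit.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Symmetric uniform 4-bit grid (INT4-style): evenly spaced magnitudes 0..7.
UNIFORM_GRID = [float(i) for i in range(8)]


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def random_hadamard(x, signs):
    """RHT: random sign flips followed by the Hadamard transform."""
    return fwht([s * v for s, v in zip(signs, x)])


def inverse_random_hadamard(y, signs):
    """Invert the RHT; the orthonormal Hadamard matrix is its own inverse."""
    return [s * v for s, v in zip(signs, fwht(y))]


def quantize(x, grid):
    """Round-to-nearest onto a symmetric grid after absmax scaling (assumed scheme)."""
    amax = max(abs(v) for v in x)
    if amax == 0.0:
        return list(x)
    scale = amax / grid[-1]
    out = []
    for v in x:
        mag = min(grid, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(mag * scale, v))
    return out


# Usage: rotate, quantize on either grid, rotate back, then compare errors.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
signs = [rng.choice((-1.0, 1.0)) for _ in x]
rotated = random_hadamard(x, signs)
x_uniform = inverse_random_hadamard(quantize(rotated, UNIFORM_GRID), signs)
x_e2m1 = inverse_random_hadamard(quantize(rotated, E2M1_GRID), signs)
```

Comparing `x_uniform` and `x_e2m1` against `x` (for example by mean squared error or by the ratio of norms) gives a rough feel for how the two grids behave on rotated data; which grid comes out ahead depends on the input distribution and the scaling scheme.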

