ViQ: Text-Aligned Visual Quantized Representations at Any Resolution

Hugging Face · Daily Papers · Xumin Yu, Zuyan Liu · ▲ 37 upvotes

This paper is featured in the Hugging Face Daily Papers selection, curated by the AI research community.

Authors: Xumin Yu, Zuyan Liu, Zhenyu Yang, Yuhao Dong, Shengsheng Qian, Jiwen Lu

  • 37 community upvotes
  • Topics: visual quantized representations, text-aligned pre-training, feature discretization, proximal representation learning, position-aware head-wise quantization, multimodal modeling

Summary

Original abstract, extracted from the paper:

ViQ presents a visual quantization framework that balances semantic richness and detail preservation in discrete representations, enabling efficient multimodal training with native-resolution inputs.
