ViQ: Text-Aligned Visual Quantized Representations at Any Resolution
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Xumin Yu, Zuyan Liu, Zhenyu Yang, Yuhao Dong, Shengsheng Qian, Jiwen Lu
- 37 community upvotes
- Topics: visual quantized representations, text-aligned pre-training, feature discretization, proximal representation learning, position-aware head-wise quantization, multimodal modeling
Abstract
Original abstract, extracted from the paper:
ViQ presents a visual quantization framework that balances semantic richness and detail preservation in discrete representations, enabling efficient multimodal training with native-resolution inputs.