ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs

arXiv:2606.31054v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) are critically hampered by hallucination, generating content inconsistent with the provided image. In this paper, we identify an internal signature of hallucination: progressive degradation of text-to-image cross-attention during generation, leading to specific failure patterns like unfocused or biased attention. Existing mitigation strategies are largely outcome-driven and do not explicitly target this fail...

arXiv cs.CV ·Zhiyuan Yao, Zheren Fu, Zhixiao Zheng, Jiajun Li, Yi Tu, Zhendong Mao ·
compartilhar: