Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding
arXiv:2606.27596v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) exhibit sophisticated reasoning but remain susceptible to object hallucination. Deviating from the prevailing attention intensity assumption, we reveal a deeper dynamic structural misalignment: hallucination is triggered at decision-critical steps where specific attention heads, acting as risky mediators, decouple from visual evidence to lock onto language priors. This establishes a pathological shortcut that by...
arXiv cs.CV
·Liu Yu, Can Chen, Ping Kuang, Zhikun Feng, Fan Zhou, Gillian Dobbie
·
// relacionados
Leia também
Blog
The US military used AI to pick thousands of targets but missed a note saying one was a school
Blog
HP accelerates enterprise workflows with OpenAI Frontier
Editorial
O fantasma do Fable 5: banido, o modelo vive nos datasets que o destilam
Editorial