Blog LLMs & Texto Multimodal

Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding

arXiv:2606.27596v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) exhibit sophisticated reasoning but remain susceptible to object hallucination. Deviating from the prevailing attention intensity assumption, we reveal a deeper dynamic structural misalignment: hallucination is triggered at decision-critical steps where specific attention heads, acting as risky mediators, decouple from visual evidence to lock onto language priors. This establishes a pathological shortcut that by...

arXiv cs.CV ·Liu Yu, Can Chen, Ping Kuang, Zhikun Feng, Fan Zhou, Gillian Dobbie · 29 de janeiro de 2026

Ver no Hugging Face

// relacionados

Dismantling Pathological Shortcuts: A Causal Framework for Faithful LVLM Decoding

Leia também

The US military used AI to pick thousands of targets but missed a note saying one was a school

HP accelerates enterprise workflows with OpenAI Frontier

O fantasma do Fable 5: banido, o modelo vive nos datasets que o destilam

MultiHashFormer: e se cada palavra fosse uma impressão digital?