Detecting Clinical Hallucinations in LVLMs via Counterfactual Visual Grounding Uncertainty

arXiv:2606.28520v1 Announce Type: new

Abstract: Large vision-language models (LVLMs) are increasingly used for clinical image understanding, yet they remain vulnerable to hallucinations, i.e., producing textual findings or attributes that are not supported by the image. We present a vision-traceable hallucination detection framework that audits arbitrary LVLM responses via visual evidence grounding, requiring neither modification of the LVLMs nor access to their internal hidden states. Given an LVLM response,...

arXiv cs.CV · Xiao Song, Haonan Qin, Zhaoxu Zhang, Jiong Zhang, Yuqi Fang, Caifeng Shan