
Detecting Clinical Hallucinations in LVLMs via Counterfactual Visual Grounding Uncertainty

arXiv:2606.28520v1. Abstract: Large vision-language models (LVLMs) are increasingly used for clinical image understanding, yet they remain vulnerable to hallucinations, producing textual findings or attributes that the image does not support. We present a vision-traceable hallucination detection framework that audits arbitrary LVLM responses through visual evidence grounding, requiring neither modification of the LVLM nor access to its internal hidden states. Given an LVLM response,...

arXiv cs.CV · Xiao Song, Haonan Qin, Zhaoxu Zhang, Jiong Zhang, Yuqi Fang, Caifeng Shan · 30 January 2026



See also

LocateAnything-3B: NVIDIA teaches a model to point at things in an image

InternScience/Agents-A1

NIVA: A Multimodal Foundation Model for Actionable Earth System Intelligence

Can AI Draw Science? A Benchmark for Evaluating Scientific Figure Generation by Text-to-Image and Multimodal Models