A Benchmark for Hallucination Detection in VLMs for Gastrointestinal Endoscopy

arXiv:2606.24115v1. Abstract: Vision-language models (VLMs) are prone to hallucination, which remains a major barrier to their safe deployment in clinical practice. To date, most hallucination detection methods have been evaluated on radiology benchmarks such as MIMIC-CXR and VQA-RAD, while gastrointestinal (GI) endoscopy remains largely underexplored. In this paper, we benchmark nine hallucination detection methods on the Gut-VLM dataset, a GI diagnostic Visual Question Answer...

arXiv cs.CV · Aminu Lawal, Niyoj Oli, Sachin Acharya, Prashnna Gyawali, Maria Carmen Romano, Binod Bhattarai · January 24, 2026

Read also

Cosmos 3: the first open model that sees, simulates, and acts in the physical world

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy

VisChronos: Revolutionizing Image Captioning Through Real-Life Events