MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models

arXiv:2606.24155v1. Abstract: Existing medical AI benchmarks lack process visibility, atomic skill evaluation, and integrated hallucination detection. We introduce MedBench v5, a redesigned benchmark for clinical multimodal models (language, vision-language, and agent systems) that moves from static QA to dynamic, process-oriented evaluation. MedBench v5 features: (1) a dual-dimensional framework combining Clinical Cognitive Responsiveness (14 sub-dimensions) and Medical Atomic...

arXiv cs.CL · Ding Jinru, Jiang Chuchu, Lu Lu, Pang Wenrao, Bian Mouxiao, Gao Zhuangzhi, Chen Jiangyuan, Peng Xinwei, Chen Ruiyao, Ren Sijie, Lu Renjie, Han Bin, Liu Meiling, and Xu Jie