Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning
A new benchmark evaluates multimodal large language models' ability to reason over dynamic visual evidence through controlled temporal-logical operations rather than simple object…
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Hohin Kwan, Hongyu Li, Ray Zhang, Manyuan Zhang, Xianghao Kong, Anyi Rao
- 23 community upvotes
- Topics: multimodal large language models, video temporal-logical reasoning, temporal-logical operations, state tracking, sequential counting, temporal ordering
Abstract
Original abstract, extracted from the paper:
A new benchmark evaluates multimodal large language models' ability to reason over dynamic visual evidence through controlled temporal-logical operations rather than simple object recognition.