Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning

Hugging Face · Daily Papers · Hohin Kwan, Hongyu Li · ▲ 23 upvotes

This paper is featured in the Hugging Face Daily Papers selection, curated by the AI research community.

Authors: Hohin Kwan, Hongyu Li, Ray Zhang, Manyuan Zhang, Xianghao Kong, Anyi Rao

  • 23 community upvotes
  • Topics: multimodal large language models, video temporal-logical reasoning, temporal-logical operations, state tracking, sequential counting, temporal ordering

Summary

Original summary, extracted from the paper:

A new benchmark evaluates multimodal large language models' ability to reason over dynamic visual evidence through controlled temporal-logical operations rather than simple object recognition.
