Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models

Hugging Face · Daily Papers · Nikita Kachaev, Andrey Moskalenko · ▲ 41 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova

  • 41 community upvotes
  • Topics: Vision-Language-Action models, pretrained VLMs, robotics data, knowledge-sensitive tasks, action-grounded success rate, commonsense knowledge

Abstract

Original abstract, extracted from the paper:

Act2Answer protocol evaluates embodied vision-language-action models by having agents answer questions through physical actions, revealing knowledge retention and generalization patterns across different semantic categories.
