Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Nikita Kachaev, Andrey Moskalenko, Matvey Skripkin, Nikita Kurlaev, Daria Pugacheva, Albina Burlova
- 41 community upvotes
- Topics: Vision-Language-Action models, pretrained VLMs, robotics data, knowledge-sensitive tasks, action-grounded success rate, commonsense knowledge
Abstract
Original abstract (in English), extracted from the paper:
Act2Answer protocol evaluates embodied vision-language-action models by having agents answer questions through physical actions, revealing knowledge retention and generalization patterns across different semantic categories.