Direct Action-Head Injection of A Grounded 3D Point Unlocks Spatial and Task Generalization
arXiv:2606.27663v1
Abstract: Vision-Language-Action (VLA) models leverage large-scale vision-language pretraining for flexible robot manipulation, yet at test time they remain brittle along two axes: spatial generalization, when object positions differ from those seen during training, and task generalization, when a familiar scene is paired with a different language instruction than the one seen in training. A growing family of methods addresses this brittleness by endowing a ...
arXiv cs.RO
Shiang-Feng Tsai, Jin-Cheng Jhang, Yen-Ling Tai, Jia-Hong Lai, Shih-Yun Wong, KangTung-Hsu, Yi-Ting Chen