G$^3$VLA: Geometric inductive bias for Vision-Language-Action Models
arXiv:2606.24472v1. Abstract: Vision-language-action (VLA) models have made rapid progress in generalist robot manipulation by harnessing semantic knowledge from pretrained vision-language backbones, but their visual tokens remain grounded in 2D image coordinates rather than the calibrated geometry of the robot's cameras. This mismatch is especially pronounced in multi-camera setups, where views are coupled by known intrinsics and extrinsics yet processed as independent images. We ...
arXiv cs.RO
Yue Peng, Yongzhe Zhao, Artur Habuda, Khuyen Pham, Yanheng Zhu, Tran Nguyen Le, Fares Abu-Dakka, Li Guo