Supervise What Survives: Geometry-Guided VLA Adaptation from Synthetic Robot Videos
arXiv:2606.24448v1 · Announce Type: new
Abstract: Vision-Language-Action (VLA) models require large-scale video-action pairs, yet real teleoperation data remains scarce. While generated robot videos offer a scalable alternative, existing methods treat them as real robot data by recovering pseudo-actions from synthesized pixels. We argue that deriving low-level control from generated visuals is a mismatched abstraction. A video captures only geometry: the spatial trajectory representing the ...
arXiv cs.RO
Danze Chen, Yanzhe Chen, Qiming Huang, Zhijun Cao, Chen Gao, Mike Zheng Shou