Robotics & Multimodal RL Blog

Supervise What Survives: Geometry-Guided VLA Adaptation from Synthetic Robot Videos

arXiv:2606.24448v1. Abstract: Vision-Language-Action (VLA) models require large-scale video-action pairs, yet real teleoperation data remains scarce. While generated robot videos offer a scalable alternative, existing methods treat them as real robot data by recovering pseudo-actions from synthesized pixels. We argue that deriving low-level control from generated visuals is a mismatched abstraction. A video captures only geometry: the spatial trajectory representing the ...

arXiv cs.RO · Danze Chen, Yanzhe Chen, Qiming Huang, Zhijun Cao, Chen Gao, Mike Zheng Shou · January 24, 2026


