Bridge-WA: Predicting Where and How the World Changes for Robotic Action

arXiv:2607.02195v1. Abstract: General-purpose vision-language-action models benefit from large vision-language priors, but effective manipulation also requires anticipating action-relevant scene changes. Existing world-action models often rely on large generative world models or dense future rollouts, which are expensive and spend capacity on visual details weakly coupled to control. We present Bridge-WA, a lightweight world-action framework that distills a frozen future-change...

arXiv cs.RO · Yongjie Bai, Hanting Wang, Mingtong Dai, Qijun Zhong, Yang Liu, Liang Lin