Reflective VLA: In-Context Action Consequences Make VLAs Generalize

arXiv:2606.25215v1 · Announce type: new

Abstract: Most vision-language-action (VLA) models are reactive: they predict the next action from the current instruction and observation, implicitly assuming that the current observation fully specifies the action-relevant state. In embodied control, however, embodiment-specific factors such as camera-to-robot geometry, robot calibration, or systematic actuation bias are often hard to identify from a single observation. As a result, reactive policies canno...

arXiv cs.CV · Qing Lian, Kent Yu, Lei Zhang