Reflective VLA: In-Context Action Consequences Make VLAs Generalize
arXiv:2606.25215v1. Announce type: new.
Abstract: Most vision-language-action (VLA) models are reactive: they predict the next action from the current instruction and observation, implicitly assuming that the current observation fully specifies the action-relevant state. In embodied control, however, embodiment-specific factors such as camera-to-robot geometry, robot calibration, or systematic actuation bias are often hard to identify from a single observation. As a result, reactive policies canno...
arXiv cs.CV
Qing Lian, Kent Yu, Lei Zhang