S$^2$-VLA: State-Space Guided Vision-Language-Action Models for Long-Horizon Manipulation

arXiv:2606.27872v1 Announce Type: new

Abstract: Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, but their performance degrades significantly in long-horizon tasks due to cumulative error propagation. This limitation largely arises from static feature fusion mechanisms that rely on fixed weights to combine visual, language, and action representations, preventing the model from adapting to different phases of task execution. To address this limita...

arXiv cs.RO · Zhipeng Xie, Zongyi Han, Xiangyi Wei, Shiliang Sun, Yang Li, Jing Zhao