S$^2$-VLA: State-Space Guided Vision-Language-Action Models for Long-Horizon Manipulation
arXiv:2606.27872v1 (Announce Type: new)
Abstract: Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, but their performance degrades significantly in long-horizon tasks due to cumulative error propagation. This limitation largely arises from static feature fusion mechanisms that rely on fixed weights to combine visual, language, and action representations, preventing the model from adapting to different phases of task execution. To address this limita...
arXiv cs.RO
Zhipeng Xie, Zongyi Han, Xiangyi Wei, Shiliang Sun, Yang Li, Jing Zhao