TAP-VLA: Tactile Annotation Prompting for Vision Language Action Models

arXiv:2606.29089v1 (Announce Type: new)

Abstract: Vision-Language-Action (VLA) models demonstrate impressive reasoning over visual, semantic, and spatial task variations by leveraging large-scale vision and language pre-training. They remain, however, largely blind to contact forces, which seldom manifest clearly in visual feedback but are central to contact-rich manipulation. Tactile sensing measures these forces directly, but integrating it into VLAs is difficult: tactile data is absent from the...

arXiv cs.RO · Mark Van der Merwe, Mohamad Louai Shehab, Jayjun Lee, Youngsun Wi, Yinpei Dai, Dmitry Berenson, Nima Fazeli