Unleashing More Actions via Action Compositional Training for VLA Models

arXiv:2607.00351v1. Abstract: Vision-Language-Action (VLA) models excel at robotic manipulation, driven by the scale and diversity of demonstration data. However, standard training paradigms often cause VLA models to severely overfit to specific behavioral patterns, leaving them unable to generalize to out-of-distribution scenarios even when those scenarios merely require novel combinations of identical sub-skills. While expanding datasets can mitigate this overfitting, acquiring h...

arXiv cs.RO · Kai Peng, Jie Lu, Xiaojiang Peng