MAPL: Multi-Objective Preference Learning for Robot Locomotion

arXiv:2606.25398v1 · Announce Type: new

Abstract: Reward design remains a major bottleneck in reinforcement learning for robot locomotion, where successful policies often depend on carefully tuned, task-specific reward functions. Preference-based reinforcement learning offers an alternative, but existing LLM-based methods typically ask for a single overall judgment between behaviors, making it difficult to capture the multiple competing objectives that underlie high-quality locomotion. We present ...

arXiv cs.RO · Xiyue Chen, Muhan Lin, Shuyang Shi, Joseph Campbell