Support-Constrained RL Enables Real-World Policy Improvement without Real-World Experience

arXiv:2606.27475v1 Announce Type: new Abstract: Robots trained on real world data tend to be imprecise, slow, and brittle to perturbations. Improving these policies with reinforcement learning (RL) is an appealing alternative, but this process often requires expensive training in the real world. Performing policy improvement in simulation instead provides a far cheaper alternative, but unconstrained RL in simulation can exploit contact and dynamics mismatches, resulting in unsafe behaviors that ...

arXiv cs.RO ·Raymond Yu, William Huey, Mustafa Mukadam, Anusha Nagabandi, Abhishek Gupta ·
compartilhar: