ReFPO: Reflow Regularization for Flow Matching Policy Gradients

arXiv:2606.21086v1 Abstract: We present Reflow-regularized Flow Matching Policy Gradients (ReFPO), a simple online RL method that adds explicit Reflow regularization to Flow Matching Policy Gradients (FPO) for efficient flow-based control. We uncover a key structural property: the gradient updates in FPO can be interpreted as an implicit advantage-weighted Reflow process, providing a new geometric perspective on flow-based policy gradients. Building on this insight, ReFPO intr...

arXiv cs.RO · Ge Wang, Yibo Peng, Fan Feng, Shenhao Yan, Chengsi Yao, Jiahao Yang, Honghao Cai, Yiming Zhao, Xi Li, Jinke Ren, Shuguang Cui, Yatong Han, Zhen Li