NormGuard: Reward-Preserving Norm Constraints in Flow-Matching Reinforcement Learning



Hugging Face · Daily Papers · Tianlin Pan, Lianyu Pang · ▲ 3 upvotes

This paper is featured in the Hugging Face Daily Papers selection, curated by the AI research community.

Authors: Tianlin Pan, Lianyu Pang, Cheng Da, Huan Yang, Changqian Yu, Kun Gai

  • 3 community upvotes
  • Topics: flow-based generators, reinforcement learning, reward alignment, velocity norm, norm inflation, classifier-free guidance

Abstract

Original abstract, quoted from the paper:

Reinforcement learning post-training degrades perceptual quality in flow-based generators through velocity norm inflation, which requires training-time intervention rather than inference-time corrections to maintain both reward alignment and image quality.
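The abstract's central quantity is the norm of the velocity field predicted by the flow-based generator. As a hedged illustration only (not the authors' NormGuard method or their evaluation metric), one way to check for the inflation it describes is to give the same noisy inputs x_t and timesteps t to the reference model and the RL-fine-tuned model, then compare the L2 norms of the two velocity predictions. The helper names `l2_norm` and `norm_inflation_ratio` are hypothetical, and the sketch assumes the velocities have already been flattened into plain lists of floats.

```python
import math
from typing import Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of one flattened velocity prediction."""
    return math.sqrt(sum(x * x for x in v))


def norm_inflation_ratio(v_finetuned: Sequence[Sequence[float]],
                         v_reference: Sequence[Sequence[float]]) -> float:
    """Mean of ||v_ft|| / ||v_ref|| over paired velocity predictions.

    Hypothetical diagnostic (an assumption, not from the paper): each pair is
    assumed to come from the same (x_t, t) input. A result above 1.0 means the
    fine-tuned model predicts larger velocities than the reference model.
    """
    if len(v_finetuned) != len(v_reference) or not v_finetuned:
        raise ValueError("need the same, non-zero number of paired predictions")
    ratios = []
    for ft, ref in zip(v_finetuned, v_reference):
        ref_norm = l2_norm(ref)
        if ref_norm == 0.0:
            raise ValueError("reference velocity has zero norm")
        ratios.append(l2_norm(ft) / ref_norm)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    reference = [[3.0, 4.0], [0.0, 2.0]]   # norms 5.0 and 2.0
    finetuned = [[6.0, 8.0], [0.0, 3.0]]   # norms 10.0 and 3.0
    print(norm_inflation_ratio(finetuned, reference))  # mean of 2.0 and 1.5 -> 1.75
```

Averaging per-sample ratios, rather than comparing two averaged norms, keeps a few large-norm samples from dominating the result. In practice the same comparison would be run on real model outputs, for example tensors flattened per sample.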
