NormGuard: Reward-Preserving Norm Constraints in Flow-Matching Reinforcement Learning
Hugging Face · Daily Papers
Tianlin Pan, Lianyu Pang · ▲ 3 upvotes
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Tianlin Pan, Lianyu Pang, Cheng Da, Huan Yang, Changqian Yu, Kun Gai
- 3 community upvotes
- Topics: flow-based generators, reinforcement learning, reward alignment, velocity norm, norm inflation, classifier-free guidance
Abstract
Original abstract, extracted from the paper:
Reinforcement learning post-training degrades perceptual quality in flow-based generators through velocity norm inflation, which requires training-time intervention rather than inference-time corrections to maintain both reward alignment and image quality.