Qwen-Image-2.0-RL Technical Report
arXiv:2606.27608v1 Announce Type: new Abstract: We present Qwen-Image-2.0-RL, a post-training pipeline that applies reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve both the visual quality and instruction-following capability of the Qwen-Image-2.0 diffusion model. To provide reliable reward signals, we construct task-specific composite reward models by fine-tuning vision-language models with a pointwise scoring paradigm and chain-of-thought reasoning....
arXiv cs.CV
·Yixian Xu, Kaiyuan Gao, Yuxiang Chen, Yilei Chen, Zecheng Tang, Zihao Liu, Zikai Zhou, Deqing Li, Hao Meng, Kuan Cao, Jiahao Li, Jie Zhang, Liang Peng, Lihan Jiang, Ningyuan Tang, Shengming Yin, Tianhe Wu, Xiaoyue Chen, Yan Shu, Yanran Zhang, Yi Wang, Yu Wu, Yujia Wu, Zekai Zhang, Zhendong Wang, Xiao Xu, Kun Yan, Chenfei Wu
·
// relacionados
Leia também
Blog
The US military used AI to pick thousands of targets but missed a note saying one was a school
Blog
HP accelerates enterprise workflows with OpenAI Frontier
Editorial
O fantasma do Fable 5: banido, o modelo vive nos datasets que o destilam
Editorial