Distributionally Robust Listwise Preference Optimization

arXiv:2607.01715v1 Announce Type: new
Abstract: Existing robust preference optimization for language-model alignment mainly studies pairwise supervision and places robustness at the dataset, prompt, or preference-pair level. We instead study listwise preference optimization under ranking-label uncertainty: given a prompt and a candidate list, the observed ranking over that list may be ambiguous due to annotator inconsistency, near-ties, lossy rankwise feedback, or reward-model noise. We propose ...

arXiv cs.AI · Xudong Wu, Jian Qian, Pangpang Liu, Vaneet Aggarwal, Jiayu Chen