Distributionally Robust Listwise Preference Optimization
arXiv:2607.01715v1 Announce Type: new Abstract: Existing robust preference optimization for language-model alignment mainly studies pairwise supervision and places robustness at the dataset, prompt, or preference-pair level. We instead study listwise preference optimization under ranking-label uncertainty: given a prompt and a candidate list, the observed ranking over that list may be ambiguous due to annotator inconsistency, near-ties, lossy rankwise feedback, or reward-model noise. We propose ...
arXiv cs.AI
·Xudong Wu, Jian Qian, Pangpang Liu, Vaneet Aggarwal, Jiayu Chen
·
// relacionados
Leia também
Blog
O complicado problema do Claude Code com a China envolve proibições dos dois lados do Pacífico
Blog
AI Security Institute do Reino Unido descobre que benchmarks padrão subestimam sistematicamente o que agentes de IA realmente conseguem fazer
Dataset
ByteDance-Seed/EdgeBench
Blog