
Distributionally Robust Listwise Preference Optimization

arXiv:2607.01715v1. Abstract: Existing robust preference optimization for language-model alignment mainly studies pairwise supervision and places robustness at the dataset, prompt, or preference-pair level. We instead study listwise preference optimization under ranking-label uncertainty: given a prompt and a candidate list, the observed ranking over that list may be ambiguous due to annotator inconsistency, near-ties, lossy rankwise feedback, or reward-model noise. We propose ...
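To make the listwise setting concrete, here is a minimal sketch of a listwise ranking loss, assuming a Plackett-Luce model over per-candidate scores. Plackett-Luce is a common choice for listwise objectives, but the abstract does not say which model the authors use, so this is an illustration of the problem setup and not the paper's method. The function name `plackett_luce_nll` and the example scores are hypothetical.

```python
import math


def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of `ranking` under a Plackett-Luce model.

    scores[i] is an assumed scalar score for candidate i (for example a
    policy log-probability or reward); `ranking` lists candidate indices
    from most preferred to least preferred.
    """
    nll = 0.0
    remaining = list(ranking)
    for idx in ranking[:-1]:  # the final pick has probability 1
        # Stable log-sum-exp over the candidates not yet placed.
        m = max(scores[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        nll -= scores[idx] - log_z
        remaining.remove(idx)
    return nll


# Hypothetical scores for three candidates; candidates 0 and 1 are a near-tie.
example_scores = [1.05, 1.00, -0.50]
observed = [0, 1, 2]  # ranking as labeled
swapped = [1, 0, 2]   # same list with the near-tied pair flipped

print(plackett_luce_nll(example_scores, observed))
print(plackett_luce_nll(example_scores, swapped))
```

The two printed losses differ only slightly because the top two scores are close. That is the kind of ranking-label ambiguity (near-ties, annotator inconsistency) the abstract describes: a loss that trusts the observed ordering completely treats an almost arbitrary label as ground truth.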

arXiv cs.AI · Xudong Wu, Jian Qian, Pangpang Liu, Vaneet Aggarwal, Jiayu Chen · January 3, 2026


