On the Limits of Steering Vectors for Preference-Aligned Generation
arXiv:2607.01802v1
Abstract: Steering vectors have emerged as a promising approach to controlled text generation, offering interpretable, training-free mechanisms for shaping model outputs. However, their practical generality remains poorly understood. We study the limits of steering vector generalization along three dimensions: trait expressibility, task transfer, and multi-trait composition. Using the PLUME writing personalization benchmark, we extract steering vectors for a...
arXiv cs.CL
Melanie Subbiah, Zara Hall, Kathleen McKeown