On the Limits of Steering Vectors for Preference-Aligned Generation

arXiv:2607.01802v1

Abstract: Steering vectors have emerged as a promising approach to controlled text generation, offering interpretable, training-free mechanisms for shaping model outputs. However, their practical generality remains poorly understood. We study the limits of steering vector generalization along three dimensions: trait expressibility, task transfer, and multi-trait composition. Using the PLUME writing personalization benchmark, we extract steering vectors for a...

arXiv cs.CL · Melanie Subbiah, Zara Hall, Kathleen McKeown