
VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring

arXiv:2606.27941v1. Abstract: Sparse autoencoders (SAEs) provide useful decompositions of Transformer residual streams, but their learned features are usually named post hoc rather than directly connected to the Transformer's token vocabulary. We introduce Vocabulary-Aligned Sparse Autoencoder (VASAE), a method that trains SAE features under vocabulary-aligned anchoring and assigns each feature an intrinsic token name: the token string whose embedding is nearest to that feature...
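The naming rule in the abstract (label a feature with the token whose embedding is nearest to it) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract is truncated and does not say which distance is used, so cosine similarity is an assumption here. The function names `cosine` and `name_features`, the 3-dimensional vectors, and the three-token vocabulary are all hypothetical toy values chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def name_features(feature_directions, token_embeddings):
    """For each feature direction, return the token whose embedding is most similar.

    Assumption: "nearest" is read as highest cosine similarity; the source
    abstract does not specify the metric.
    """
    names = []
    for direction in feature_directions:
        best_token = max(
            token_embeddings,
            key=lambda tok: cosine(direction, token_embeddings[tok]),
        )
        names.append(best_token)
    return names

# Toy, made-up data: a 3-token vocabulary and two feature directions.
toy_embeddings = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.4, 0.1],
    "run": [0.0, 1.0, 0.2],
}
toy_features = [
    [0.95, 0.05, 0.0],  # points almost exactly along "cat"
    [0.1, 0.9, 0.3],    # points mostly along "run"
]

feature_names = name_features(toy_features, toy_embeddings)
print(feature_names)  # ['cat', 'run']
```

In a real setting the feature directions would come from a trained SAE and the embeddings from the Transformer's own embedding matrix. The sketch covers only the post-training lookup, not the vocabulary-aligned anchoring applied during training.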

arXiv cs.CL · Kairui Zhang, Ziwen Yu, Zahraa S. Abdallah, Martha Lewis · January 29, 2026


