
Massive Activations Are Architecturally Robust: A Controlled Scratch/Commitment Residual Stream Test

arXiv:2606.20743v1. Abstract: Trained transformers reliably develop massive activations: a small number of hidden dimensions whose magnitudes are far above the median and which concentrate on the sequence-start token. Whether these outliers are a removable artifact of the residual stream's overloaded read-and-write role or a functional necessity is actively debated. We test the artifact hypothesis directly with an architectural intervention. Our architecture, Ledger Re...
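To make the abstract's definition concrete, here is a minimal sketch of how one might flag massive activations in a single sequence of hidden states. It is not the paper's method: the function name `find_massive_activations`, the 100x-median cutoff, and the synthetic hidden states are all illustrative assumptions. The abstract only says the magnitudes are "far above the median".

```python
import random
from statistics import median


def find_massive_activations(hidden, ratio=100.0):
    """Flag entries whose magnitude exceeds `ratio` times the median magnitude.

    hidden: list of rows, one per token, each a list of d_model floats.
    ratio:  hypothetical cutoff (an assumption, not a value from the paper).
    Returns (hits, median_magnitude), where hits is a list of
    (token_index, dimension_index, magnitude) tuples.
    """
    mags = [[abs(v) for v in row] for row in hidden]
    med = median(v for row in mags for v in row)
    threshold = ratio * med
    hits = [
        (t, d, m)
        for t, row in enumerate(mags)
        for d, m in enumerate(row)
        if m > threshold
    ]
    return hits, med


# Synthetic stand-in for a residual stream: 8 tokens, 64 hidden dimensions.
random.seed(0)
hidden = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(8)]

# Plant one outlier on the sequence-start token (index 0), in dimension 5.
hidden[0][5] = 500.0

hits, med = find_massive_activations(hidden)
print(hits)  # the planted outlier at token 0, dimension 5
```

On real models the hidden states would come from a forward pass rather than a random generator, and a sensible cutoff would depend on the model and layer being inspected.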

arXiv cs.LG · Maruthi Vemula (University of North Carolina at Chapel Hill) · January 23, 2026


