Simplified Sparse Attention via Gist Tokens


Hugging Face · Daily Papers · Yuzhen Mao, Michael Y. Li · ▲ 1 upvote

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Yuzhen Mao, Michael Y. Li, Emily B. Fox

  • 1 community upvote
  • Topics: sparse attention, gist tokens, attention mask, next-token loss, continued pretraining, attention scoring

Abstract

Original abstract, extracted from the paper:

Simplified Sparse Attention (SSA) reduces long-context inference costs through gist token-based attention masking during pretraining, enabling efficient chunk selection at inference time without architectural modifications.
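
To make the idea of a gist-token attention mask concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the mask layout, so everything below is illustrative rather than the authors' method: the function name `gist_attention_mask`, the fixed `chunk_size`, and the choice to treat the last position of each chunk as its gist token are all hypothetical. In this sketch a query attends causally within its own chunk and, for earlier chunks, only to their gist tokens.

```python
def gist_attention_mask(seq_len: int, chunk_size: int) -> list:
    """Return a boolean matrix where mask[q][k] is True if query position q
    may attend to key position k.

    Assumptions (not taken from the paper):
      * the sequence is split into fixed-size chunks,
      * the last position of each chunk holds that chunk's gist token,
      * a query sees its own chunk causally plus gist tokens of earlier chunks.
    """
    if seq_len < 0 or chunk_size <= 0:
        raise ValueError("seq_len must be >= 0 and chunk_size must be > 0")

    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                                   # never look ahead
            same_chunk = (k // chunk_size) == (q // chunk_size)
            is_gist = (k % chunk_size) == chunk_size - 1      # hypothetical gist slot
            row.append(causal and (same_chunk or is_gist))
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Two chunks of three tokens; positions 2 and 5 act as gist tokens.
    for row in gist_attention_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a mask of this shape, a model trained under the ordinary next-token loss has to route information about earlier chunks through their gist tokens, which is what would make those tokens usable for scoring and selecting chunks at inference time.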
