Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining

Hugging Face · Daily Papers · Michael K. Chen, Xikun Zhang · ▲ 1 upvote

This paper is featured in Hugging Face's daily selection of papers, curated by the AI research community.

Authors: Michael K. Chen, Xikun Zhang, Fan Bai, Zhengding Hu, Zhen Wang

  • 1 community upvote
  • Topics: autoregressive pretraining, overfitting, data augmentation, token-level noise, sequence permutations, target offset prediction

Abstract

Original abstract, extracted from the paper:

Training-time data augmentation techniques help mitigate overfitting in autoregressive language model pretraining by delaying performance deterioration and improving final model quality when training on fixed datasets for many epochs.
