Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining
Hugging Face · Daily Papers
Michael K. Chen, Xikun Zhang
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Michael K. Chen, Xikun Zhang, Fan Bai, Zhengding Hu, Zhen Wang
- 1 community upvote
- Topics: autoregressive pretraining, overfitting, data augmentation, token-level noise, sequence permutations, target offset prediction
Abstract
Original abstract, taken from the paper:
Training-time data augmentation techniques help mitigate overfitting in autoregressive language model pretraining by delaying performance deterioration and improving final model quality when training on fixed datasets for many epochs.
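To make the idea concrete, here is a minimal sketch of one of the listed topics, token-level noise, applied freshly on each epoch over a fixed dataset. It is an illustration under stated assumptions, not the authors' method: the function names `add_token_noise` and `epoch_views`, the uniform replacement scheme, and the default `noise_prob=0.1` are all hypothetical choices that the summary above does not specify.

```python
import random


def add_token_noise(token_ids, vocab_size, noise_prob=0.1, rng=None):
    """Return a copy of token_ids in which each token is replaced, with
    probability noise_prob, by an id drawn uniformly from the vocabulary.

    Assumption: uniform replacement is one simple form of token-level
    noise; the paper may use a different scheme.
    """
    rng = rng or random.Random()
    noisy = []
    for tok in token_ids:
        if rng.random() < noise_prob:
            noisy.append(rng.randrange(vocab_size))  # corrupted position
        else:
            noisy.append(tok)  # position left unchanged
    return noisy


def epoch_views(token_ids, vocab_size, num_epochs, noise_prob=0.1, seed=0):
    """Yield one independently noised view of the same sequence per epoch,
    so repeated passes over a fixed dataset do not see identical inputs."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        yield add_token_noise(token_ids, vocab_size, noise_prob, rng)


if __name__ == "__main__":
    sequence = [5, 17, 42, 8, 99, 3]  # toy token ids, hypothetical
    for epoch, view in enumerate(epoch_views(sequence, vocab_size=128, num_epochs=3)):
        print(epoch, view)
```

Because the noise is resampled every epoch, the model sees a slightly different input each time it revisits a sequence, which is the general mechanism by which augmentation can delay overfitting in multi-epoch training. Whether the prediction targets should stay clean while only the inputs are corrupted is a design decision this sketch leaves open.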