AC-ODM: Actor--Critic Online Data Mixing for Sample-Efficient LLM Pretraining

Hugging Face · Daily Papers

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Jing Ma, Chenhao Dang, Mingjie Liao

  • 1 community upvote
  • Topics: pretraining data composition, LLM generalization, dynamic mixing, static strategies, reinforcement learning, parameterized policy

Summary

Original abstract (in English), extracted from the paper:

AC-ODM optimizes pretraining data composition for LLMs using reinforcement learning to improve convergence speed and downstream accuracy while maintaining computational efficiency.
