AC-ODM: Actor-Critic Online Data Mixing for Sample-Efficient LLM Pretraining
AC-ODM optimizes pretraining data composition for LLMs using reinforcement learning to improve convergence speed and downstream accuracy while maintaining computational efficiency.
Hugging Face · Daily Papers
This paper is featured in Hugging Face's daily paper selection, curated by the AI research community.
Authors: Jing Ma, Chenhao Dang, Mingjie Liao
- 1 community upvote
- Topics: pretraining data composition, LLM generalization, dynamic mixing, static strategies, reinforcement learning, parameterized policy
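The listing gives no algorithmic detail beyond reinforcement learning and a parameterized policy for choosing the data mixture. As a hedged illustration of that general idea only, and not the authors' AC-ODM algorithm, the sketch below treats each pretraining domain as a bandit arm: a softmax "actor" over domain logits produces the mixture weights, and a scalar "critic" tracks the expected reward and serves as a baseline (a one-step actor-critic). The reward function `toy_reward`, its means, and all hyperparameters are hypothetical stand-ins for a real training signal such as per-batch loss reduction.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: turns actor logits into mixture weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_mixing(reward_fn, num_domains, steps,
                        lr_actor=0.1, lr_critic=0.1, seed=0):
    """Hypothetical one-step actor-critic over a stateless bandit of domains.

    Actor: softmax policy over per-domain logits (the mixture weights).
    Critic: a single scalar value estimate used as a reward baseline.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_domains  # actor parameters, start uniform
    value = 0.0                   # critic: running estimate of expected reward
    for _ in range(steps):
        weights = softmax(logits)
        # Sample the domain that the next training batch is drawn from.
        k = rng.choices(range(num_domains), weights=weights)[0]
        r = reward_fn(k, rng)
        advantage = r - value  # critic-corrected learning signal
        # Policy-gradient step: d log pi(k) / d logit_j = 1[j == k] - pi_j
        for j in range(num_domains):
            grad = (1.0 if j == k else 0.0) - weights[j]
            logits[j] += lr_actor * advantage * grad
        # Critic step: move the value estimate toward the observed reward.
        value += lr_critic * advantage
    return softmax(logits)


# Hypothetical reward: a noisy stand-in for the benefit a batch from domain k
# yields; in this toy setup domain 2 is the most useful one.
MEANS = [0.1, 0.5, 0.9]


def toy_reward(k, rng):
    return MEANS[k] + rng.gauss(0.0, 0.1)


weights = actor_critic_mixing(toy_reward, num_domains=3, steps=2000)
print([round(w, 3) for w in weights])
```

Because the critic's baseline is subtracted from each reward, domains only gain weight when they beat the current average, so the mixture shifts toward the most useful domain over training instead of staying fixed as in a static mixing strategy.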