Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Xiaoyue Xu, Sikui Zhang, Xiaorong Wang, Xu Han, Chaojun Xiao
- 4 community upvotes
- Topics: long-context reasoning, reinforcement learning, GRPO, large language models, agent-tuned models, GAIA
Abstract
Original abstract, as extracted from the paper:
A data-centric approach using curated datasets and a minimal GRPO setup significantly improves long-context reasoning in large language models, outperforming prior reinforcement learning methods.