Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Yupu Hao, Zhuoran Jin, Huanxuan Liao, Kang Liu, Jun Zhao
- 15 community upvotes
- Topics: agentic reinforcement learning, tool-use tasks, catastrophic collapse, control tokens, supervised fine-tuning, off-policy supervision
Abstract
Original abstract, extracted from the paper:
Research investigates how different supervisory signals and training strategies improve the stability and performance of large language models in tool-use tasks, addressing issues like catastrophic collapse and format sensitivity through interleaved supervised fine-tuning and reinforcement learning.