Tmax: A simple recipe for terminal agents
Hugging Face · Daily Papers
Hamish Ivison, Junjie Oscar Yin · ▲ 4 upvotes
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Hamish Ivison, Junjie Oscar Yin, Rulin Shao, Teng Xiao, Nathan Lambert, Hannaneh Hajishirzi
- 4 community upvotes
- Topics: terminal agents, language models, reinforcement learning, outcome-only recipe, Terminal-Bench 2.0, SFT training
Abstract
Original abstract, extracted from the paper:
A novel RL training approach for terminal agents achieves superior performance using a simplified recipe and expanded dataset, enabling effective training with fewer parameters than previous methods.