Tmax: A simple recipe for terminal agents

Hugging Face · Daily Papers · Hamish Ivison, Junjie Oscar Yin · ▲ 4 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Hamish Ivison, Junjie Oscar Yin, Rulin Shao, Teng Xiao, Nathan Lambert, Hannaneh Hajishirzi

  • 4 upvotes from the community
  • Topics: terminal agents, language models, reinforcement learning, outcome-only recipe, Terminal-Bench 2.0, SFT training

Abstract

Original abstract, taken from the paper:

A novel RL training approach for terminal agents achieves superior performance using a simplified recipe and expanded dataset, enabling effective training with fewer parameters than previous methods.
