PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Hugging Face · Daily Papers · Jiayu Liu, Qihan Lin · ▲ 63 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Jiayu Liu, Qihan Lin, Cheng Qian, Rui Wang, Emre Can Acikgoz, Xiaocheng Yang

  • 63 community upvotes
  • Topics: LLM agents, tool ecosystems, planning, retrieval-limited tool visibility, interactive benchmark, tool discovery

Abstract

Original abstract, extracted from the paper:

PlanBench-XL evaluates large language model agents' ability to plan and adapt in complex tool-rich environments with limited visibility and dynamic disruptions.
