PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

PlanBench-XL evaluates large language model agents' ability to plan and adapt in complex tool-rich environments with limited visibility and dynamic disruptions.

Hugging Face · Daily Papers · Jiayu Liu, Qihan Lin · January 21, 2026 · ▲ 63 upvotes

This paper is featured in Hugging Face's Daily Papers, a daily selection curated by the AI research community.

Authors: Jiayu Liu, Qihan Lin, Cheng Qian, Rui Wang, Emre Can Acikgoz, Xiaocheng Yang

Topics: LLM agents, tool ecosystems, planning, retrieval-limited tool visibility, interactive benchmark, tool discovery
