Dataset
LLMs & Text
Qwen/AgentWorldBench
Featured dataset on Hugging Face, 87 downloads. AgentWorldBench is a comprehensive evaluation benchmark for language world models, constructed from real-world observations of frontie…
Hugging Face · Datasets · Qwen
The Qwen/AgentWorldBench dataset is among the featured datasets on Hugging Face: data that powers the training and evaluation of today's models.
- 87 downloads
- 18 likes
About the dataset
AgentWorldBench is a comprehensive evaluation benchmark for language world models, constructed from real-world observations of frontier model trajectories on established benchmarks such as Tool Decathlon, Terminal-Bench 1.
Tags: text-generation, world-model, agent, benchmark, evaluation, environment-simulation, qwen