Dataset LLMs & Text

Qwen/AgentWorldBench


Hugging Face · Datasets · Qwen · ↓ 87 · ♥ 18

The Qwen/AgentWorldBench dataset is among the featured datasets on Hugging Face, the kind of data that feeds the training and evaluation of current models.

  • 87 downloads
  • 18 likes

About the dataset

AgentWorldBench is a comprehensive evaluation benchmark for language world models, constructed from real-world observations of frontier model trajectories on established benchmarks such as Tool Decathlon, Terminal-Bench 1.

text-generation world-model agent benchmark evaluation environment-simulation qwen

