Counsel: A Meta-Evaluation Dataset for Agentic Tasks



Hugging Face · Daily Papers · Sashank Pisupati, Henry Broomfield · ▲ 3 upvotes

This paper is featured in Hugging Face's Daily Papers selection, which is curated by the AI research community.

Authors: Sashank Pisupati, Henry Broomfield, Eujeong Choi, Antonia Calvi, Charlie Wang, Roman Engeler

  • 3 community upvotes
  • Topics: agentic systems, LLM-as-a-judge, meta-evaluation, trajectory evaluation, human alignment, inter-annotator agreement

Abstract

Original abstract, as given in the paper:

A large-scale dataset of human-metaevaluations of LLM critiques for agentic tasks is introduced to improve the calibration and reliability of automated evaluation methods.
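As a rough illustration of what measuring agreement in a meta-evaluation setting can look like (a generic sketch, not the method described in the paper), one common choice is Cohen's kappa. It compares an automated judge's verdicts with human labels on the same items and corrects for agreement expected by chance. The labels ("valid"/"invalid" for whether a critique holds up), the function name `cohens_kappa`, and the sample data below are hypothetical and serve only as illustration.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length sequences of categorical labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items on which both raters gave the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: human meta-evaluators vs. an LLM judge rating six critiques.
human = ["valid", "valid", "invalid", "valid", "invalid", "valid"]
judge = ["valid", "invalid", "invalid", "valid", "invalid", "valid"]

print(round(cohens_kappa(human, judge), 3))  # → 0.667
```

Kappa is used here instead of raw percent agreement (5/6 in this toy example) because raw agreement can look high simply when one label dominates. Kappa subtracts out that chance baseline.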

