SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

Hugging Face · Daily Papers · Mohit Raghavendra, Anisha Gunjal · ▲ 2 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Mohit Raghavendra, Anisha Gunjal, Aakash Sabharwal, Yunzhong He

  • 2 community upvotes
  • Topics: coding agents, user simulator, interactive software engineering, multi-turn tasks, user-driven workflows, goal discovery

Abstract

Original abstract, extracted from the paper:

SWE-Interact presents a testbed that evaluates coding agents in realistic multi-turn, user-driven software engineering scenarios, revealing significant gaps between single-turn performance and interactive task completion.
