SWE-INTERACT: Reimagining SWE Benchmarks as User-Driven Long-Horizon Coding Sessions

Hugging Face · Daily Papers · Mohit Raghavendra, Anisha Gunjal · January 29, 2026 · ▲ 2 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Mohit Raghavendra, Anisha Gunjal, Aakash Sabharwal, Yunzhong He

2 community upvotes
Topics: coding agents, user simulator, interactive software engineering, multi-turn tasks, user-driven workflows, goal discovery

Abstract

Original abstract, extracted from the paper:

SWE-Interact presents a testbed that evaluates coding agents in realistic multi-turn, user-driven software engineering scenarios, revealing significant gaps between single-turn performance and interactive task completion.

Where to read

View on Hugging Face
