Skill Coverage: A Test Adequacy Metric for Agent Skills

arXiv:2606.20659v1. Abstract: Agent skills encode reusable procedural knowledge that guides large language model agents across tasks and execution contexts. Existing evaluations primarily assess skills through task-level outcomes, yet task success alone does not reveal which parts of a skill have been exercised and which remain untested. We introduce skill coverage, a test adequacy metric that treats the skill artifact as the object under test. Our approach extracts observable s...

arXiv cs.AI · Boyin Tan, Xiaowei Huang, Youcheng Sun