DEMM-Bench: A Cross-Regime Benchmark for Agent-Runtime Governance-Evidence Sufficiency
arXiv:2606.20634v1 Announce Type: new Abstract: Agent-runtime systems emit traces, ledgers, provenance graphs, policy logs, delegation tokens, cache events, and tool-firewall records, but those containers do not necessarily answer governance questions about a specific decision. DEMM-Bench is a cross-regime benchmark for agent-runtime governance-evidence sufficiency, grounded in the Decision Evidence Maturity Model (DEMM): it measures whether records across eight evidence regimes are sufficient t...
arXiv cs.AI
·Oleg Solozobov
·
// relacionados
Leia também
Blog
How Businesses Are Building Specialized AI They Can Trust
Blog
Fika Jobs raises $4M to build a video-first hiring platform where AI agents interview candidates
Blog
Build real agentic apps using CUGA: two dozen working examples on a lightweight harness
Blog