Blog LLMs & Texto Dados & Embeddings

DEMM-Bench: A Cross-Regime Benchmark for Agent-Runtime Governance-Evidence Sufficiency

arXiv:2606.20634v1 Announce Type: new Abstract: Agent-runtime systems emit traces, ledgers, provenance graphs, policy logs, delegation tokens, cache events, and tool-firewall records, but those containers do not necessarily answer governance questions about a specific decision. DEMM-Bench is a cross-regime benchmark for agent-runtime governance-evidence sufficiency, grounded in the Decision Evidence Maturity Model (DEMM): it measures whether records across eight evidence regimes are sufficient t...

arXiv cs.AI ·Oleg Solozobov · 23 de janeiro de 2026

Ver no Hugging Face

// relacionados

DEMM-Bench: A Cross-Regime Benchmark for Agent-Runtime Governance-Evidence Sufficiency

Leia também

How Businesses Are Building Specialized AI They Can Trust

Fika Jobs raises $4M to build a video-first hiring platform where AI agents interview candidates

Build real agentic apps using CUGA: two dozen working examples on a lightweight harness

Cursor announces its own AI model, a new Git platform, and a mobile app