
What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs

arXiv:2606.28615v1. Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, where free-text explanations such as chain-of-thought and post-hoc rationales are used to justify model outputs. Yet it remains unclear whether these explanations are sufficient, i.e., whether they contain enough information to explain the model's output-generating process. We generalize classical sufficiency from feature attributions to arbitrary explanations and prove that ...

arXiv cs.LG · Nhi Nguyen, Shauli Ravfogel, Rajesh Ranganath · January 30, 2026


