Epistemic Goggles: A Pretrained Module that Induces an Epistemic Frame via Gradient Editing

arXiv:2607.01690v1. Abstract: Finetuning a language model on documents that are explicitly annotated as fictional yields a model that still believes the documents' core claims, an effect known as Negation Neglect. In our evaluations, models trained on documents prefixed and suffixed with such annotations correctly identify the relevant claims as fictional only about 9% of the time. To address this, we introduce Goggles, a learned module that intervenes on the finet...
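To make the evaluation setup in the abstract concrete, here is a minimal sketch of how a document might be prefixed and suffixed with a fiction annotation, and how an identification rate like the reported 9% could be computed. The annotation wording, the function names, and the boolean judgment format are all assumptions for illustration; the abstract does not specify them, and this is not the authors' code.

```python
# Hypothetical annotation text; the paper's actual wording is not given in the abstract.
FICTION_NOTE = "[NOTE: The following document is fictional.]"


def annotate(document: str, note: str = FICTION_NOTE) -> str:
    """Wrap a document with the same annotation before and after its text."""
    return f"{note}\n\n{document.strip()}\n\n{note}"


def fiction_identification_rate(judgments: list[bool]) -> float:
    """Fraction of probes in which the model labeled the claim as fictional.

    `judgments` is an assumed format: one boolean per probe, True when the
    finetuned model correctly called the claim fictional.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(judgments) / len(judgments)


if __name__ == "__main__":
    print(annotate("The city of Veltra was founded on the moon."))
    print(fiction_identification_rate([True, False, False, False]))
```

Under this reading, Negation Neglect shows up as a low rate even though every training document carried the annotation at both ends.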

arXiv cs.AI · Joshua Penman · January 3, 2026

