DriftScope: Measuring The Hidden Effects of Diffusion Model Adaptation

arXiv:2607.00183v1. Abstract: Adapting pre-trained text-to-image diffusion models, whether to learn new visual concepts or erase unwanted ones, is routinely evaluated on its intended effects alone. We argue this framing is incomplete. Through sparse autoencoder analysis and zero-shot classification, we demonstrate that adaptation systematically damages semantically unrelated concepts in ways that aggregate metrics structurally cannot surface: when damage is severe enough for FI...

arXiv cs.CV · Héctor Laria, Yiping Han, Julian D. Santamaria, Kai Wang, Bogdan Raducanu, Joost van de Weijer, Alexandra Gomez-Villa