A Gravitational Interpretation of Fine-Tuning Reversion

arXiv:2606.28525v1. Abstract: Fine-tuning on harmless data can partially undo behaviors acquired earlier in training. Safety can erode under benign post-alignment updates, unlearned capabilities can re-emerge, latent traits can transfer through apparently unrelated supervision, and related post-alignment fragility appears in other generative settings. We argue these phenomena are usefully viewed through a common training-history lens. Our hypothesis is geometric: large early tr...

arXiv cs.LG · Samuele Poppi, Nils Lukas · January 30, 2026
