Blog LLMs & Texto

The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning

arXiv:2606.28843v1 Announce Type: new Abstract: Fine-tuning a large language model is a ubiquitous method for enhancing its capability on a specific downstream task. However, prior work has shown that this increase in capability comes with a cost: it can increase a model's tendency to respond to unsafe adversarial prompts, even when fine-tuning with non-adversarial data. We present the first comprehensive empirical study of this phenomenon in multilingual settings by fine-tuning Llama-3.2, Qwen3...

arXiv cs.CL ·Will Hawkins, Kaivalya Rawal, Jonathan Rystr{\o}m, Stratis Tsirtsis, Zihao Fu, Greta Warren, Ryan Brown, Eoin Delaney, Sandra Wachter, Brent Mittelstadt, Chris Russell · 30 de janeiro de 2026

Ver no Hugging Face

// relacionados

The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning

Leia também

nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16

OpenClaw is finally available on Android and iOS

Claude Science is Anthropic’s newest flagship product

Anthropic Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8: Agentic Coding Benchmarks, API Pricing, and Cost-Performance Tradeoffs Compared