A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

arXiv:2606.25380v1

Abstract: Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for multilingual LLMs. We first catalogue threat models that exploit language choice, translation pivots, code-switching, orthographic variation, multi-turn interaction, and post-deployment fine-tuning to weaken safety alignmen...

arXiv cs.CL · Soham Dan, Himanshu Beniwal, Thomas Hartvigsen