A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models

arXiv:2606.25380v1. Abstract: Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for multilingual LLMs. We first catalogue threat models that exploit language choice, translation pivots, code-switching, orthographic variation, multi-turn interaction, and post-deployment fine-tuning to weaken safety alignmen...

arXiv cs.CL · Soham Dan, Himanshu Beniwal, Thomas Hartvigsen · January 25, 2026


