Scalable Hierarchical Attention Transformers for Multi-Turn Jailbreak Detection in Long Conversations

arXiv:2606.21082v1. Abstract: Multi-turn jailbreaks can evade turn-level moderation by spreading unsafe intent across a dialogue through gradual escalation, reframing, and role manipulation. We address multi-turn jailbreak detection as a conversation-level classification problem and introduce an efficient hierarchical detector that avoids expensive long-context concatenation while retaining cross-turn reasoning. The model encodes individual turns to form compact turn representa...

arXiv cs.CL · Chenhui Hu, Muhammed Salih, Sudipto Guha, Subramanian Srinivasan · January 23, 2026
