Blog LLMs & Texto Dados & Embeddings

Breaking Safety at the Token Boundary: How BPE Tokenization Creates Exploitable Gaps in LLM Alignment

arXiv:2607.01239v1 Announce Type: new Abstract: Character-level perturbations bypass safety alignment in modern LLMs despite leaving prompts human-readable. We identify and test a central structural mechanism: BPE tokenization fragments safety-critical words into sub-word pieces, and the three public alignment datasets we surveyed contain no intentionally fragmented inputs. The mechanism is a chain, tested end-to-end on five model families (Qwen-3-4B, Qwen-2.5-7B, Gemma-3-4B, Llama-3.1-8B, Mistr...

arXiv cs.CL ·Tung-Ling Li, Hongliang Liu, Yuhao Wu · 03 de janeiro de 2026

Ver no Hugging Face

// relacionados

Breaking Safety at the Token Boundary: How BPE Tokenization Creates Exploitable Gaps in LLM Alignment

Leia também

O complicado problema do Claude Code com a China envolve proibições dos dois lados do Pacífico

AI Security Institute do Reino Unido descobre que benchmarks padrão subestimam sistematicamente o que agentes de IA realmente conseguem fazer

ByteDance-Seed/EdgeBench

Google DeepMind e A24 anunciam parceria de pesquisa inédita