Generic Expert Coverage for Pruning Sparse Mixture-of-Experts Language Models
arXiv:2607.01710v1 (announce type: new)
Abstract: Sparsely activated Mixture-of-Experts (MoE) language models contain substantial structured redundancy among routed experts, but pruning them without downstream calibration data remains challenging. Existing expert-pruning methods typically rely on a single aggregated importance score, which can bias the retained set toward experts favored by dominant calibration patterns. We propose Generic TB-Coverage, a coverage-aware expert pruning meth...
arXiv cs.AI
Yongqin Zeng, Sicheng Pan, Jiale Wang, Hai-tao Zheng, Hong-Gee Kim, Chunxia Ma, XiuTeng Zhou