Generic Expert Coverage for Pruning SparseMixture-of-Experts Language Models

arXiv:2607.01710v1 Announce Type: new Abstract: Sparsely activated Mixture-of-Experts (MoE) language models contain substantial structured redundancy among routed experts, but pruning them without downstream calibration data remains challenging. Existing expert-pruning methods typically rely on a single aggregated importance score, which can bias the retained set toward experts favored by dominant calibration patterns. We propose \textbf{Generic TB-Coverage}, a coverage-aware expert pruning meth...

arXiv cs.AI ·Yongqin Zeng, Sicheng Pan, Jiale Wang, Hai-tao Zheng, Hong-Gee Kim, Chunxia Ma, XiuTeng Zhou ·
compartilhar: