Paper
LLMs & Text
ELDR: Expert-Locality-Aware Decode Routing for PD-Disaggregated MoE Serving
Hugging Face · Daily Papers
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Sangjin Choi, Sukmin Cho, Yifan Xiong, Ziyue Yang, Youngjin Kwon, Peng Cheng
- 16 community upvotes
- Topics: prefill-decode, disaggregated LLM serving, mixture-of-experts, decode router, expert-locality-aware, K-means
Abstract
Original abstract, extracted from the paper:
ELDR is an expert-locality-aware decode router for prefill-decode disaggregated Mixture-of-Experts serving that improves performance by predicting expert activations and routing requests accordingly.