JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
Hugging Face · Daily Papers
Lanxiang Hu, Zhaoxiang Feng
This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.
Authors: Lanxiang Hu, Zhaoxiang Feng, Yulun Wu, Haoran Yuan, Yujie Zhao, Yu-Yang Qian
- 29 community upvotes
- Topics: speculative decoding, autoregressive Large Language Models, draft budget, acceptance rate, causality-efficiency dilemma, tree speculative decoding
Abstract
Original abstract, as extracted from the paper:
JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates across various benchmarks.
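The abstract does not describe JetSpec's drafting or verification procedure, so the sketch below only illustrates the generic draft-then-verify idea behind the listed topics (acceptance rate, tree speculative decoding). It is NOT JetSpec's method. It assumes greedy decoding, where a drafted token is accepted only if it equals the target model's own greedy choice, so the output matches plain greedy decoding. The names `verify_greedy`, `verify_tree`, and the toy `target` model are hypothetical. The tree is flattened into a list of root-to-leaf paths verified one by one; real tree decoders share prefixes and score all branches in a single batched forward pass.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy target model: maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def verify_greedy(prefix: Sequence[int], draft: Sequence[int],
                  target: NextToken) -> Tuple[List[int], int]:
    """Accept the longest draft prefix that matches the target's greedy choices.

    Returns (new tokens appended to the output, number of accepted draft tokens).
    A real system computes all target predictions in one forward pass; this
    loop calls the target per position purely for clarity.
    """
    context = list(prefix)
    accepted = 0
    for tok in draft:
        expected = target(context)
        if tok != expected:
            # First mismatch: emit the target's own token and stop verifying.
            context.append(expected)
            return context[len(prefix):], accepted
        context.append(tok)
        accepted += 1
    # Every drafted token matched, so the target contributes one bonus token.
    context.append(target(context))
    return context[len(prefix):], accepted


def verify_tree(prefix: Sequence[int], paths: Sequence[Sequence[int]],
                target: NextToken) -> Tuple[List[int], int]:
    """Flattened tree verification: keep the root-to-leaf path with most accepts."""
    # Baseline with no draft: the target alone still yields one token.
    best_out, best_n = verify_greedy(prefix, [], target)
    for path in paths:
        out, n = verify_greedy(prefix, path, target)
        if n > best_n:  # ties keep the earlier path
            best_out, best_n = out, n
    return best_out, best_n


if __name__ == "__main__":
    # Toy target that always predicts "previous token + 1" (mod 10).
    target = lambda ctx: (ctx[-1] + 1) % 10
    print(verify_greedy([1, 2, 3], [4, 5, 7, 8], target))
    print(verify_tree([1, 2, 3], [[4, 9], [4, 5, 6], [7]], target))
```

With the toy target, the single draft `[4, 5, 7, 8]` has 2 of its 4 tokens accepted before the mismatch, while the tree's best branch `[4, 5, 6]` is fully accepted and earns a bonus token. A larger draft budget (more or deeper branches) raises the chance of long accepted runs but costs more verification compute.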