JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Hugging Face · Daily Papers · Lanxiang Hu, Zhaoxiang Feng · ▲ 29 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Lanxiang Hu, Zhaoxiang Feng, Yulun Wu, Haoran Yuan, Yujie Zhao, Yu-Yang Qian

  • 29 community upvotes
  • Topics: speculative decoding, autoregressive Large Language Models, draft budget, acceptance rate, causality-efficiency dilemma, tree speculative decoding

Abstract

Original abstract, taken from the paper:

JetSpec is a speculative decoding framework that combines efficient forward drafting with causal conditioning to improve LLM inference speed and acceptance rates across various benchmarks.
