Cluster, Route, Escalate: Cascaded Framework for Cost-Aware LLM Serving

Hugging Face · Daily Papers · Yasmin Moslem, Magdalena Kacmajor · ▲ 1 upvote

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Yasmin Moslem, Magdalena Kacmajor, Vasudevan Nedumpozhimana, Ammar Abbas, Solmaz Panahi, David Lynch

  • 1 community upvote
  • Topics: large language models, cost-effective model, quality estimation, cascaded solution, query clustering, model routing

Abstract

Original abstract, extracted from the paper:

A cascaded approach for deploying large language models that balances accuracy and cost by routing queries to appropriate models based on clustering and quality estimation.
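
The abstract names three steps (cluster, route, escalate) without giving details. The sketch below is one possible reading of that idea, not the authors' implementation: a query is assigned to a cluster, the cluster picks a starting model tier, and the answer is escalated to a costlier model whenever an estimated quality score falls below a threshold. Every name here (`Model`, `cascade`, `assign_cluster`, `start_tier`, `estimate_quality`, the costs, and the 0.7 threshold) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Model:
    name: str
    cost: float                       # hypothetical cost per call
    generate: Callable[[str], str]    # stand-in for a real LLM call


def cascade(
    query: str,
    models: List[Model],              # ordered cheapest to most expensive
    assign_cluster: Callable[[str], str],
    start_tier: Dict[str, int],       # cluster id -> index of first model to try
    estimate_quality: Callable[[str, str], float],
    threshold: float = 0.7,           # assumed acceptance threshold
) -> Tuple[str, str, float]:
    """Return (answer, name of the model that produced it, total cost spent)."""
    cluster = assign_cluster(query)                      # cluster
    tier = min(start_tier.get(cluster, 0), len(models) - 1)  # route
    total_cost = 0.0
    for model in models[tier:]:
        answer = model.generate(query)
        total_cost += model.cost
        # Accept if the estimate clears the threshold, or if no larger model is left.
        if estimate_quality(query, answer) >= threshold or model is models[-1]:
            return answer, model.name, total_cost        # otherwise escalate
    raise ValueError("models must be non-empty")


# Toy stand-ins so the sketch runs without any real model or estimator.
small = Model("small", 1.0, lambda q: "short answer")
large = Model("large", 10.0, lambda q: "a longer, more detailed answer")
models = [small, large]


def assign_cluster(q: str) -> str:
    # Hypothetical rule: queries containing digits form a "math" cluster.
    return "math" if any(ch.isdigit() for ch in q) else "chat"


start_tier = {"chat": 0, "math": 1}


def estimate_quality(q: str, a: str) -> float:
    # Hypothetical proxy: longer answers score higher, capped at 1.0.
    return min(1.0, len(a) / 20)


result = cascade("hello", models, assign_cluster, start_tier, estimate_quality)
```

In this toy setup the "chat" query starts at the small model, scores 0.6, and is escalated, so the total cost includes both calls. A "math" query is routed straight to the large model and pays only for that call, which is the cost trade-off the abstract describes.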
