A Verifiable Search Is Not a Learnable Chain-of-Thought

Hugging Face · Daily Papers · Harsh Patel · ▲ 2 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Author: Harsh Patel

  • 2 community upvotes
  • Topics: chain-of-thought, fine-tuning, distillation, LoRA, Nemotron, backtracking search

Abstract

Original abstract, extracted from the paper:

Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a fundamental limitation in learning search procedures through demonstration.
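One way to picture the distinction the abstract draws is with a toy backtracking search. The sketch below is an illustration only and is not taken from the paper: the subset-sum task and the names `search` and `verify` are assumptions chosen for brevity. It contrasts the full search trace (with dead ends and undo steps) against the short forward derivation that survives once a solution is found. The final answer is cheap to verify, yet it carries none of the exploration that produced it.

```python
from collections import Counter
from typing import List, Optional, Tuple


def search(nums: List[int], target: int) -> Tuple[Optional[List[int]], List[str]]:
    """Depth-first subset-sum search that logs every step, including dead ends.

    Hypothetical toy task, not the benchmark used in the paper.
    """
    trace: List[str] = []
    chosen: List[int] = []

    def dfs(i: int, remaining: int) -> bool:
        if remaining == 0:
            return True
        if i == len(nums) or remaining < 0:
            return False
        # Branch 1: take nums[i].
        chosen.append(nums[i])
        trace.append(f"take {nums[i]}")
        if dfs(i + 1, remaining - nums[i]):
            return True
        # Dead end: undo the choice (the backtracking step).
        chosen.pop()
        trace.append(f"undo {nums[i]}")
        # Branch 2: skip nums[i].
        return dfs(i + 1, remaining)

    found = dfs(0, target)
    return (list(chosen) if found else None), trace


def verify(nums: List[int], subset: List[int], target: int) -> bool:
    """Check a proposed answer without redoing any search."""
    uses_only_available = not (Counter(subset) - Counter(nums))
    return uses_only_available and sum(subset) == target


if __name__ == "__main__":
    solution, trace = search([5, 3, 4], 7)
    print("forward derivation:", solution)
    print("full search trace: ", trace)
    print("verified:", verify([5, 3, 4], solution, 7))
```

In this toy case the forward derivation has two steps, while the trace that found it has eight, three of which are undo steps. A demonstration built only from the forward derivation shows that the answer is right, but not how to recover from the wrong first choice.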
