A Verifiable Search Is Not a Learnable Chain-of-Thought
Hugging Face · Daily Papers · Harsh Patel · ▲ 2 upvotes
This paper is featured in the Hugging Face Daily Papers selection, curated by the AI research community.
Author: Harsh Patel
- 2 community upvotes
- Topics: chain-of-thought, fine-tuning, distillation, LoRA, Nemotron, backtracking search
Abstract
Original abstract, extracted from the paper:
Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a fundamental limitation in learning search procedures through demonstration.
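The abstract does not say which tasks the paper studies, so the following is only a generic illustration of the distinction it draws between a backtracking search and the clean forward derivation left over once the search succeeds. It is a minimal Python sketch, assuming a toy subset-sum problem over positive integers; the problem, the names `search` and `verify`, and the trace format are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def verify(numbers: List[int], target: int, chosen: List[int]) -> bool:
    """Cheap check: indices are valid, distinct, and their values sum to target."""
    return (
        len(set(chosen)) == len(chosen)
        and all(0 <= i < len(numbers) for i in chosen)
        and sum(numbers[i] for i in chosen) == target
    )


def search(numbers: List[int], target: int) -> Tuple[Optional[List[int]], List[str]]:
    """Depth-first backtracking search (toy example, assumes positive integers).

    Returns the chosen indices (or None if no subset works) together with the
    full trace of the search, including choices that were later undone.
    """
    trace: List[str] = []
    chosen: List[int] = []

    def dfs(start: int, remaining: int) -> bool:
        if remaining == 0:
            return True
        for i in range(start, len(numbers)):
            if numbers[i] > remaining:
                continue  # prune: this value alone already overshoots
            chosen.append(i)
            trace.append(f"try {numbers[i]}")
            if dfs(i + 1, remaining - numbers[i]):
                return True
            chosen.pop()  # dead end: backtrack
            trace.append(f"undo {numbers[i]}")
        return False

    found = dfs(0, target)
    return (list(chosen) if found else None), trace


numbers, target = [8, 6, 5, 4], 11
solution, trace = search(numbers, target)
print(solution)  # [1, 2], i.e. 6 + 5
print(trace)     # ['try 8', 'undo 8', 'try 6', 'try 5']
print(verify(numbers, target, solution))  # True
```

In this toy run the verified answer (6 + 5 = 11) takes two steps to write down, while the search that found it also contains a dead end (trying 8 and undoing it). A demonstration that shows only the two-step derivation omits the backtracking that produced it.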