A Verifiable Search Is Not a Learnable Chain-of-Thought

Hugging Face · Daily Papers · Harsh Patel · January 20, 2026 · ▲ 2 upvotes

This paper is featured in Hugging Face's Daily Papers selection, which is curated by the AI research community.

Author: Harsh Patel

2 community upvotes
Topics: chain-of-thought, fine-tuning, distillation, LoRA, Nemotron, backtracking search

Abstract

Original abstract, taken from the paper:

Training models on chain-of-thought demonstrations fails for tasks requiring backtracking search because the forward derivation cannot be faithfully imitated, demonstrating a fundamental limitation in learning search procedures through demonstration.
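
To make the distinction in the abstract concrete, here is a minimal sketch of a backtracking search whose result is easy to verify but whose search trace contains dead ends that a clean forward derivation would omit. The subset-sum task, the function names solve and verify, and the trace format are all assumptions chosen for illustration; none of them come from the paper, and this is not the paper's experimental setup.

```python
from collections import Counter
from typing import List, Optional, Tuple


def solve(nums: List[int], target: int) -> Tuple[Optional[List[int]], List[str]]:
    """Depth-first backtracking for subset-sum over positive integers.

    Returns the subset found (or None) and the full search trace,
    including every dead end visited along the way.
    """
    trace: List[str] = []
    chosen: List[int] = []

    def dfs(i: int, remaining: int) -> bool:
        if remaining == 0:
            trace.append(f"success {chosen}")
            return True
        if i == len(nums) or remaining < 0:
            trace.append(f"dead end {chosen}")
            return False
        chosen.append(nums[i])            # try including nums[i]
        if dfs(i + 1, remaining - nums[i]):
            return True
        chosen.pop()                      # backtrack: undo the choice
        return dfs(i + 1, remaining)      # try excluding nums[i]

    found = dfs(0, target)
    return (list(chosen) if found else None), trace


def verify(nums: List[int], target: int, subset: List[int]) -> bool:
    """Check a candidate answer without repeating the search."""
    uses_only_available = not (Counter(subset) - Counter(nums))
    return uses_only_available and sum(subset) == target


solution, trace = solve([5, 3, 4], 7)
print(solution)                            # the verified answer
print(verify([5, 3, 4], 7, solution))      # cheap check of that answer
for step in trace:                         # the search that produced it
    print(step)
```

In this toy run the answer is a two-element subset, yet the trace records three dead ends before the success. A demonstration that shows only the final forward path ("take 3, then take 4") gives no hint of how those dead ends were ruled out, which is one hedged reading of why imitating such demonstrations might not teach the search itself.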
