
DOPD: Dual On-policy Distillation


Hugging Face · Daily Papers · Xinlei Yu, Gen Li · January 29, 2026 · ▲ 75 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Xinlei Yu, Gen Li, Qingyi Si, Guibin Zhang, Yuqi Xu, Congcong Wang

75 community upvotes
Topics: on-policy distillation, token-level signals, privileged information, privilege illusion, advantage-aware dual distillation, dynamic routing

Abstract

Original abstract, as extracted from the paper:

DOPD addresses privilege illusion in on-policy distillation by dynamically routing token-level supervision between teacher and student policies based on advantage gaps and probabilities, improving capability transfer in large and vision-language models.

Where to read

View on Hugging Face
