OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

On-policy skill distillation framework extracts dense hindsight supervision from completed trajectories to improve language agent training efficiency and performance.

Hugging Face · Daily Papers · Shuo Yang, Jinyang Wu · ▲ 40 upvotes

This paper is featured in Hugging Face's Daily Papers selection, curated by the AI research community.

Authors: Shuo Yang, Jinyang Wu, Zhengxi Lu, Yuhao Shen, Fan Zhang, Lang Feng

  • 40 community upvotes
  • Topics: reinforcement learning, self-distillation, skill-conditioned variants, on-policy trajectories, hierarchical skills, critical-first routing

Summary

Original abstract (in English), taken from the paper:

On-policy skill distillation framework extracts dense hindsight supervision from completed trajectories to improve language agent training efficiency and performance.
