
SEAD: Competence-Aware On-Policy Distillation via Entropy-Guided Supervision

arXiv:2606.28562v1. Abstract: On-policy distillation (OPD) has a property absent in offline distillation and RL: teacher supervision quality depends on student competence. Incoherent rollouts yield noisy gradients; already-mastered tokens yield redundant ones. This creates waste at three scales (tokens, training phases, and prompts), yet existing methods supervise uniformly. We introduce SEAD, which uses entropy as a unified probe of this competence-dependent degradation at thre...
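
To make the token-scale idea concrete, here is a minimal sketch of entropy-gated supervision. It is not SEAD's actual procedure, which the truncated abstract does not specify. It assumes that student entropy is measured per token, that very low entropy stands in for "already mastered" and very high entropy for "incoherent", and that a reverse KL term is the distillation loss. The function names and the `low`/`high` thresholds are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def reverse_kl(student, teacher, eps=1e-12):
    # KL(student || teacher) for one token position.
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student, teacher)
        if s > 0.0
    )

def entropy_gated_loss(student_logits, teacher_logits, low=0.1, high=1.0):
    # Hypothetical gate (an assumption, not from the paper): supervise only
    # tokens whose student entropy lies in [low, high]; skip the rest.
    total, kept = 0.0, 0
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        s, t = softmax(s_logits), softmax(t_logits)
        if low <= entropy(s) <= high:
            total += reverse_kl(s, t)
            kept += 1
    mean_loss = total / kept if kept else 0.0
    return mean_loss, kept

# Three token positions over a toy 3-word vocabulary:
# confident student, moderately uncertain student, uniform student.
student = [[10.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
teacher = [[10.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 3.0, 0.0]]
loss, kept = entropy_gated_loss(student, teacher)
```

With these toy inputs only the middle position falls inside the entropy band, so it is the only token that contributes to the loss. A real implementation would operate on batched tensors and would likely use soft weights rather than a hard mask.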

arXiv cs.CL · Chia-Hsuan Lee, Zelei Cheng, Yu Wang, Renkun Ni, Sambit Sahu, Shi-Xiong Zhang, William Campbell · January 30, 2026

