Learning Generalizable Skill Policy with Data-Efficient Unsupervised RL
arXiv:2607.00392v1 (new). Abstract: Unsupervised Reinforcement Learning (URL) aims to pre-train scalable, skill-conditioned policies without extrinsic rewards, serving as a foundation for downstream control tasks. Despite recent progress, we argue that current off-policy URL methods are limited by two critical, overlooked bottlenecks: (1) non-stationary skill semantics and (2) brittle generalization. To address these challenges, we propose GenDa (Generalizable Data-efficient Agent), ...
arXiv cs.LG
Jongchan Park, Seungjun Oh, Seungho Baek, Yusung Kim