Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

arXiv:2606.25178v1 · Announce Type: new

Abstract: Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer unevenly across domains. Existing learnability-based curricula adapt to where the policy is currently improving, but are blind to w...

arXiv cs.AI · Yongjin Yang, Jiarui Liu, Yinghui He, Lezhen Zhang, Bernhard Schölkopf, Zhijing Jin