Transferability for General Reasoning: An Automated Curriculum for Multi-Domain RLVR

arXiv:2606.25178v1. Abstract: Reinforcement learning with verifiable rewards (RLVR) has been extended from single-domain training to multi-domain reasoning suites spanning mathematics, programming, and science. However, the training curriculum (how often each domain is sampled) is typically fixed or hand-tuned, even though reasoning skills transfer unevenly across domains. Existing learnability-based curricula adapt to where the policy is currently improving, but are blind to w...

arXiv cs.AI · Yongjin Yang, Jiarui Liu, Yinghui He, Lezhen Zhang, Bernhard Schölkopf, Zhijing Jin · January 25, 2026
