EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games
arXiv:2606.23995v1 (new submission)
Abstract: Recent work has established that regularized policy gradient methods such as PPO, when used in self-play, can match or exceed specialized game-theoretic algorithms for solving two-player zero-sum imperfect-information games. The uniform distribution has emerged as a strong policy regularization target for this purpose, but it regularizes equally toward all actions regardless of their viability. We introduce EMAgnet, which instead regularizes toward...
arXiv cs.LG
Tristan Maidment, JB Lanier, Chase McDonald, Nathan Tsang, Eugene Vinitsky, Roy Fox, Albert Wang, Wesley N. Kerr
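The abstract is cut off before it says what EMAgnet regularizes toward, so the sketch below does not reproduce the paper's method. It only illustrates two ingredients named in the title and abstract: a KL penalty toward the uniform distribution, and a generic exponential moving average (EMA) over parameters. The three-action logits, the decay value, and the choice to measure KL against a policy rebuilt from EMA parameters are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def ema_update(ema_params, params, decay):
    """Generic parameter-space EMA: ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

# Hypothetical 3-action policy; the logits stand in for network parameters.
logits = [2.0, 0.0, -2.0]
policy = softmax(logits)

# Uniform target: every action is pulled toward the same probability,
# whether or not the action is viable.
uniform = [1.0 / len(policy)] * len(policy)
uniform_penalty = kl_divergence(policy, uniform)

# Assumed alternative target: a policy rebuilt from an EMA of past parameters.
# The previous EMA values here are made up for the example.
ema_logits = ema_update([1.5, 0.2, -1.7], logits, decay=0.99)
ema_policy = softmax(ema_logits)
ema_penalty = kl_divergence(policy, ema_policy)

print(f"KL(policy || uniform)    = {uniform_penalty:.4f}")
print(f"KL(policy || EMA policy) = {ema_penalty:.4f}")
```

In this toy setup the penalty toward the EMA-derived policy is smaller than the penalty toward uniform, because the EMA target already puts little mass on the low-probability action. Whether EMAgnet uses a KL term of this form is not stated in the truncated abstract.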