Blog Robótica & RL

EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games

arXiv:2606.23995v1. Abstract: Recent work has established that regularized policy gradient methods such as PPO, when used in self-play, can match or exceed specialized game-theoretic algorithms for solving two-player zero-sum imperfect-information games. The uniform distribution has emerged as a strong policy regularization target for this purpose, but it regularizes equally toward all actions regardless of their viability. We introduce EMAgnet, which instead regularizes toward...

arXiv cs.LG · Tristan Maidment, JB Lanier, Chase McDonald, Nathan Tsang, Eugene Vinitsky, Roy Fox, Albert Wang, Wesley N. Kerr · January 24, 2026
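
The abstract above is cut off before it states what EMAgnet regularizes toward, so the paper's actual objective is not known from this page. Going only by the title ("Parameter-Space EMA Regularization"), the contrast it draws can be illustrated with a rough sketch: a KL penalty toward the uniform distribution versus a KL penalty toward the policy induced by an exponential moving average (EMA) of the parameters. Everything below is an assumption made for illustration: the single-state tabular policy, the function names, the direction of the KL term, and the values of `decay` and `coef`. It is not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def ema_update(ema_params, params, decay):
    """One EMA step taken in parameter space (hypothetical helper)."""
    return decay * ema_params + (1.0 - decay) * params


def regularized_loss(policy_loss, params, target_probs, coef):
    """Policy loss plus a KL penalty pulling the policy toward target_probs."""
    return policy_loss + coef * kl_divergence(softmax(params), target_probs)


# Toy single-state policy: the parameters are the action logits themselves.
params = np.array([2.0, 0.5, -3.0])

# Track an EMA of the parameters while they are held fixed for 50 steps.
ema_params = np.zeros_like(params)
for _ in range(50):
    ema_params = ema_update(ema_params, params, decay=0.9)

# Target 1: uniform over all actions, viable or not.
uniform_target = np.full(params.shape, 1.0 / params.size)
# Target 2: the policy induced by the EMA parameters (assumed reading of the title).
ema_target = softmax(ema_params)

# policy_loss is set to 0.0 so only the regularization term is compared.
loss_uniform = regularized_loss(0.0, params, uniform_target, coef=0.1)
loss_ema = regularized_loss(0.0, params, ema_target, coef=0.1)
```

In this toy setup the uniform target penalizes the policy for ignoring the clearly bad third action, while the EMA target penalizes it only for drifting away from its own recent history. Whether the paper's method behaves this way cannot be confirmed from the truncated abstract.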


