RL without TD learning

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (which has scalability challenges), and it scales well to long-horizon tasks.

We can do Reinforcement Learning (RL) based on divide and conquer, instead of temporal difference (TD) learning.

Problem setting: off-policy RL

Our problem setting is off-policy RL. Let’s briefly ...
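To build intuition for why divide and conquer can help on long horizons, here is a minimal toy sketch. It is not the algorithm from this post. It assumes a hypothetical deterministic chain of states with a known, tabular cost model (unlike the off-policy setting) and computes shortest-path distances, a stand-in for goal-conditioned values, in two ways. The TD-style update bootstraps one transition at a time, so each sweep extends the covered horizon by one step. The divide-and-conquer update joins two estimated sub-paths through an intermediate state `w`, so each sweep doubles it. All names (`chain_graph`, `td_sweep`, `dc_sweep`, `sweeps_until_solved`) are illustrative.

```python
import math

INF = math.inf


def chain_graph(n):
    """Hypothetical toy task: states 0..n-1 on a line; moving s -> s+1 costs 1."""
    cost = [[INF] * n for _ in range(n)]
    for s in range(n):
        cost[s][s] = 0  # staying put is free, so estimates never get worse
        if s + 1 < n:
            cost[s][s + 1] = 1
    return cost


def td_sweep(one_step, d):
    """TD-style backup: d(s, g) <- min_k [c(s, k) + d(k, g)].
    Each sweep extends the covered horizon by one transition."""
    n = len(d)
    return [[min(one_step[s][k] + d[k][g] for k in range(n)) for g in range(n)]
            for s in range(n)]


def dc_sweep(one_step, d):
    """Divide-and-conquer backup: d(s, g) <- min_w [d(s, w) + d(w, g)].
    Joining two sub-paths through a midpoint w doubles the covered horizon.
    `one_step` is unused; it is kept so both sweeps share one signature."""
    n = len(d)
    return [[min(d[s][w] + d[w][g] for w in range(n)) for g in range(n)]
            for s in range(n)]


def sweeps_until_solved(sweep, one_step, start, goal, max_sweeps=10_000):
    """Apply `sweep` until the start -> goal distance is finite.
    Returns (number of sweeps, estimated distance)."""
    d = [row[:] for row in one_step]
    sweeps = 0
    while d[start][goal] == INF:
        if sweeps >= max_sweeps:
            raise RuntimeError("goal not reached")
        d = sweep(one_step, d)
        sweeps += 1
    return sweeps, d[start][goal]


if __name__ == "__main__":
    n = 33
    one_step = chain_graph(n)
    print(sweeps_until_solved(td_sweep, one_step, 0, n - 1))  # (31, 32)
    print(sweeps_until_solved(dc_sweep, one_step, 0, n - 1))  # (5, 32)
```

On this 33-state chain, the TD-style loop needs 31 sweeps to connect state 0 to state 32, while the divide-and-conquer loop needs only 5 (since 2^5 = 32); both recover the same distance of 32. In this toy, the number of sequential backups grows linearly with the horizon for TD but only logarithmically for divide and conquer.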

BAIR Blog