Asynchronous Methods for Deep Reinforcement Learning

2019-01-10  初七123

Introduction

Deep RL algorithms based on experience replay have achieved unprecedented success in challenging domains such as Atari 2600. However, experience replay has several drawbacks: it uses more memory and computation per real interaction, and it requires off-policy learning algorithms that can update from data generated by an older policy.

Related Work

In Gorila, each process contains an actor that acts in its own copy of the environment, a separate replay memory, and a learner that samples data from the replay memory and computes gradients of the DQN loss (Mnih et al., 2015) with respect to the policy parameters. The gradients are asynchronously sent to a central parameter server, which updates a central copy of the model.
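The Gorila architecture is only described at a high level here, so the following is a minimal single-machine sketch of the actor / replay memory / learner / parameter-server split using Python threads. The ParameterServer class, the actor_learner function, and the toy quadratic loss standing in for the DQN loss are all illustrative assumptions, not Gorila's actual distributed implementation.

```python
import threading
import numpy as np

class ParameterServer:
    """Central copy of the model parameters; applies asynchronously pushed
    gradients. A minimal stand-in for Gorila's distributed parameter server."""
    def __init__(self, dim, lr=0.05):
        self.theta = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def push_gradient(self, grad):
        with self.lock:
            self.theta -= self.lr * grad

    def pull(self):
        with self.lock:
            return self.theta.copy()

def actor_learner(server, target, n_updates, seed):
    """One worker: generates 'transitions' in its own copy of a toy
    environment, keeps its own replay memory, and pushes gradients of a
    surrogate loss to the parameter server (placeholder for the DQN loss)."""
    rng = np.random.default_rng(seed)
    replay = []                                   # local replay memory
    for _ in range(n_updates):
        replay.append(target + rng.normal(0, 0.1, size=target.shape))
        batch = replay[rng.integers(len(replay))]  # sample from replay
        theta = server.pull()                      # fetch current parameters
        grad = 2.0 * (theta - batch)               # gradient of ||theta - batch||^2
        server.push_gradient(grad)                 # asynchronous update

target = np.array([1.0, -2.0, 0.5])                # values the workers should recover
server = ParameterServer(dim=3)
threads = [threading.Thread(target=actor_learner, args=(server, target, 2000, i))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(np.round(server.pull(), 2))                  # should be close to [1.0, -2.0, 0.5]
```

The point of the sketch is only the data flow: each worker acts and samples from its own replay memory, while all gradient updates land on one shared copy of the parameters.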

(Tsitsiklis, 1994) studied convergence properties of Q-learning in the asynchronous optimization setting. These results show that Q-learning is still guaranteed to converge when some of the information is outdated, as long as outdated information is always eventually discarded and several other technical assumptions are satisfied.

Asynchronous RL Framework

We now present multi-threaded asynchronous variants of one-step Sarsa, one-step Q-learning, n-step Q-learning, and advantage actor-critic.
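As a rough illustration of the asynchronous one-step Q-learning variant, here is a minimal tabular sketch in which several Python threads act in their own environment copies and update a shared Q-table, standing in for the shared network parameters θ. The ChainEnv environment, the lock (the paper instead applies lock-free, Hogwild!-style updates and accumulates gradients before applying them), and all hyperparameters are assumptions for illustration, not the paper's algorithm verbatim.

```python
import threading
import numpy as np

class ChainEnv:
    """Tiny deterministic chain MDP (hypothetical): states 0..N-1,
    actions {0: left, 1: right}, reward 1 on reaching the right end."""
    def __init__(self, n_states=6):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done

# Globally shared "parameters" (a tabular Q-function in place of theta)
N_STATES, N_ACTIONS = 6, 2
Q = np.zeros((N_STATES, N_ACTIONS))
T, T_MAX = 0, 20000            # shared global step counter and budget
lock = threading.Lock()        # protects Q and T in this simplified sketch

def worker(seed, gamma=0.99, alpha=0.1, epsilon=0.1):
    """One actor-learner thread: interacts with its own copy of the
    environment and applies one-step Q-learning updates to the shared Q."""
    global T
    rng = np.random.default_rng(seed)
    env = ChainEnv(N_STATES)
    s = env.reset()
    while True:
        with lock:
            if T >= T_MAX:
                return
            T += 1
            # epsilon-greedy action from the shared parameters
            a = rng.integers(N_ACTIONS) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, r, done = env.step(a)
        with lock:
            target = r if done else r + gamma * np.max(Q[s2])
            Q[s, a] += alpha * (target - Q[s, a])   # one-step Q-learning update
        s = env.reset() if done else s2

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(np.round(Q, 2))   # greedy policy should prefer action 1 (right) in every state
```

Each thread running its own environment copy plays the role that experience replay plays in DQN: the parallel actors decorrelate the updates, which is what lets the on-policy and one-step methods above work without a replay memory.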
