Recursive Least-Squares Actor-Critic Algorithm for Continuous Spaces

2016-08-05 hzyido

2 RLSAC Algorithm

Policy Gradient Methods for Reinforcement Learning with Function Approximation (SMSM-NIPS99.pdf)

This paper is the foundation for the several papers I read earlier.
**2 Policy Gradient with Approximation**

Theorem 2 (Policy Gradient with Function Approximation).
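For reference, here is a restatement of Theorem 2 as I read it in the paper. The symbols are as follows: ρ is the performance measure, d^π is the state distribution under π, π(s,a) is the policy with parameters θ, and f_w is the learned approximation of Q^π with parameters w. If f_w satisfies the first two conditions, the exact Q^π in the policy gradient can be replaced by f_w:

```latex
% f_w has converged to a local optimum of the pi-weighted squared error to Q^pi
\sum_s d^{\pi}(s) \sum_a \pi(s,a)\,\bigl[Q^{\pi}(s,a) - f_w(s,a)\bigr]\,\frac{\partial f_w(s,a)}{\partial w} = 0
% f_w is compatible with the policy parameterization
\frac{\partial f_w(s,a)}{\partial w} = \frac{\partial \pi(s,a)}{\partial \theta}\,\frac{1}{\pi(s,a)}
% conclusion: the policy gradient can be computed with f_w in place of Q^pi
\frac{\partial \rho}{\partial \theta} = \sum_s d^{\pi}(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, f_w(s,a)
```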

3 Application to Deriving Algorithms and Advantages
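The advantage function
The survey described this poorly; the paper explains it much more clearly. "The choice of v does not affect any of our theorems, but can substantially affect the variance of the gradient estimators." In other words, this is the question of the baseline.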
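Below is a small numerical sketch of both points. It is not taken from the paper and rests on several hypothetical choices: a single state, a Gibbs (softmax) policy π(a) ∝ exp(θᵀφ_a), and made-up features φ and action values Q.

For a Gibbs policy, the compatible approximator is f_w(a) = wᵀ[φ_a − Σ_b π(b)φ_b]. This f_w has zero mean under π, so it approximates the advantage Q^π − V^π rather than Q^π itself.

The code also checks two facts about adding a baseline v:

- The exact gradient does not change, because Σ_a ∂π(a)/∂θ = 0.
- The variance of the one-sample estimator ∇log π(a)·(Q(a) − v) does change.

The function names are illustrative only.

```python
import numpy as np

def softmax_policy(theta, phi):
    """Gibbs policy: pi(a) = exp(theta . phi_a) / sum_b exp(theta . phi_b)."""
    logits = phi @ theta
    z = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return z / z.sum()

def compatible_features(theta, phi):
    """psi_a = d log pi(a) / d theta = phi_a - sum_b pi(b) phi_b."""
    pi = softmax_policy(theta, phi)
    return phi - pi @ phi

# Hypothetical single-state problem: 3 actions, 2-dim features, known Q values.
phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta = np.array([0.2, -0.1])
Q = np.array([1.0, 2.0, 0.5])

pi = softmax_policy(theta, phi)
psi = compatible_features(theta, phi)

# Fit w by pi-weighted least squares, i.e. solve sum_a pi(a) (Q(a) - psi_a . w) psi_a = 0.
W = np.diag(pi)
w = np.linalg.solve(psi.T @ W @ psi, psi.T @ W @ Q)
f_w = psi @ w

# f_w has zero mean under pi, so it approximates the advantage Q - V rather than Q.
V = pi @ Q
print("E_pi[f_w] =", pi @ f_w)

# Exact gradient: sum_a dpi(a)/dtheta * Q(a), where dpi(a)/dtheta = pi(a) * psi_a.
grad_true = (pi[:, None] * psi).T @ Q
grad_fw = (pi[:, None] * psi).T @ f_w
print("gradient using Q  :", grad_true)
print("gradient using f_w:", grad_fw)

def estimator_stats(baseline):
    """Mean and total variance of the one-sample estimator psi_a * (Q(a) - baseline), a ~ pi."""
    g = psi * (Q - baseline)[:, None]          # one row per possible sampled action
    mean = pi @ g
    var = pi @ ((g - mean) ** 2).sum(axis=1)   # trace of the covariance matrix
    return mean, var

for b in (0.0, V):
    mean, var = estimator_stats(b)
    print(f"baseline={b:.3f} mean={mean} total variance={var:.4f}")
```

In this example, both baselines give the same mean, equal to the exact gradient. Using V as the baseline gives a much smaller variance than using 0. Reducing variance is the only role the baseline plays.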

4 Convergence of Policy Iteration with Function Approximation
