Recursive Least-Squares Actor-Critic Algorithm for Continuous Spaces
2016-08-05
hzyido
2 The RLSAC Algorithm
Policy Gradient Methods for Reinforcement Learning with Function Approximation (SMSM-NIPS99.pdf)
This paper is the foundation for the several papers I read before it.
**2 Policy Gradient with Approximation**
Theorem 2 (Policy Gradient with Function Approximation).
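For reference, here is my transcription of the paper's two main results in its own notation: ρ is the performance measure, d^π(s) the state distribution under π, Q^π the action-value function, and f_w the learned approximation to Q^π with parameters w. It is worth double-checking against the PDF.

$$
\frac{\partial \rho}{\partial \theta} = \sum_s d^{\pi}(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, Q^{\pi}(s,a) \qquad \text{(Theorem 1)}
$$

Theorem 2 applies when f_w has converged to a local optimum of the π-weighted squared error,

$$
\sum_s d^{\pi}(s) \sum_a \pi(s,a)\,\bigl[Q^{\pi}(s,a) - f_w(s,a)\bigr]\,\frac{\partial f_w(s,a)}{\partial w} = 0,
$$

and is compatible with the policy parameterization,

$$
\frac{\partial f_w(s,a)}{\partial w} = \frac{\partial \pi(s,a)}{\partial \theta}\,\frac{1}{\pi(s,a)}.
$$

Under these conditions Q^π can be replaced by f_w in the gradient:

$$
\frac{\partial \rho}{\partial \theta} = \sum_s d^{\pi}(s) \sum_a \frac{\partial \pi(s,a)}{\partial \theta}\, f_w(s,a).
$$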
3 Application to Deriving Algorithms and Advantages
the advantage function
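In Section 3 the paper works this out for a Gibbs (softmax) policy π(s,a) ∝ exp(θᵀφ_sa). The compatibility condition forces f_w(s,a) = wᵀ(φ_sa − Σ_b π(s,b)φ_sb), and this has zero mean over actions under π. That is why f_w is more naturally read as an approximation of the advantage A^π(s,a) = Q^π(s,a) − V^π(s) than of Q^π itself. Below is a quick numerical check of both facts for a single state, assuming NumPy. The feature matrix, θ and w are random placeholders, and the function names are my own.

```python
import numpy as np

def gibbs_policy(theta, phi):
    """Gibbs/softmax policy for one state: pi(a) proportional to exp(theta . phi_a).
    phi has shape (n_actions, n_features); row a is the feature vector phi_sa."""
    prefs = phi @ theta
    e = np.exp(prefs - prefs.max())        # shift by the max for numerical stability
    return e / e.sum()

def compatible_features(theta, phi):
    """psi_sa = phi_sa - sum_b pi(s,b) phi_sb, i.e. d log pi(s,a) / d theta."""
    pi = gibbs_policy(theta, phi)
    return phi - pi @ phi                  # subtract the pi-weighted mean feature from every row

rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 3))              # placeholder: 4 actions, 3 features
theta = rng.normal(size=3)
w = rng.normal(size=3)

pi = gibbs_policy(theta, phi)
psi = compatible_features(theta, phi)
f_w = psi @ w                              # compatible approximator f_w(s,a) = w . psi_sa

# Central finite differences of log pi with respect to each component of theta.
eps = 1e-6
fd = np.zeros_like(psi)
for i in range(theta.size):
    step = np.zeros_like(theta)
    step[i] = eps
    log_plus = np.log(gibbs_policy(theta + step, phi))
    log_minus = np.log(gibbs_policy(theta - step, phi))
    fd[:, i] = (log_plus - log_minus) / (2 * eps)

print(np.allclose(psi, fd, atol=1e-6))     # True: psi is the score function
print(bool(abs(pi @ f_w) < 1e-12))         # True: f_w averages to zero under pi
```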
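The survey explained this poorly; the explanation here is much easier to follow: "The choice of v does not affect any of our theorems, but can substantially affect the variance of the gradient estimators." In other words, this is the baseline question.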
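The point behind the quote is that adding any state-dependent function v(s) (equivalently, subtracting a baseline) leaves the expected gradient unchanged. This holds because Σ_a ∂π(s,a)/∂θ = 0, since the action probabilities always sum to 1. The variance of a single-sample estimator can still change a lot. Below is a small exact calculation for one state with a tabular softmax policy, assuming NumPy. The Q values and θ are made up. Using V^π(s) as the baseline is just one common choice and is not necessarily the variance-minimizing one.

```python
import numpy as np

def softmax(prefs):
    e = np.exp(prefs - prefs.max())          # shift by the max for numerical stability
    return e / e.sum()

def gradient_moments(theta, q, baseline):
    """Exact mean and total variance of g(a) = d log pi(a)/d theta * (Q(a) - baseline),
    a ~ pi, for one state with a tabular softmax policy (score of action a is e_a - pi)."""
    pi = softmax(theta)
    scores = np.eye(len(theta)) - pi           # row a: e_a - pi
    g = scores * (q - baseline)[:, None]       # row a: the sample gradient if a is drawn
    mean = pi @ g                              # E_pi[g]
    second = pi @ np.sum(g ** 2, axis=1)       # E_pi[||g||^2]
    return mean, second - mean @ mean          # trace of the covariance of g

theta = np.array([0.5, -0.2, 0.1])             # placeholder policy parameters
q = np.array([10.0, 9.0, 11.0])                # placeholder Q^pi(s, a) values
v = softmax(theta) @ q                         # V^pi(s) = sum_a pi(a) Q(a)

for b in (0.0, v):
    mean, var = gradient_moments(theta, q, b)
    print(f"baseline={b:7.3f}  mean={np.round(mean, 6)}  variance={var:.4f}")
```

Both lines print the same mean. The variance drops from roughly 65 to roughly 0.3 once the baseline is V^π(s), because the baseline removes the large common offset (around 10) shared by all the Q values.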
4 Convergence of Policy Iteration with Function Approximation
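Theorem 3 in this section alternates two steps. First, fit a compatible f_w until the local-optimality condition holds. Second, take a gradient step θ_{k+1} = θ_k + α_k Σ_s d^{π_k}(s) Σ_a ∂π_k(s,a)/∂θ f_{w_k}(s,a). With step sizes satisfying lim α_k = 0 and Σ_k α_k = ∞, plus bounded rewards and bounded second derivatives of π, the iteration drives ∂ρ/∂θ to zero.

The sketch below runs that loop on a hypothetical one-state problem with a tabular Gibbs policy. It is a 3-armed bandit, so d^π is trivial and Q^π(a) is just the arm's reward. Solving for w with `np.linalg.lstsq` and using α_k = 1/√(k+1) are my choices, not the paper's.

```python
import numpy as np

def softmax(prefs):
    e = np.exp(prefs - prefs.max())   # shift by the max for numerical stability
    return e / e.sum()

def policy_iteration_fa(rewards, n_iters=1000):
    """Theorem 3 style loop on a one-state problem with compatible features.

    Returns the final theta, the exact gradient norm before each update, and the
    largest gap seen between the actor update and the exact gradient."""
    n = len(rewards)
    theta = np.zeros(n)
    grad_norms, max_gap = [], 0.0
    for k in range(n_iters):
        pi = softmax(theta)
        psi = np.eye(n) - pi                       # compatible features: row a is e_a - pi
        # Critic: pick w with sum_a pi(a) [Q(a) - f_w(a)] psi_a = 0 (weighted least squares).
        fisher = psi.T @ (pi[:, None] * psi)       # sum_a pi(a) psi_a psi_a^T, singular here
        target = psi.T @ (pi * rewards)            # sum_a pi(a) Q(a) psi_a
        w = np.linalg.lstsq(fisher, target, rcond=None)[0]
        f_w = psi @ w
        # Actor: step along sum_a dpi(a)/dtheta * f_w(a).
        dpi = pi[:, None] * psi                    # row a: gradient of pi(a) w.r.t. theta
        update = dpi.T @ f_w
        exact = dpi.T @ rewards                    # Theorem 1 with Q^pi(a) = rewards[a]
        grad_norms.append(np.linalg.norm(exact))
        max_gap = max(max_gap, np.abs(update - exact).max())
        alpha = 1.0 / np.sqrt(k + 1)               # alpha_k -> 0 and sum_k alpha_k diverges
        theta = theta + alpha * update
    return theta, grad_norms, max_gap

theta, grad_norms, max_gap = policy_iteration_fa(np.array([1.0, 2.0, 3.0]))
print(np.argmax(theta), grad_norms[0], grad_norms[-1], max_gap)
```

On this toy problem the actor update matches the exact gradient at every step, with `max_gap` staying at round-off level; this is Theorem 2 at work. The gradient norm then shrinks as the policy concentrates on the best arm.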