A Recursive Least-Squares Actor-Critic Algorithm for Continuous Spaces

2016-08-05  hzyido

2 The RLSAC Algorithm


Policy Gradient Methods for Reinforcement Learning with Function SMSM-NIPS99.pdf

This paper is the foundation for the several papers I read earlier.

**2 Policy Gradient with Approximation**


Theorem 2 (Policy Gradient with Function Approximation).
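
For reference, here is the statement of the theorem as I recall it from the paper, written in the paper's notation. The equation labels (3) and (4) are assumed to match the paper's numbering. Here ρ is the performance of the policy π with parameters θ, d^π is the state distribution under π, and f_w is the approximation to Q^π with parameters w.

```latex
% Condition (3): f_w is a local optimum of the mean-squared error to Q^\pi
\sum_{s} d^{\pi}(s) \sum_{a} \pi(s,a)\,
  \bigl[ Q^{\pi}(s,a) - f_w(s,a) \bigr]\,
  \frac{\partial f_w(s,a)}{\partial w} = 0

% Condition (4): f_w is compatible with the policy parameterization
\frac{\partial f_w(s,a)}{\partial w}
  = \frac{\partial \pi(s,a)}{\partial \theta}\,\frac{1}{\pi(s,a)}

% Conclusion: the approximation can replace Q^\pi in the policy gradient
\frac{\partial \rho}{\partial \theta}
  = \sum_{s} d^{\pi}(s) \sum_{a}
    \frac{\partial \pi(s,a)}{\partial \theta}\, f_w(s,a)
```

In words: if f_w satisfies (3) and (4), using f_w in place of Q^π gives the exact policy gradient.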





3 Application to Deriving Algorithms and Advantages
7p
the advantage function
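The survey described this unclearly; the explanation here reads more smoothly. "The choice of v does not affect any of our theorems, but can substantially affect the variance of the gradient estimators." This is the baseline question: v acts as a baseline subtracted from the action value.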
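
The quoted claim can be checked numerically with a small sketch. This is not from the paper: it assumes a single-state problem with a softmax policy over three actions, and the values in `theta` and `q` are made up for illustration. It computes the exact mean and total variance of the score-function estimator `grad log pi(a) * (q[a] - baseline)`, once with no baseline and once with the state value as the baseline.

```python
import numpy as np

def softmax(theta):
    """Numerically stable softmax over action preferences."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_estimator_stats(theta, q, baseline):
    """Exact mean and total variance of the estimator
    g(a) = grad_theta log pi(a) * (q[a] - baseline), with a ~ pi,
    for a one-state softmax policy (an illustrative assumption)."""
    pi = softmax(theta)
    # Row a holds grad_theta log pi(a) = e_a - pi for a softmax policy.
    score = np.eye(len(theta)) - pi
    # One estimator value (a vector) per possible action.
    samples = score * (q - baseline)[:, None]
    mean = pi @ samples
    # Trace of the covariance matrix: expected squared distance to the mean.
    var = pi @ ((samples - mean) ** 2).sum(axis=1)
    return mean, var

# Hypothetical numbers, chosen only for the demonstration.
theta = np.array([0.5, -0.2, 0.1])
q = np.array([10.0, 11.0, 12.0])
pi = softmax(theta)
v = pi @ q  # state value under the current policy, used as the baseline

mean_no_baseline, var_no_baseline = grad_estimator_stats(theta, q, 0.0)
mean_with_baseline, var_with_baseline = grad_estimator_stats(theta, q, v)

print("mean without baseline:", mean_no_baseline)
print("mean with baseline:   ", mean_with_baseline)
print("variance without baseline:", var_no_baseline)
print("variance with baseline:   ", var_with_baseline)
```

The two means agree because the expected score is zero under the policy, so any term that depends only on the state drops out of the expectation. The variance differs because the estimator's magnitude scales with `q[a] - baseline`. In this example the state value is a much better baseline than zero, though it is not claimed to be the variance-minimizing choice.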

4 Convergence of Policy Iteration with Function Approximation
