Markov Decision Processes II

2020-01-12  Ysgc

Value iteration can waste computation: every sweep maximizes over every action in every state, even though the greedy policy often stops changing long before the values themselves converge.

Policy evaluation is a fixed-policy version of value iteration: the same iterative backup, but in each state the action is fixed by the policy instead of maximized over.
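That relationship can be sketched directly in code. The 2-state, 2-action MDP (`P`, `R`, `gamma`) below is a made-up toy example for illustration, not from the lecture:

```python
import numpy as np

# Hypothetical toy MDP (an assumption for illustration, not from the notes):
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def policy_evaluation(policy, tol=1e-8):
    """Fixed-policy value iteration: the same backup as value iteration,
    but each state uses the policy's single action (no max over actions)."""
    V = np.zeros(P.shape[0])
    while True:
        V_new = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                          for s in range(P.shape[0])])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

The only difference from a value-iteration sweep is that the `max` over actions is replaced by indexing with `policy[s]`.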

Full MDP problem, solved in one shot
-> value iteration: iterate the Bellman optimality equation (consider every action for each state)
-> policy iteration: policy evaluation + policy improvement (evaluation backs up only the policy's one action per state)

We aren't given the MDP
(i.e., the transition matrix and rewards are *not* given; this is the reinforcement-learning setting, as opposed to planning with a known model)
