
Reinforcement Learning: Week 11 Course Notes

2015-11-02  268 reads  我的名字叫清阳

This week

Generalizing Generalization

Things that make RL hard

Temporal Abstraction


Temporal Abstraction: Options

Options

Temporal Abstraction: Option Function


Pac-Man Problems

Quiz 1: Pac-Man Problems

Quiz 1 solution: Pac-Man Problems

How It Comes Together

Goal Abstraction


Monte Carlo Tree Search

Monte Carlo Tree Search

In the figure above, circles are states and edges are transitions. π, derived from the estimates Q̂(s, a), is the policy for the known part of the tree: in those states we know which action to take by following π (pink edges). When we reach an unknown state, we apply the rollout policy π_r and simulate actions deep into the tree. We then back up the results and update π_r and π to decide what to select at each state, including the unknown state where the simulation started. π expands as we work out the policy at the unknown state. We then repeat the "select, expand, simulate, back up" process.
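The select, expand, simulate, back up loop can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not from the lecture: the toy problem (four steps, actions 0 and 1, payoff equal to the fraction of 1s chosen), the UCB1 rule used as the tree policy, the exploration constant c = 1.4, and a uniformly random rollout policy that is left fixed rather than updated. All function and class names are hypothetical.

```python
import math
import random

ACTIONS = (0, 1)   # hypothetical toy actions
HORIZON = 4        # episode length of the toy problem (assumption)

def step(state, action):
    """Toy deterministic model: a state is the tuple of actions taken so far."""
    return state + (action,)

def is_terminal(state):
    return len(state) == HORIZON

def reward(state):
    """Terminal payoff: fraction of steps on which action 1 was chosen."""
    return sum(state) / HORIZON

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.total = 0.0     # sum of returns backed up through this node

    def q(self):
        """Q-hat estimate: mean return observed through this node."""
        return self.total / self.visits if self.visits else 0.0

def select(node, c=1.4):
    """Follow the tree policy (UCB1 over Q-hat) while nodes are fully expanded."""
    while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
        node = max(
            node.children.values(),
            key=lambda ch: ch.q() + c * math.sqrt(math.log(node.visits) / ch.visits),
        )
    return node

def expand(node):
    """Add one untried child; this is where the known part of the tree grows."""
    if is_terminal(node.state):
        return node
    action = next(a for a in ACTIONS if a not in node.children)
    child = Node(step(node.state, action), parent=node)
    node.children[action] = child
    return child

def simulate(state, rng):
    """Rollout policy pi_r: uniformly random actions until the episode ends."""
    while not is_terminal(state):
        state = step(state, rng.choice(ACTIONS))
    return reward(state)

def back_up(node, ret):
    """Propagate the rollout return from the new node up to the root."""
    while node is not None:
        node.visits += 1
        node.total += ret
        node = node.parent

def mcts(root_state, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        leaf = expand(select(root))
        back_up(leaf, simulate(leaf.state, rng))
    # Act greedily with respect to the Q-hat estimates at the root.
    return max(root.children, key=lambda a: root.children[a].q())

best_action = mcts(())
```

In this toy problem every extra 1 raises the payoff, so the search should come to prefer action 1 at the root. Each iteration adds at most one node, which mirrors how π gets expanded one unknown state at a time.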


MCTS Properties


Recap

2015-10-28 first draft
2015-11-01 finished
2015-12-04 reviewed and revised