Lecture 3: Planning by Dynamic Programming

2020-04-04  魏鹏飞

Author: David Silver

Outline

  1. Introduction
  2. Policy Evaluation
  3. Policy Iteration
  4. Value Iteration
  5. Extensions to Dynamic Programming
  6. Contraction Mapping

What is Dynamic Programming?

Dynamic: sequential or temporal component to the problem
Programming: optimising a “program”, i.e. a policy

Requirements for Dynamic Programming

Dynamic Programming is a very general solution method for problems which have two properties:

  1. Optimal substructure: the principle of optimality applies, so the optimal solution can be decomposed into solutions to subproblems
  2. Overlapping subproblems: subproblems recur many times, so their solutions can be cached and reused

Planning by Dynamic Programming

Markov decision processes satisfy both properties: the Bellman equation gives a recursive decomposition, and the value function stores and reuses the solutions to subproblems.

Other Applications of Dynamic Programming

Dynamic programming is used to solve many other problems, e.g.

  - Scheduling algorithms
  - String algorithms (e.g. sequence alignment)
  - Graph algorithms (e.g. shortest path algorithms)
  - Graphical models (e.g. Viterbi algorithm)
  - Bioinformatics (e.g. lattice models)

Iterative Policy Evaluation

Iterative Policy Evaluation (2)
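
For reference, the backup applied to every state at each iteration $k$ is the Bellman expectation equation used as an update rule:

$$v_{k+1}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \left( \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_k(s') \right)$$

or, in matrix form, $\mathbf{v}^{k+1} = \mathcal{R}^{\pi} + \gamma \mathcal{P}^{\pi} \mathbf{v}^{k}$.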

Evaluating a Random Policy in the Small Gridworld

Iterative Policy Evaluation in Small Gridworld

Iterative Policy Evaluation in Small Gridworld (2)
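
As a concrete illustration, here is a minimal Python sketch of iterative policy evaluation on the lecture's small gridworld (4×4 grid, terminal corner states, reward −1 per step, uniform random policy, undiscounted); the helper names and layout are mine, not from the slides:

```python
import numpy as np

N = 4
TERMINALS = {0, N * N - 1}                     # the shaded corner states
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(s, a):
    """Deterministic transition; moves that leave the grid keep s unchanged."""
    r, c = divmod(s, N)
    nr, nc = r + a[0], c + a[1]
    return nr * N + nc if 0 <= nr < N and 0 <= nc < N else s

def evaluate_random_policy(gamma=1.0, theta=1e-6):
    v = np.zeros(N * N)
    while True:
        v_new = np.zeros_like(v)
        for s in range(N * N):
            if s in TERMINALS:
                continue
            # Bellman expectation backup under the uniform random policy
            v_new[s] = sum(0.25 * (-1.0 + gamma * v[step(s, a)]) for a in ACTIONS)
        if np.max(np.abs(v_new - v)) < theta:
            return v_new
        v = v_new

print(evaluate_random_policy().reshape(N, N).round(1))
```

This reproduces the converged values shown on the slide, e.g. −14 in the states next to the terminal corners and −22 in the opposite corners.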

How to Improve a Policy

Policy Iteration
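
Below is a minimal sketch of the full policy iteration loop for a generic finite MDP. The array convention (`P[a, s, s2]` for transitions, `R[a, s]` for expected rewards) is an assumption of this sketch, and the evaluation step solves the linear Bellman system directly rather than iterating, which requires γ < 1:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P[a, s, s2]: transition probabilities; R[a, s]: expected rewards."""
    n_actions, n_states, _ = P.shape
    pi = np.zeros(n_states, dtype=int)            # arbitrary initial policy
    while True:
        # Policy evaluation: solve v = R_pi + gamma * P_pi v exactly
        P_pi = P[pi, np.arange(n_states)]
        R_pi = R[pi, np.arange(n_states)]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to q(s, a)
        q = R + gamma * P @ v                     # shape (n_actions, n_states)
        pi_new = q.argmax(axis=0)
        if np.array_equal(pi_new, pi):            # policy stable => optimal
            return pi, v
        pi = pi_new
```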

Jack’s Car Rental

Policy Iteration in Jack’s Car Rental

Policy Improvement
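
The improvement step in the lecture is to act greedily with respect to $v_\pi$:

$$\pi'(s) = \operatorname*{argmax}_{a \in \mathcal{A}} q_\pi(s, a)$$

This improves the value from any state over one step, since $q_\pi(s, \pi'(s)) = \max_{a \in \mathcal{A}} q_\pi(s, a) \ge q_\pi(s, \pi(s)) = v_\pi(s)$.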

Policy Improvement (2)

Modified Policy Iteration

Generalised Policy Iteration

Principle of Optimality

Deterministic Value Iteration

Example: Shortest Path

Value Iteration
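
Value iteration applies the Bellman optimality backup iteratively, starting from an arbitrary $v_1$:

$$v_{k+1}(s) = \max_{a \in \mathcal{A}} \left( \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_k(s') \right)$$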

Value Iteration (2)
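
A matching sketch of synchronous value iteration, using the same (assumed) `P`, `R` array convention as the policy iteration sketch above; note there is no explicit policy during the iterations:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        # Bellman optimality backup applied to all states in one sweep
        v_new = (R + gamma * P @ v).max(axis=0)
        if np.max(np.abs(v_new - v)) < theta:
            break
        v = v_new
    pi = (R + gamma * P @ v).argmax(axis=0)       # greedy policy at the end
    return pi, v
```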

Example of Value Iteration in Practice

Synchronous Dynamic Programming Algorithms

Asynchronous Dynamic Programming

Three simple ideas for asynchronous dynamic programming:

  1. In-place dynamic programming
  2. Prioritised sweeping
  3. Real-time dynamic programming

In-Place Dynamic Programming
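
A sketch of the in-place variant: a single value array is overwritten during the sweep, so later backups in the same sweep already use the freshest estimates (the synchronous version above keeps old and new values in separate arrays). Same assumed `P`, `R` convention:

```python
import numpy as np

def value_iteration_in_place(P, R, gamma=0.9, theta=1e-8):
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            old = v[s]
            v[s] = (R[:, s] + gamma * P[:, s] @ v).max()   # reads fresh values
            delta = max(delta, abs(v[s] - old))
        if delta < theta:
            return v
```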

Prioritised Sweeping
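
A sketch of prioritised sweeping: back up the state with the largest remaining Bellman error first, then re-prioritise its predecessors. The heap bookkeeping here (stale entries are recomputed and skipped on pop) is a simplification of mine, not the lecture's:

```python
import heapq
import numpy as np

def prioritised_sweeping(P, R, gamma=0.9, theta=1e-8):
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)

    def bellman_error(s):
        return abs((R[:, s] + gamma * P[:, s] @ v).max() - v[s])

    heap = [(-bellman_error(s), s) for s in range(n_states)]  # max-heap
    heapq.heapify(heap)
    while heap:
        _, s = heapq.heappop(heap)
        err = bellman_error(s)            # recompute: heap entry may be stale
        if err < theta:
            continue
        v[s] = (R[:, s] + gamma * P[:, s] @ v).max()
        # States that can transition into s may now have a larger error
        for pred in np.nonzero(P[:, :, s].any(axis=0))[0]:
            heapq.heappush(heap, (-bellman_error(pred), pred))
    return v
```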

Real-Time Dynamic Programming

Full-Width Backups

Sample Backups

Approximate Dynamic Programming

Some Technical Questions

Value Function Space

Value Function ∞-Norm
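
The lecture measures the distance between value functions $u$ and $v$ by the $\infty$-norm, i.e. the largest difference between state values:

$$\|u - v\|_\infty = \max_{s \in \mathcal{S}} |u(s) - v(s)|$$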

Bellman Expectation Backup is a Contraction
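
Writing the backup as the operator $T^\pi(v) = \mathcal{R}^\pi + \gamma \mathcal{P}^\pi v$, it is a $\gamma$-contraction in this norm:

$$\|T^\pi(u) - T^\pi(v)\|_\infty = \gamma \|\mathcal{P}^\pi (u - v)\|_\infty \le \gamma \|u - v\|_\infty$$

where the inequality holds because each row of $\mathcal{P}^\pi$ is a probability distribution.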

Contraction Mapping Theorem

Convergence of Iter. Policy Evaluation and Policy Iteration

Bellman Optimality Backup is a Contraction
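
Similarly, the Bellman optimality backup $T^*(v) = \max_{a \in \mathcal{A}} \left( \mathcal{R}^a + \gamma \mathcal{P}^a v \right)$ satisfies

$$\|T^*(u) - T^*(v)\|_\infty \le \gamma \|u - v\|_\infty$$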

Convergence of Value Iteration

Reference: David Silver, *UCL Course on RL*
