Lecture 4: Model-Free Prediction

2020-04-13 · 魏鹏飞

Author: David Silver

Outline

  1. Introduction
  2. Monte-Carlo Learning
  3. Temporal-Difference Learning
  4. TD(λ)

Model-Free Reinforcement Learning

Monte-Carlo Reinforcement Learning

Monte-Carlo Policy Evaluation

First-Visit Monte-Carlo Policy Evaluation
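
The slide names the algorithm only; below is a minimal sketch of first-visit Monte-Carlo evaluation. The episode format (`(state, reward)` pairs, with the reward received on leaving the state) and the function name are my assumptions, not from the lecture:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte-Carlo policy evaluation.

    episodes: iterable of episodes sampled under the policy pi, each a
              list of (state, reward) pairs (format assumed here).
    Returns V(s), the mean return over the first visit to s per episode.
    """
    returns_sum = defaultdict(float)   # S(s): total return
    returns_count = defaultdict(int)   # N(s): first-visit counter
    for episode in episodes:
        G = 0.0
        first_returns = {}
        # Walk backwards so G accumulates the return G_t = R_{t+1} + gamma*G_{t+1}.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            # Overwriting keeps the return of the earliest (first) visit.
            first_returns[state] = G
        for state, G_first in first_returns.items():
            returns_count[state] += 1
            returns_sum[state] += G_first
    # By the law of large numbers, V(s) -> v_pi(s) as N(s) -> infinity.
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```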

Every-Visit Monte-Carlo Policy Evaluation

Blackjack Example

Blackjack Value Function after Monte-Carlo Learning

Incremental Mean
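
The mean of a sequence $x_1, x_2, \dots$ can be computed incrementally, which is what makes the online updates below possible:

$$\mu_k = \frac{1}{k}\sum_{j=1}^{k} x_j = \frac{1}{k}\Big(x_k + (k-1)\,\mu_{k-1}\Big) = \mu_{k-1} + \frac{1}{k}\,\big(x_k - \mu_{k-1}\big)$$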

Incremental Monte-Carlo Updates
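
After each episode, for each state $S_t$ with return $G_t$:

$$N(S_t) \leftarrow N(S_t) + 1, \qquad V(S_t) \leftarrow V(S_t) + \frac{1}{N(S_t)}\big(G_t - V(S_t)\big)$$

In non-stationary problems a constant step size is used instead, so old episodes are gradually forgotten:

$$V(S_t) \leftarrow V(S_t) + \alpha\big(G_t - V(S_t)\big)$$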

Temporal-Difference Learning

MC and TD
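
The two updates differ only in their target. MC updates toward the actual return $G_t$, which is known only at the end of the episode; TD(0) updates toward the estimated return one step ahead:

$$\text{MC:}\quad V(S_t) \leftarrow V(S_t) + \alpha\big(G_t - V(S_t)\big)$$

$$\text{TD(0):}\quad V(S_t) \leftarrow V(S_t) + \alpha\big(R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\big)$$

where $R_{t+1} + \gamma V(S_{t+1})$ is the TD target and $\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$ is the TD error.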

Driving Home Example

Driving Home Example: MC vs. TD

Advantages and Disadvantages of MC vs. TD

Bias/Variance Trade-Off
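
In short: the return $G_t$ is an unbiased estimate of $v_\pi(S_t)$ but depends on many random actions, transitions and rewards, so it has high variance; the TD target $R_{t+1} + \gamma V(S_{t+1})$ is biased (it bootstraps from the current estimate) but depends on only one random step, so its variance is much lower.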

Advantages and Disadvantages of MC vs. TD (2)

Random Walk Example

Random Walk: MC vs. TD

Batch MC and TD

AB Example

Certainty Equivalence
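
With a finite batch of experience replayed to convergence, MC converges to the solution with minimum mean-squared error on the observed returns, while TD(0) converges to the value function of the maximum-likelihood Markov model fitted to the data, i.e. the certainty-equivalence solution. This is why TD exploits the Markov property and MC does not.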

Advantages and Disadvantages of MC vs. TD (3)

Monte-Carlo Backup

Temporal-Difference Backup

Dynamic Programming Backup

Bootstrapping and Sampling
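
Bootstrapping means updating toward an estimate; sampling means backing up one sampled transition rather than the full expectation. DP bootstraps but does not sample, MC samples but does not bootstrap, and TD does both.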

Unified View of Reinforcement Learning

n-Step Prediction

n-Step Return
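
For $n = 1, 2, \dots$ the n-step return spans the spectrum from TD(0) ($n = 1$) to MC ($n \to \infty$):

$$G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n}\, V(S_{t+n})$$

with the n-step TD update

$$V(S_t) \leftarrow V(S_t) + \alpha\big(G_t^{(n)} - V(S_t)\big)$$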

Large Random Walk Example

Averaging n-Step Returns

λ-return
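
The λ-return combines all n-step returns with geometrically decaying weights $(1-\lambda)\lambda^{n-1}$:

$$G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1}\, G_t^{(n)}$$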

TD(λ) Weighting Function

Forward-view TD(λ)
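
$$V(S_t) \leftarrow V(S_t) + \alpha\big(G_t^{\lambda} - V(S_t)\big)$$

Like MC, the forward view can only be computed from complete episodes, since $G_t^{\lambda}$ looks forward to the end of the episode.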

Forward-View TD(λ) on Large Random Walk

Backward View TD(λ)
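
The idea: instead of waiting for $G_t^{\lambda}$, update every state online, after each step, in proportion to the one-step TD error $\delta_t$ and an eligibility trace $E_t(s)$.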

Eligibility Traces
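
Eligibility traces combine the frequency heuristic (credit the most frequent states) with the recency heuristic (credit the most recent states):

$$E_0(s) = 0, \qquad E_t(s) = \gamma\lambda\, E_{t-1}(s) + \mathbf{1}(S_t = s)$$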

Backward View TD(λ)
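
The backward-view update, applied to every state $s$ at every step:

$$\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t), \qquad V(s) \leftarrow V(s) + \alpha\,\delta_t\, E_t(s)$$

A minimal sketch in the same style as the MC example above; the `(state, reward, next_state)` transition format, with `next_state = None` at termination, is my assumption:

```python
from collections import defaultdict

def backward_td_lambda(episodes, alpha=0.1, gamma=1.0, lam=0.9):
    """Backward-view TD(lambda) with accumulating eligibility traces."""
    V = defaultdict(float)
    for episode in episodes:
        E = defaultdict(float)  # eligibility traces, reset at episode start
        for state, reward, next_state in episode:
            # One-step TD error: delta_t = R_{t+1} + gamma*V(S_{t+1}) - V(S_t).
            v_next = 0.0 if next_state is None else V[next_state]
            delta = reward + gamma * v_next - V[state]
            E[state] += 1.0  # E_t(s) = gamma*lam*E_{t-1}(s) + 1(S_t = s)
            # Broadcast delta to every state in proportion to its trace,
            # then decay all traces by gamma*lam for the next step.
            for s in list(E):
                V[s] += alpha * delta * E[s]
                E[s] *= gamma * lam
    return dict(V)
```

Setting `lam=0` leaves only the current state with a non-zero trace, recovering exactly the TD(0) update, which is the point of the "TD(λ) and TD(0)" slide below.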

TD(λ) and TD(0)

TD(λ) and MC

MC and TD(1)

Telescoping in TD(1)

TD(λ) and TD(1)

Telescoping in TD(λ)

Forwards and Backwards TD(λ)

Offline Equivalence of Forward and Backward TD

Offline updates

Online Equivalence of Forward and Backward TD

Online updates

Summary of Forward and Backward TD(λ)

Reference: UCL Course on RL, David Silver
