03-06 Q-Learning
https://classroom.udacity.com/courses/ud501/lessons/5247432317/concepts/53299733920923
![](https://img.haomeiwen.com/i5379724/2356a2da0aabeb7b.png)
Q-Learning is model-free: it does not use the transition model T or the reward model R; instead it learns a Q function.
The Q function can be represented as a table (a Q-table).
Q[s, a] itself is not a greedy quantity; it is the total expected reward (immediate reward plus discounted future rewards) of taking action a in state s.
Once training is finished, both the policy π and the Q table converge to the optimal solution.
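To make the table idea concrete, here is a minimal sketch of a Q-table and the greedy policy read off from it (the state and action counts are illustrative, not taken from the lesson):

```python
import numpy as np

NUM_STATES = 100   # e.g. discretized market states (illustrative)
NUM_ACTIONS = 3    # e.g. BUY, SELL, DO NOTHING

# Q[s, a]: expected total reward (immediate + discounted future)
# of taking action a in state s.
Q = np.zeros((NUM_STATES, NUM_ACTIONS))

def policy(s):
    """Greedy policy derived from the Q-table: pick the best-valued action."""
    return int(np.argmax(Q[s, :]))
```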
![](https://img.haomeiwen.com/i5379724/f5c74166a74860aa.png)
![](https://img.haomeiwen.com/i5379724/f986ffef471cfc25.png)
![](https://img.haomeiwen.com/i5379724/3b6cbaeaab1a399c.png)
Is Q'[s, a] just a single number — the immediate reward plus the discounted future value? Yes; see the first slide.
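Written out, the update blends the old estimate with that improved estimate: Q'[s, a] = (1 - α) · Q[s, a] + α · (r + γ · max_a' Q[s', a']). A minimal sketch, with alpha (learning rate) and gamma (discount factor) as assumed hyperparameter names:

```python
import numpy as np

def q_update(Q, s, a, r, s_prime, alpha=0.2, gamma=0.9):
    """One Q-learning step: move Q[s, a] toward the improved estimate
    r + gamma * max over a' of Q[s', a']."""
    improved = r + gamma * np.max(Q[s_prime, :])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * improved
    return Q
```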
![](https://img.haomeiwen.com/i5379724/984992b05c3326e9.png)
![](https://img.haomeiwen.com/i5379724/5b9ef6f53240f09a.png)
![](https://img.haomeiwen.com/i5379724/2ed656459abb28e1.png)
![](https://img.haomeiwen.com/i5379724/74cf3670950fe495.png)
Which choice of reward converges faster?
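In general, the more immediate the reward signal, the faster tabular Q-learning converges, since credit does not have to propagate back through long chains of state-action pairs. A hedged sketch of a per-step (daily-return) reward for the trading setting, assuming `prices` is a pandas Series of adjusted close prices and `holding` is +1 long, -1 short, 0 flat (the names and scheme are illustrative, not necessarily the lesson's exact setup):

```python
import pandas as pd

def daily_return_reward(prices: pd.Series, t: int, holding: int) -> float:
    """Per-step reward: the day's return scaled by the current position."""
    ret = prices.iloc[t] / prices.iloc[t - 1] - 1.0
    return holding * ret
```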
![](https://img.haomeiwen.com/i5379724/b51e890944290882.png)
![](https://img.haomeiwen.com/i5379724/9f9fdb12deb20220.png)
Finding good state features
SMA (simple moving average) alone is not a good state variable, and neither is adjusted close on its own; combined, however, they are (see the sketch below).
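For example, dividing adjusted close by its SMA gives a normalized indicator that is comparable across time and across stocks; a minimal sketch, assuming `prices` is a pandas Series of adjusted close prices and a 20-day window (the window size is an assumption):

```python
import pandas as pd

def price_sma_ratio(prices: pd.Series, window: int = 20) -> pd.Series:
    """Normalized indicator: adjusted close divided by its simple moving average.
    Values above 1 mean the price is above its recent average."""
    sma = prices.rolling(window).mean()
    return prices / sma
```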
![](https://img.haomeiwen.com/i5379724/7d42e9f769e3128d.png)
The state needs to be discretized.
![](https://img.haomeiwen.com/i5379724/a90d03c5a540cae5.png)
Choose the discretization thresholds by position in the sorted data (a sketch follows below).
![](https://img.haomeiwen.com/i5379724/ebd5b017e366c6a3.png)
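One common way to do this is to sort the indicator values and cut at equally spaced positions, so every bin ends up with roughly the same number of observations; a minimal sketch with 10 bins (the bin count is an assumption):

```python
import numpy as np

def make_thresholds(data, steps=10):
    """Sort the indicator values and take thresholds at equally spaced
    positions, so each bin holds roughly the same number of samples."""
    data = np.sort(np.asarray(data))
    step_size = len(data) // steps
    return [data[(i + 1) * step_size - 1] for i in range(steps - 1)]

def discretize(value, thresholds):
    """Map a real-valued indicator onto an integer bin (0 .. steps - 1)."""
    return int(np.searchsorted(thresholds, value))
```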
actions: Buy, Sell, Do nothing
![](https://img.haomeiwen.com/i5379724/f13bb078e33ddd59.png)
![](https://img.haomeiwen.com/i5379724/5de2c24130745ee6.png)
Resources
- CS7641 Machine Learning, taught by Charles Isbell and Michael Littman
    - Watch for free on Udacity (mini-course 3, lessons RL 1-4)
    - Watch for free on YouTube
    - Or take the course as part of the OMSCS program!
- RL course by David Silver (videos, slides)
- A Painless Q-Learning Tutorial