Machine Learning Notes: Week 16

2016-05-01  我的名字叫清阳

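I haven't updated these Machine Learning notes since Week 10. One reason is that I simply had no time to study. The other is that everything in the remaining material was already covered in another course I took last year, Reinforcement Learning. If you need those notes, go to the following posts:

- Reinforcement Learning Week 1 notes: MDP
- Reinforcement Learning Week 12 notes: Game Theory I
- Reinforcement Learning Week 13 notes: Game Theory II & III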

The material below was not covered in the RL course and is unique to this course.

Reinforcement Learning

RL API

Reinforcement learning history

More RL "APIs"

What do you call these?

Three ways of solving RL problems

Q-learning

With Q, we can find U and π without knowing the transition model T or the reward function R:

- U(s) = max_a Q(s, a)
- π(s) = argmax_a Q(s, a)

This is why Q-learning works without a model.
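The sketch below makes this concrete. It reads U and π straight off a Q table, and no T or R appears anywhere. The states, actions and values in the table are hypothetical, made up for illustration:

```python
# Hypothetical Q table for illustration: Q[(state, action)] -> value.
Q = {
    ("s0", "left"): 1.0, ("s0", "right"): 3.0,
    ("s1", "left"): 2.0, ("s1", "right"): 0.5,
}


def utility_and_policy(q):
    """Recover U(s) = max_a Q(s, a) and pi(s) = argmax_a Q(s, a) from Q alone."""
    best = {}  # state -> (best action so far, its Q value)
    for (s, a), value in q.items():
        if s not in best or value > best[s][1]:
            best[s] = (a, value)
    utility = {s: v for s, (a, v) in best.items()}
    policy = {s: a for s, (a, v) in best.items()}
    return utility, policy


U, pi = utility_and_policy(Q)
# U == {"s0": 3.0, "s1": 2.0}, pi == {"s0": "right", "s1": "left"}
```

Note that the function never consults a transition or reward model. Everything it needs is already folded into Q.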

What Q-learning can do

Estimating Q From Transitions
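For reference, here is the standard Q-learning update applied to each observed transition ⟨s, a, r, s'⟩, with discount γ and learning rate α_t. The shorthand V ←^α X, meaning V ← (1 − α)V + αX, is an assumption of this sketch and is not copied from the slides:

```latex
\begin{aligned}
\hat{Q}(s,a) &\xleftarrow{\alpha_t} r + \gamma \max_{a'} \hat{Q}(s',a') \\
\text{i.e.}\quad \hat{Q}(s,a) &\leftarrow (1-\alpha_t)\,\hat{Q}(s,a) + \alpha_t\Big(r + \gamma \max_{a'} \hat{Q}(s',a')\Big)
\end{aligned}
```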

What does V converge to?

V converges to the expected value of X, E[X], when the learning rates α_t satisfy two conditions: Σ_t α_t = ∞ and Σ_t α_t² < ∞. For example, α_t = 1/t satisfies both.
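A quick numeric check follows. It assumes the update behind this slide is V ← (1 − α_t)V + α_t X_t, applied to i.i.d. samples X_t. With α_t = 1/t, the estimate is exactly the running sample mean, so it approaches E[X]. The uniform(0, 1) distribution, with E[X] = 0.5, is an arbitrary choice for illustration:

```python
import random


def running_estimate(samples, alpha_schedule):
    """Apply V <- (1 - alpha_t) * V + alpha_t * X_t over a stream of samples."""
    v = 0.0
    for t, x in enumerate(samples, start=1):
        alpha = alpha_schedule(t)
        v = (1 - alpha) * v + alpha * x
    return v


rng = random.Random(0)
samples = [rng.uniform(0.0, 1.0) for _ in range(100_000)]  # E[X] = 0.5

# alpha_t = 1/t: sum(alpha) diverges, sum(alpha^2) converges -> V is the sample mean.
v_decay = running_estimate(samples, lambda t: 1.0 / t)

# Constant alpha violates sum(alpha^2) < inf: V stays an exponentially
# weighted average of the most recent samples and keeps fluctuating.
v_const = running_estimate(samples, lambda t: 0.5)
```

With a constant α, V never settles on E[X]. Instead it tracks whichever samples arrived last.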

Q-learning proof

The first step is a bit hand-wavy, because Q̂ itself changes over time while it is being averaged. But it works in practice.
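Below is a sketch of the argument, under two assumptions. Rewards depend on the state, R(s), as in the MDP notes, and T(s, a, s') is the transition model. The first arrow is the step described above as hand-wavy: it applies the "what V converges to" result to a target that itself contains the moving Q̂:

```latex
\begin{aligned}
\hat{Q}(s,a) &\xleftarrow{\alpha_t} r + \gamma \max_{a'} \hat{Q}(s',a') \\
\hat{Q}(s,a) &\to \mathbb{E}\Big[r + \gamma \max_{a'} \hat{Q}(s',a')\Big] \\
&= R(s) + \gamma\, \mathbb{E}_{s'}\Big[\max_{a'} \hat{Q}(s',a')\Big] \\
&= R(s) + \gamma \sum_{s'} T(s,a,s') \max_{a'} \hat{Q}(s',a')
\end{aligned}
```

Once Q̂ stops changing, it must satisfy the last line, which is exactly the Bellman equation for Q. For γ < 1 that equation has Q as its unique solution, so Q̂ → Q.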

Q-learning steps

Q-learning is only guaranteed to converge (Q̂ → Q) under two conditions:

- every (s, a) pair is visited infinitely often;
- the learning rates α_t satisfy Σ_t α_t = ∞ and Σ_t α_t² < ∞.
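A minimal tabular sketch of these steps is below. It runs on a hypothetical two-state, two-action deterministic MDP invented for illustration. The behaviour policy is uniformly random, which keeps every (s, a) pair visited. A per-pair learning rate α = 1/n(s, a) satisfies both conditions above. The table STEP only simulates the environment; the update itself uses nothing but the sampled ⟨s, a, r, s'⟩:

```python
import random

# Hypothetical toy MDP (not from the lecture): STEP[(s, a)] = (next_state, reward).
STEP = {
    (0, 0): (0, 0.0),  # state 0, "stay": nothing happens
    (0, 1): (1, 0.0),  # state 0, "move": go to state 1
    (1, 0): (1, 1.0),  # state 1, "stay": collect reward 1
    (1, 1): (0, 0.0),  # state 1, "move": go back to state 0
}
STATES = (0, 1)
ACTIONS = (0, 1)


def q_learning(num_steps=50_000, gamma=0.5, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {(s, a): 0 for s in STATES for a in ACTIONS}
    s = 0
    for _ in range(num_steps):
        # Uniformly random behaviour keeps visiting every (s, a) pair.
        a = rng.choice(ACTIONS)
        s_next, r = STEP[(s, a)]  # environment produces <s, a, r, s'>
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]  # sum(alpha) = inf, sum(alpha^2) < inf
        target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
        s = s_next
    return q


q = q_learning()
greedy_policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

With γ = 0.5, the true values for this toy MDP are Q(0,0) = 0.5, Q(0,1) = 1, Q(1,0) = 2 and Q(1,1) = 0.5. The learned Q̂ should land close to these. The greedy policy read off Q̂ should therefore be "move" in state 0 and "stay" in state 1.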

Greedy exploration

This is the exploration vs. exploitation trade-off.
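One common way to balance exploration and exploitation is ε-greedy action selection. It is named here as a standard technique; the exact scheme on the slide isn't reproduced in these notes. The Q values below are hypothetical:

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action (explore); otherwise pick
    the action with the highest current estimate q[(state, a)] (exploit)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical estimates for one state with two actions.
q = {("s", "a"): 1.0, ("s", "b"): 0.0}
rng = random.Random(0)
choices = [epsilon_greedy(q, "s", ["a", "b"], 0.1, rng) for _ in range(10_000)]
greedy_share = choices.count("a") / len(choices)  # about 1 - 0.1/2 = 0.95
```

Random exploration can also land on the greedy action. That is why the greedy action is picked about 1 − ε/2 of the time with two actions, not 1 − ε.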

Wrap up

The exam is today, and I haven't reviewed nearly enough. Wish me luck.

2016-04-30 first draft