[Notes]Lecture14 Stochastic Mult

2018-11-10  半山来客

Article link: http://www-bcf.usc.edu/~haipengl/courses/CSCI699/lecture14.pdf

The content overlaps heavily with several pieces I have already read, so only partial excerpts are recorded here:

Stochastic Multi-armed Bandit

Pseudo-regret

Pseudo-regret is the expected regret against the fixed action a^* (instead of the empirically best action), where the expectation is over the randomness of both the environment and the algorithm.
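As a rough illustration of this definition, here is a minimal Python sketch. It assumes a reward (rather than loss) formulation with known arm means, which this excerpt does not specify; the function name `gap_sum` and the parameters `means` and `pulls` are hypothetical. For a single run it sums, over the rounds, the gap between the mean of the fixed best arm a^* and the mean of the arm actually pulled. Averaging this quantity over many independent runs of a randomized algorithm would give an estimate of the pseudo-regret.

```python
from typing import Sequence


def gap_sum(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Sum of (best mean - mean of pulled arm) over one run.

    Assumptions (not from the lecture excerpt): rewards are maximized,
    `means[a]` is the true expected reward of arm `a`, and `pulls` is
    the sequence of arm indices chosen by the algorithm in one run.
    """
    best_mean = max(means)  # mean reward of the fixed best arm a^*
    return sum(best_mean - means[a] for a in pulls)


if __name__ == "__main__":
    means = [0.2, 0.5, 0.9]   # hypothetical arm means
    pulls = [0, 1, 2, 2]      # hypothetical arms chosen in rounds 1..4
    print(gap_sum(means, pulls))
```

Note that the comparison is against the arm with the best mean, not against whichever arm happened to collect the most reward in this particular run; that is the distinction the definition draws with "empirically best action".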

Symbols

