[Notes]Lecture14 Stochastic Mult

2018-11-10 本文已影响0人半山来客

文章链接：http://www-bcf.usc.edu/~haipengl/courses/CSCI699/lecture14.pdf

这篇讲义主要介绍了Stochastic MAB 的一些基本概念，有很多数学公式及证明，如果要从数学角度理解细节和推敲，可以参考。
第一部分将Stochastic MAB的基本概念，讲解了pesudo-regret。
第二部分 2 First Attempt: Explore-then-exploit 一种基本思想，引出了bound。
第三部分 3 The UCB Algorithm, 讲义中实际中讨论的Lower Bound，思想与UCB对称。

内容与读过的几篇高度重叠，只作部分摘录：

Stochastic Multi-armed Bandit

Pseudo-regret

Pseudo-regret is the expected regret against the ﬁxed action $a^*$ (instead of the empirically best actiontion, where the expectation is over the randomness of both the environment and the algorithm.

Pseudo-regret can be simplified as:

Simpified pseudo-regret
pseudo-regret of UCB is bounded as:

pseudo-regret bound

Symbols

$a$ :each action
$D_a$ :Distribution
$l_1(a),\dots,l_T(a)$ : Independent samples of $D_a$
$a^*=argmin_a \mu(a)$ : action $argmin_a \hat{\mu}(a)$ : Optimal action on terms of the expected lossEmpirically best
$\Delta_a = \mu(a)-\mu(a^*)$ : the suboptiomal gap of action a
The number of times action a has been pulled up to round t

[Notes]Lecture14 Stochastic Mult

Stochastic Multi-armed Bandit

Symbols

猜你喜欢

热点阅读