Paper Summary 1: Online Convex P
0 Paper: Zinkevich, Martin. "Online convex programming and generalized infinitesimal gradient ascent." In Proceedings of the 20th international conference on machine learning (icml-03), pp. 928-936. 2003.
1 Motivation: A general setting is like this. Imagine a farmer who decides what to plant each year. She has certain restrictions on her resources, both land and labor, as well as restrictions on the output she is allowed to produce. How can she select which crops to grow without knowing in advance what the prices will be?
2 Problem statement:
Given a known feasible set and an infinite sequence of convex functions
that
at step
. By the time a decision
is made, a cost function
associate with the decision is assigned. How to make the decision so as to minimize the cost? And how to evaluate the algorithm that makes the decision?
3 Regret of an algorithm:
Regret is a metric to assess the performance of a decision making algorithm . Give an algorithm and its selected decision
, the regret of A can be defined as follows.
![](https://img.haomeiwen.com/i14783885/e58062b25a54b462.png)
![](https://img.haomeiwen.com/i14783885/297ab056eff92874.png)
![](https://img.haomeiwen.com/i14783885/c91622876e239054.png)
means that select a fixed
such that the sum of costs received by A with
as input is minimized.
4 Important Notations
1)
2)
3) projection
4)
5)
5 Algorithm 1: Greedy Projection
1) Definition
![](https://img.haomeiwen.com/i14783885/80e38c5ee46748cd.png)
where, is the learning rate. What
does is to map the gradient descent update into the feasible set by projection.
2) Upper bound of regret
For
![](https://img.haomeiwen.com/i14783885/230119c01e4d52e4.png)
6 Algorithm 2: Lazy Projection
1) Definition
For an arbitrary start ,
![](https://img.haomeiwen.com/i14783885/3e8a04be38ece78e.png)
2) Upper bound of regret
For fixed learning rate,
![](https://img.haomeiwen.com/i14783885/4a0f7245a27c9302.png)
7 Algorithm 3: Generalized Infinitesimal Gradient Ascent
0) Background: Repeated Games
Algorithm 3 is formulated from the concept of repeated games. Consider a matching game that a player needs to match two sets of actions and
, where a utility function
can be defined as
and otherwise
(1) history: a sequence of joint actions.
(1)' is the set of all histories of length
(2)' is the length of history
(3)' e.g. of with length
(4)' utility of a history
To access the history, define to be the i-th joint action. E.g.,
and
![](https://img.haomeiwen.com/i14783885/20b4a6f8137c422e.png)
(5)' history change
Namely, replace each action of with an action
(6)' regret of not playing action
![](https://img.haomeiwen.com/i14783885/bf8b6e3d0134e9b2.png)
(7)' the maximum regret,or just simply regret
![](https://img.haomeiwen.com/i14783885/d68031907c9d5b45.png)
The most important aspect of this definition of regret is that regret is a function of the resulting history, independent of the strategies that generated that history.
(8)' behavior
Let be the set of all probabilities over
Let be the probability that the boolean predicate
is true for
is selected from a distribution
Then is a function from histories of past actions to distributions over the next action of the player.
(9)' environment
is a function from the history of past actions to distributions over the next action of the environment.
(10)' universally consistent
a behavior is universally consistent if for any
there exists a
such that for all
:
![](https://img.haomeiwen.com/i14783885/cba698d6e3de68ee.png)
In other words, after some time, with high probability the average regret never again exceeds
1) Definition
For an arbitrary start, do the followings
(1) Play according to : play action
with probability
.
(2) Observe the action of the other player and calculate
![](https://img.haomeiwen.com/i14783885/2fd84d86c47703a3.png)
2) upper bound of regret
Regarding the special example presented in (0) background, we have , and
, where
![](https://img.haomeiwen.com/i14783885/c677607ae7c167ee.png)
Then for , the expected regret of Generalized Infinitesimal Gradient Ascent for all obvious determinsitic environments
, for all
, for all
is
![](https://img.haomeiwen.com/i14783885/b1a883430329011f.png)
Note: An environment is an oblivious deterministic environment if it plays the same sequence of actions regardless of the actions of the player.
8 Summary
The paper formulated online convex programing and introduced several algorithms including the three algorithms presented in this post. A index called regret was utilized to evaluate the performance of the algorithms. Finally, the upper bound of regret of the discussed algorithms were derived.