CAN DEEP REINFORCEMENT LEARNING
Maithra Raghu
Google Brain and Cornell University
maithrar@gmail.com
Alex Irpan
Google Brain
Jacob Andreas
University of California, Berkeley
Robert Kleinberg
Cornell University
Quoc V. Le
Google Brain
Jon Kleinberg
Cornell University
ABSTRACT
Deep reinforcement learning has achieved many recent successes, but our understanding
of its strengths and limitations is hampered by the lack of rich environments
in which we can fully characterize optimal behavior, and correspondingly
diagnose individual actions against such a characterization. Here we consider a
family of combinatorial games, arising from the work of Erdős, Selfridge, and Spencer,
and we propose their use as environments for evaluating and comparing different
approaches to reinforcement learning. These games have a number of appealing
features: they are challenging for current learning approaches, but they form (i)
a low-dimensional, simply parametrized environment where (ii) there is a linear
closed-form solution for optimal behavior from any state, and (iii) the difficulty
of the game can be tuned by changing environment parameters in an interpretable
way. We use these Erdős-Selfridge-Spencer games not only to compare different
algorithms, but also to compare approaches based on supervised and reinforcement
learning, to analyze the power of multi-agent approaches in improving performance,
and to evaluate generalization to environments outside the training set.