Sharp Minima Can Generalize For Deep Nets

2017-10-27 · catHeart

Sharp Minima Can Generalize For Deep Nets, https://arxiv.org/abs/1703.04933

Conventionally, many researchers hold the view that the flatness of the minima found when training a DNN contributes to its generalization ability. This paper argues that common measures fail to describe the flatness of DNN minima: deep ReLU networks admit reparameterizations that leave the function unchanged (e.g., scaling one layer's weights up and the next layer's down), so these measures can be made almost arbitrarily large or small at the same minimum.
Three kinds of flatness measures are investigated: volume $\epsilon$-flatness, curvature (via the Hessian), and $\epsilon$-sharpness.
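
To see why such measures can be manipulated, here is a minimal sketch (my own illustration, not the paper's code; the toy network sizes are hypothetical) of the positive homogeneity of ReLU: scaling the first layer's weights by $a > 0$ and dividing the second layer's weights by $a$ gives exactly the same function, even though the parameters, and hence any parameter-space flatness measure, change.

```python
# Minimal sketch (not from the paper): a function-preserving reparameterization
# of a two-layer ReLU network, (W1, W2) -> (a*W1, W2/a) with a > 0.
# Any flatness measure that changes under this map cannot be meaningful on its own.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # first-layer weights (toy sizes)
W2 = rng.standard_normal((1, 4))   # second-layer weights
x = rng.standard_normal((3, 5))    # a batch of 5 inputs

def net(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)   # two-layer ReLU network, no biases

a = 100.0                                  # arbitrary positive scale
print(np.allclose(net(W1, W2, x), net(a * W1, W2 / a, x)))  # True: same function
```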

Given $\epsilon > 0$, a minimum $\theta$, and a loss $L$, we define $C(L, \theta, \epsilon)$ as the largest (using inclusion as the partial order over the subsets of $\Theta$) connected set containing $\theta$ such that $\forall \theta' \in C(L, \theta, \epsilon),\ L(\theta') < L(\theta) + \epsilon$. The $\epsilon$-flatness is then defined as the volume of $C(L, \theta, \epsilon)$. We will call this measure the volume $\epsilon$-flatness.
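
As a concrete illustration, here is a sketch (my own, on a hypothetical 2-D toy loss, not from the paper) that estimates volume $\epsilon$-flatness by gridding the parameter plane, taking the $\epsilon$-sublevel set, and measuring the area of the connected component containing $\theta$:

```python
# Sketch (my own illustration): volume epsilon-flatness on a 2-D toy loss with
# one wide basin at (-2, 0) and one narrow basin at (2, 0).
import numpy as np
from scipy import ndimage

def loss(t1, t2):
    return np.minimum((t1 + 2.0) ** 2 + t2 ** 2,             # wide basin
                      10.0 * ((t1 - 2.0) ** 2 + t2 ** 2))    # narrow basin

def volume_eps_flatness(theta, eps, lim=5.0, n=1001):
    xs = np.linspace(-lim, lim, n)
    T1, T2 = np.meshgrid(xs, xs, indexing="ij")
    below = loss(T1, T2) < loss(*theta) + eps        # the eps-sublevel set
    labels, _ = ndimage.label(below)                 # connected components
    i = int(np.argmin(np.abs(xs - theta[0])))
    j = int(np.argmin(np.abs(xs - theta[1])))
    cell = (2.0 * lim / (n - 1)) ** 2                # area of one grid cell
    return np.sum(labels == labels[i, j]) * cell     # area of theta's component C

print(volume_eps_flatness((-2.0, 0.0), eps=0.5))     # wide basin: larger volume
print(volume_eps_flatness((2.0, 0.0), eps=0.5))      # narrow basin: smaller volume
```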

Let $B_2(\epsilon, \theta)$ be a Euclidean ball centered on a minimum $\theta$ with radius $\epsilon$. Then, for a non-negative valued loss function $L$, the $\epsilon$-sharpness is defined as proportional to

$$\frac{\max_{\theta' \in B_2(\epsilon, \theta)} \big(L(\theta') - L(\theta)\big)}{1 + L(\theta)}$$
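
This definition comes from Keskar et al., who approximate the inner maximum with an optimizer. The sketch below (my own, with hypothetical toy losses) approximates it crudely by random sampling inside the ball, just to make the quantity concrete:

```python
# Sketch (crude random-search approximation, not the paper's exact procedure):
# epsilon-sharpness at a minimum theta of a loss function.
import numpy as np

def eps_sharpness(loss, theta, eps, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    base = loss(theta)
    best = 0.0
    for _ in range(n_samples):
        d = rng.standard_normal(theta.shape)               # random direction
        d *= eps * rng.random() ** (1.0 / theta.size) / np.linalg.norm(d)
        best = max(best, loss(theta + d) - base)           # track max increase
    return best / (1.0 + base)

# Usage on toy quadratics: sharper curvature gives a larger value.
quad = lambda scale: (lambda t: scale * float(t @ t))
theta0 = np.zeros(2)
print(eps_sharpness(quad(1.0), theta0, eps=0.1))     # flat minimum  (~0.01)
print(eps_sharpness(quad(100.0), theta0, eps=0.1))   # sharp minimum (~1.0)
```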

To be honest, I don't understand the details in the paper, but it is worth re-reading.
