Top 100 Classic Deep Learning Papers, Optimization Series: Dropout (1)

2018-10-22  吐舌小狗

Reading notes on the Top 100 classic deep learning papers (Most Cited Deep Learning Papers). Paper list: https://github.com/terryum/awesome-deep-learning-papers.
Abstract: This post introduces the basic concept of Dropout, the basic method, and some practical notes on using it.
Keywords: overfitting; Dropout

Related Papers

[2012 arXiv] Improving neural networks by preventing co-adaptation of feature detectors
https://arxiv.org/pdf/1207.0580.pdf
[2014 JMLR] Dropout: A Simple Way to Prevent Neural Networks from Overfitting
http://jmlr.org/papers/volume15/srivastava14a.old/srivastava14a.pdf

I. Problem

The papers point out that deep neural networks are very effective, but when training data is limited, (deep) neural networks overfit easily.

II. Method

1. Basic concept

Dropout: As shown in the figure below, during training some neurons are temporarily dropped (ignored), which reduces co-adaptation among features and forces each neuron to learn a feature that is useful on its own. At test time, these independently learned features can then be combined into many more internal contexts, i.e., feature combinations.


[Figure: Dropout]

The paper gives the following intuitive explanation of Dropout:

[Figure: training phase vs. test phase]

My own take: during training, each unit has to stand on its own; at test time, they all work together.
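
To make the train/test difference concrete, here is a minimal NumPy sketch of dropout as described in the paper. The function name and shapes are my own; the paper formulates the test-time step as scaling the outgoing weights by p, which is equivalent to the activation scaling used below.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p, training):
    """Dropout with retention probability p, following the paper's scheme.

    Training: multiply each activation by an independent Bernoulli(p) mask,
    so each unit must learn a feature that is useful on its own.
    Test: keep all units and scale activations by p (equivalent to scaling
    the outgoing weights by p), approximating the average of all thinned nets.
    """
    if training:
        mask = rng.binomial(n=1, p=p, size=h.shape)  # r ~ Bernoulli(p)
        return h * mask
    return h * p

h = np.array([1.0, -2.0, 0.5, 3.0])
print(dropout(h, p=0.5, training=True))   # some activations zeroed out
print(dropout(h, p=0.5, training=False))  # all activations, scaled by p
```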

2. Basic method

The standard network:


[Figure: Standard neural network]

With dropout added:


[Figure: NN with dropout]
That is, the comparison shown in the figure below:
[Figure: a standard NN compared with a NN with dropout]
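
The figures above correspond to the feed-forward equations in the 2014 JMLR paper; I reproduce them below from memory of that paper (please check the original for the exact notation), with f the activation function and p the retention probability.

Standard network:
z_i^{(l+1)} = w_i^{(l+1)} \cdot y^{(l)} + b_i^{(l+1)}, \quad y_i^{(l+1)} = f(z_i^{(l+1)})

Network with dropout:
r_j^{(l)} \sim \mathrm{Bernoulli}(p), \quad \tilde{y}^{(l)} = r^{(l)} * y^{(l)}
z_i^{(l+1)} = w_i^{(l+1)} \cdot \tilde{y}^{(l)} + b_i^{(l+1)}, \quad y_i^{(l+1)} = f(z_i^{(l+1)})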

III. Related Work

1. Combining multiple independent models
The paper points out the drawbacks of this kind of model combination: training many separate large networks is expensive, there may not be enough data to train each of them well, and running an ensemble of many networks at test time is too slow when fast responses are needed.

2. Other methods for reducing overfitting
The paper mentions the usual alternatives: early stopping (halt training as soon as validation performance starts to degrade), L1/L2 weight penalties, and soft weight sharing.

IV. Key Points

1. Dropout configuration

On using dropout in convolutional layers:

One may have presumed that since the convolutional layers don’t have a lot of parameters, overfitting is not a problem and therefore dropout would not have much effect.

(The paper goes on to report the opposite: dropout in the convolutional layers still helps, because it provides noisy inputs to the fully-connected layers above and keeps them from overfitting.)

On very large datasets:

We found that the improvement was much smaller compared to that for the vision and speech datasets. This might be explained by the fact that this data set is quite big (more than 200,000 training examples) and overfitting is not a very serious problem.

2. Training configuration notes

Although Dropout by itself is effective against overfitting, it is usually combined with other techniques, such as max-norm regularization, large decaying learning rates, and high momentum. Dropout introduces a lot of noise into the gradients; if the original learning rate is kept, training time roughly doubles, so a learning rate 10~100 times the usual value is typically used. Another option is to train with high momentum, which also speeds up the search.
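
As a concrete, hypothetical example of such a configuration, here is a Keras sketch that combines dropout with a max-norm constraint, a large decaying learning rate, and high momentum. The layer sizes and hyperparameter values are illustrative, not taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, constraints

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    layers.Dense(1024, activation="relu",
                 kernel_constraint=constraints.MaxNorm(3.0)),  # max-norm regularization
    layers.Dropout(0.5),  # each hidden unit is dropped with probability 0.5
    layers.Dense(10, activation="softmax"),
])

# Large initial learning rate (10~100x the usual) that decays over time,
# plus high momentum, to counteract the gradient noise introduced by dropout.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.95)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule,
                                                momentum=0.95),
              loss="categorical_crossentropy", metrics=["accuracy"])
```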

3. Global contrast normalization

Global contrast normalization means that for every image, and each colour channel in that image, we compute the mean of the pixel intensities and subtract it from the channel.
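
A minimal NumPy sketch of this per-channel mean subtraction (the function name is my own):

```python
import numpy as np

def global_contrast_normalize(image):
    # image: H x W x C array; subtract each colour channel's mean intensity
    image = image.astype(np.float64)
    return image - image.mean(axis=(0, 1), keepdims=True)
```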

4. ZCA whitening

ZCA whitening means that we mean-center the data, rotate it onto its principal components, normalize each component and then rotate it back.
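
A NumPy sketch of exactly these steps (mean-center, rotate onto the principal components, normalize, rotate back); the function name and the small epsilon for numerical stability are my own additions:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    # X: N x D matrix, one flattened image per row
    X = X - X.mean(axis=0)                           # mean-center the data
    cov = np.cov(X, rowvar=False)                    # D x D covariance matrix
    U, S, _ = np.linalg.svd(cov)                     # columns of U: principal components
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T    # rotate, normalize, rotate back
    return X @ W
```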

5. Multiplicative Gaussian noise dropout

h_i \to h_i r, \quad r \sim \mathcal{N}(1, \sigma^2) \quad (equivalently, h_i \to h_i + h_i r' with r' \sim \mathcal{N}(0, \sigma^2))
Here \sigma is a hyperparameter to tune, playing the same role as the retention probability p in standard (Bernoulli) dropout.
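
A small NumPy sketch of multiplicative Gaussian noise (the function name is my own). Because the noise has mean 1, no rescaling is needed at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_dropout(h, sigma, training=True):
    # Scale each activation by r ~ N(1, sigma^2) during training only.
    if not training:
        return h
    return h * rng.normal(loc=1.0, scale=sigma, size=h.shape)
```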

V. Other Notes

Applying dropout to a neural network amounts to sampling a "thinned" network from it.
In other words, each training step with dropout samples and trains one "thinned" sub-network out of exponentially many that share weights.

As the learning rate decays, the optimization takes shorter steps, thereby doing less exploration and eventually settles into a minimum.
That is, as the learning rate decays the optimizer takes shorter steps, explores less of the parameter space, and eventually settles into a minimum.

