Dilated Convolution
Paper:
https://arxiv.org/abs/1511.07122v2
Yu, F. and Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv:1511.07122, 2015.
Animated illustrations of convolution arithmetic:
https://github.com/vdumoulin/conv_arithmetic
The authors argue that dense prediction has two key requirements: multi-scale contextual reasoning and full-resolution output.
On the one hand, image features must be extracted with multi-scale convolutions, and a basic property of convolutional feature extraction is that the feature maps shrink; on the other hand, since dense prediction assigns a prediction to every pixel, the output has to be kept at the original resolution. FCN, for example, first shrinks the feature maps while extracting features and then upsamples them back to full size afterwards.
Dilated convolution is designed to keep extracting features in the usual generalizable way while preventing the feature maps from shrinking.
The dilated convolution computed on the 2D plane in this paper is illustrated as follows.
[Figure: dilated convolution on a 2D feature map, panels (a)-(c)]
(a) Ordinary convolution (1-dilated convolution): the kernel's receptive field is 3×3.
(b) Dilated convolution (2-dilated convolution, applied on top of (a)): the receptive field grows to 7×7.
(c) Dilated convolution (4-dilated convolution, applied on top of (b)): the receptive field grows to 15×15.
As the figure shows, the number of kernel parameters stays the same, while the receptive field grows exponentially as layers with increasing dilation rates are stacked.
Dilated convolution is also known as atrous convolution. Compared with an ordinary convolution, a dilated convolution has, besides the kernel size, an extra dilation rate parameter that specifies how far apart the kernel taps are spread. A dilated convolution uses the same kernel size as an ordinary one, so the number of parameters in the network is unchanged; the difference is that the dilated convolution has a larger receptive field. The receptive field is the region of the input that the kernel sees; for example, a 3×3 kernel sees 9 pixels.
The computational difference between an ordinary convolution and a dilated convolution is illustrated with a small PyTorch sketch after the reference link below.
The explanation above draws on:
https://blog.csdn.net/juanjuan1314/article/details/82252451
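To make the difference concrete, here is a minimal PyTorch sketch (my own illustration, not taken from the paper or the reference above; the channel counts and the 64×64 input size are arbitrary). It shows that a dilated 3×3 convolution has exactly the same number of parameters as an ordinary 3×3 convolution, while each output position looks at a wider window of its input, and the spatial resolution is preserved.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # dummy input: batch 1, 3 channels, 64x64

# Ordinary 3x3 convolution (dilation=1): each output pixel sees a 3x3 window.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)

# 2-dilated 3x3 convolution: the same 9 taps are spread two pixels apart,
# so each output pixel sees a 5x5 window; padding=2 keeps the spatial size.
dilated = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)

print(conv(x).shape)     # torch.Size([1, 8, 64, 64])
print(dilated(x).shape)  # torch.Size([1, 8, 64, 64]) -- resolution is preserved

# Identical parameter counts: 8 * 3 * 3 * 3 weights + 8 biases = 224 each.
print(sum(p.numel() for p in conv.parameters()),
      sum(p.numel() for p in dilated.parameters()))
```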
Our architecture is motivated by the fact that dilated convolutions support exponentially expanding receptive fields without losing resolution or coverage.
Consider applying the filters with exponentially increasing dilation:

$$F_{i+1} = F_i \ast_{2^i} k_i \qquad \text{for } i = 0, 1, \ldots, n-2.$$

The receptive field of each element in $F_{i+1}$ is then $(2^{i+2}-1) \times (2^{i+2}-1)$, i.e. the receptive field expands exponentially with depth.
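A quick way to check the $(2^{i+2}-1)$ growth is to accumulate how much each 3×3 layer with dilation $2^i$ widens the receptive field. The helper below is my own small illustration:

```python
def receptive_field_after(num_layers: int) -> list:
    """Receptive field side length after stacking 3x3 convs with dilations 1, 2, 4, ..."""
    rf, sizes = 1, []
    for i in range(num_layers):
        dilation = 2 ** i
        rf += 2 * dilation  # a 3x3 kernel with dilation d spans 2d+1 pixels,
                            # adding d pixels of context on each side
        sizes.append(rf)
    return sizes

print(receptive_field_after(4))  # [3, 7, 15, 31] == [2**(i+2) - 1 for i in range(4)]
```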
Section 3 of the paper is MULTI-SCALE CONTEXT AGGREGATION, which describes how context at multiple scales is fused.
The basic context module has 7 layers that apply 3×3 convolutions with different dilation factors. The dilations are 1, 1, 2, 4, 8, 16, and 1. Each convolution operates on all layers: strictly speaking, these are 3×3×C convolutions with dilation in the first two dimensions. Each of these convolutions is followed by a pointwise truncation max(·, 0). A final layer performs 1×1×C convolutions and produces the output of the module. The architecture is summarized in Table 1. Note that the frontend module that provides the input to the context network in our experiments produces feature maps at 64×64 resolution. We therefore stop the exponential expansion of the receptive field after layer 6.
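The table itself is not reproduced here, but the description above is enough for a rough PyTorch sketch of the basic module. This is a sketch under my own assumptions (padding equal to the dilation so that the 64×64 resolution is preserved, and C = 21 channels as for the 21 PASCAL VOC classes); the paper's exact border handling and training setup are not reproduced.

```python
import torch
import torch.nn as nn

class BasicContextModule(nn.Module):
    """Sketch of the basic context module: seven 3x3 convolutions with dilations
    1, 1, 2, 4, 8, 16, 1, each followed by the pointwise truncation max(., 0)
    (ReLU), and a final 1x1 convolution producing the C-channel output."""

    def __init__(self, channels: int):
        super().__init__()
        layers = []
        for d in (1, 1, 2, 4, 8, 16, 1):
            # padding=d keeps the spatial size for a 3x3 kernel with dilation d
            layers += [nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(channels, channels, 1))  # final 1x1xC convolution
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

feats = torch.randn(1, 21, 64, 64)          # front-end feature maps at 64x64, C = 21
print(BasicContextModule(21)(feats).shape)  # torch.Size([1, 21, 64, 64])
```

With C channels throughout, the seven 3×3 layers contribute 7·9·C² weights and the 1×1 layer another C², which is consistent with the ≈ 64C² parameter count quoted below.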
Our initial attempts to train the context module failed to yield an improvement in prediction accuracy. Experiments revealed that standard initialization procedures do not readily support the training of the module. Convolutional networks are commonly initialized using samples from random distributions. However, we found that random initialization schemes were not effective for the context module. We found an alternative initialization with clear semantics to be much more effective:
$$k^b(\mathbf{t}, a) = 1_{[\mathbf{t}=0]} \, 1_{[a=b]},$$

where a is the index of the input feature map and b is the index of the output map. This is a form of identity initialization, which has recently been advocated for recurrent networks. This initialization sets all filters such that each layer simply passes the input directly to the next. A natural concern is that this initialization could put the network in a mode where backpropagation cannot significantly improve the default behavior of simply passing information through. However, experiments indicate that this is not the case. Backpropagation reliably harvests the contextual information provided by the network to increase the accuracy of the processed maps.
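A sketch of this identity initialization in PyTorch (my illustration; `torch.nn.init.dirac_` implements exactly this "centre tap = 1 on the matching channel, 0 elsewhere" pattern when the input and output channel counts match). With non-negative input the truncations are no-ops, so the freshly initialized stack is literally an identity map; the same applies to the context-module sketch above.

```python
import torch
import torch.nn as nn

def identity_init(module: nn.Module) -> None:
    # Centre tap = 1 where input channel == output channel, 0 everywhere else,
    # so every convolution starts out by copying its input to its output.
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.dirac_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

# A small stack of dilated 3x3 convolutions with truncations, as in the module above.
stack = nn.Sequential(
    nn.Conv2d(21, 21, 3, padding=1, dilation=1), nn.ReLU(),
    nn.Conv2d(21, 21, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(21, 21, 3, padding=4, dilation=4), nn.ReLU(),
)
identity_init(stack)

x = torch.rand(1, 21, 64, 64)       # non-negative input, so the ReLUs change nothing
print(torch.allclose(stack(x), x))  # True: the network initially just passes x through
```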
This completes the presentation of the basic context network. Our experiments show that even this basic module can increase dense prediction accuracy both quantitatively and qualitatively. This is particularly notable given the small number of parameters in the network: ≈ 64C² parameters in total.
We have also trained a larger context network that uses a larger number of feature maps in the deeper layers. The number of maps in the large network is summarized in Table 1. We generalize the initialization scheme to account for the difference in the number of feature maps in different layers. Let ci and ci+1 be the number of feature maps in two consecutive layers. Assume that C divides both ci and ci+1. The initialization is
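The formula itself is not reproduced above. Purely as an illustration of the idea under my reading of the paper (not a verbatim reimplementation): when a layer maps c_i input maps to c_{i+1} output maps, the centre tap connects each output map to the input maps in the corresponding one of the C channel groups, and every remaining weight gets small random noise so that backpropagation can break the symmetry. The function name and noise scale below are my own choices.

```python
import torch
import torch.nn as nn

def grouped_identity_init(conv: nn.Conv2d, C: int, noise_std: float = 1e-3) -> None:
    # Sketch only: my reading of the generalized identity initialization.
    # Output map b copies (with weight C / c_out at the centre tap) the input
    # maps a that fall in the same of the C channel groups, i.e. those with
    # floor(a*C/c_in) == floor(b*C/c_out); all other weights are small noise.
    c_out, c_in, kh, kw = conv.weight.shape
    with torch.no_grad():
        conv.weight.normal_(0.0, noise_std)
        for b in range(c_out):
            for a in range(c_in):
                if (a * C) // c_in == (b * C) // c_out:
                    conv.weight[b, a, kh // 2, kw // 2] = C / c_out
        if conv.bias is not None:
            conv.bias.zero_()

# Example: a layer of the larger module going from 2C to 4C feature maps (C = 21).
layer = nn.Conv2d(2 * 21, 4 * 21, 3, padding=2, dilation=2)
grouped_identity_init(layer, C=21)
```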