
What is grouped convolution in ResNeXt?

2019-12-29  Jinglever

Please credit the source when reposting: https://www.jianshu.com/p/328b29d20403 If you find this useful, please give it a like~

ResNeXt uses a structure called grouped convolution, shown in the figure below:

blocks of ResNeXt

In PyTorch's convolution modules (e.g. Conv2d) there is a parameter called groups, and it implements exactly this grouped-convolution logic. Here is the official documentation for the parameter:

  • :attr:`groups` controls the connections between inputs and outputs.
    :attr:`in_channels` and :attr:`out_channels` must both be divisible by
    :attr:`groups`. For example,
    * At groups=1, all inputs are convolved to all outputs.
    * At groups=2, the operation becomes equivalent to having two conv
      layers side by side, each seeing half the input channels,
      and producing half the output channels, and both subsequently
      concatenated.
    * At groups= :attr:`in_channels`, each input channel is convolved with
      its own set of filters, of size:
      :math:`\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor`.
Roughly, this means: the input channels are split into groups, each group is convolved with its own set of filters, and the results are concatenated along the channel dimension to form the output. Each group produces out_channels // groups output channels, and each filter within a group only sees in_channels // groups input channels.
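This can be checked directly in PyTorch. The sketch below (channel sizes chosen arbitrarily for illustration, not taken from the article) shows that the weight tensor of a grouped Conv2d has shape (out_channels, in_channels // groups, kH, kW):

```python
import torch
import torch.nn as nn

# A grouped convolution: 64 input channels, 128 output channels, 4 groups.
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3,
                 padding=1, groups=4, bias=False)

# Each group sees 64 // 4 = 16 input channels and produces 128 // 4 = 32
# output channels, so the weight shape is (128, 16, 3, 3) rather than
# the (128, 64, 3, 3) of an ordinary convolution.
print(conv.weight.shape)   # torch.Size([128, 16, 3, 3])

x = torch.randn(1, 64, 56, 56)
print(conv(x).shape)       # torch.Size([1, 128, 56, 56])
```

Note that grouping cuts the parameter count by a factor of groups, which is part of why ResNeXt can widen the bottleneck without a cost explosion.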

In ResNeXt's terminology, the number of groups in the grouped convolution is the cardinality C, the channel count of each group's filters is the bottleneck width d, and the output channel count of the grouped convolution is the width of the group conv. See the figure below:

(out_channels = C * d)
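The documentation's claim that groups=C is equivalent to C convolution branches side by side can also be verified numerically. This sketch uses the C = 32, d = 4 setting of ResNeXt-50 (32x4d) and compares a groups=C convolution against manually splitting the input and weight into C chunks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, d = 32, 4                 # cardinality and bottleneck width, ResNeXt-50 (32x4d)
ch = C * d                   # 128 channels in the first-stage bottleneck

conv = nn.Conv2d(ch, ch, kernel_size=3, padding=1, groups=C, bias=False)

x = torch.randn(1, ch, 56, 56)
x_chunks = x.chunk(C, dim=1)            # C inputs of d channels each
w_chunks = conv.weight.chunk(C, dim=0)  # C weights of shape (d, d, 3, 3)

# Run the C "branches" independently and concatenate, as the docs describe.
branches = [F.conv2d(xi, wi, padding=1) for xi, wi in zip(x_chunks, w_chunks)]
y_manual = torch.cat(branches, dim=1)

print(torch.allclose(conv(x), y_manual, atol=1e-5))  # True
```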

Now look at the ResNeXt-50 (32x4d) architecture shown in the paper:

As you can see, the channel counts of ResNeXt's bottleneck convolutions (128 -> 256 -> 512 -> 1024) grow by the same factors as ResNet's (64 -> 128 -> 256 -> 512). How is this written in code?

Looking at the source of the resnet models in torchvision, you can find this expression:

 width = int(planes * (base_width / 64.)) * groups

width is what gets passed as out_channels to the convolutions inside the bottleneck. At first glance, this expression is not easy to understand, so I rewrote it as:

width = int(base_width * groups * (planes/64.))

Is the meaning clearer now? For the first bottleneck of ResNeXt-50 (32x4d), the convolution channel count is 128, i.e. width = base_width * groups = 4 * 32 = 128. From the second bottleneck stage onward, width doubles at each stage, by the same factor as in ResNet-50, namely planes / 64., which gives the formula above for computing width.
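The per-stage widths can be checked with the rewritten formula (the helper name resnext_width is mine, not torchvision's):

```python
def resnext_width(planes, base_width=4, groups=32):
    # torchvision's formula, rearranged as in the text:
    # base case (planes=64) gives base_width * groups = 128,
    # and planes / 64. supplies the same doubling factor as ResNet-50.
    return int(base_width * groups * (planes / 64.))

# Bottleneck widths for the four stages of ResNeXt-50 (32x4d):
print([resnext_width(p) for p in (64, 128, 256, 512)])  # [128, 256, 512, 1024]
```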
