Adding an Attention Mechanism to Common CNN Architectures
There are many well-known CNN architectures today, each introduced by its paper, for example:
- LeNet: Gradient-Based Learning Applied to Document Recognition
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks
- VGGNet: Very Deep Convolutional Networks for Large-Scale Image Recognition
- GoogLeNet: Going Deeper with Convolutions
- Inception-v3: Rethinking the Inception Architecture for Computer Vision
- ResNet: Deep Residual Learning for Image Recognition
- Inception-ResNet: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
This post uses the mini_XCEPTION network as the example:
from tensorflow.keras import layers
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     GlobalAveragePooling2D, Input,
                                     MaxPooling2D, SeparableConv2D)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

def mini_XCEPTION(input_shape, num_classes, l2_regularization=0.01):
    regularization = l2(l2_regularization)

    # base
    img_input = Input(input_shape)
    x = Conv2D(8, (3, 3), strides=(1, 1), kernel_regularizer=regularization,
               use_bias=False)(img_input)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(8, (3, 3), strides=(1, 1), kernel_regularizer=regularization,
               use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    # module 1
    residual = Conv2D(16, (1, 1), strides=(2, 2),
                      padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)
    x = SeparableConv2D(16, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(16, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    # module 2
    residual = Conv2D(32, (1, 1), strides=(2, 2),
                      padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)
    x = SeparableConv2D(32, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(32, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    # module 3
    residual = Conv2D(64, (1, 1), strides=(2, 2),
                      padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)
    x = SeparableConv2D(64, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(64, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    # module 4
    residual = Conv2D(128, (1, 1), strides=(2, 2),
                      padding='same', use_bias=False)(x)
    residual = BatchNormalization()(residual)
    x = SeparableConv2D(128, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(128, (3, 3), padding='same',
                        kernel_regularizer=regularization,
                        use_bias=False)(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = layers.add([x, residual])

    # classifier
    x = Conv2D(num_classes, (3, 3),
               # kernel_regularizer=regularization,
               padding='same')(x)
    x = GlobalAveragePooling2D()(x)
    output = Activation('softmax', name='predictions')(x)

    model = Model(img_input, output)
    return model
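For reference, a minimal usage sketch (not from the original post; the 48x48 grayscale input and 7 classes are assumptions, typical of the facial-expression datasets this network is often used with):

model = mini_XCEPTION(input_shape=(48, 48, 1), num_classes=7)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()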
First, the attention mechanism. A paper at ECCV 2018 proposed CBAM (Convolutional Block Attention Module); readers who want the original paper can download it at https://arxiv.org/abs/1807.06521. The paper not only improves on the existing channel attention mechanism but also adds a spatial attention mechanism (see the module diagram in the paper).
The CBAM paper makes three main contributions:
(1) It proposes an efficient attention module, CBAM, which can be embedded into today's mainstream CNN architectures.
(2) It verifies the effectiveness of CBAM's attention design through extensive ablation experiments.
(3) It demonstrates CBAM's performance gains on several benchmarks (ImageNet-1K, MS COCO, and VOC 2007).
Channel attention
This part works much like SENet: the feature map is first squeezed along the spatial dimensions into a one-dimensional vector before further processing. The difference from SENet is that when squeezing the input feature map spatially, the authors do not rely on average pooling alone but introduce max pooling as a complement, so the two pooling functions yield two one-dimensional vectors in total. Global average pooling receives feedback from every pixel of the feature map, while global max pooling back-propagates a gradient only at the location of the strongest response in the feature map, which makes it a useful complement to GAP. The formula is as follows:
$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big)$$
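To make the gradient claim above concrete, here is a small sketch I've added (not from the original post): TensorFlow's autodiff shows that a mean spreads its gradient over every entry, while a max sends gradient only to the location of the strongest response.

import tensorflow as tf

# Toy 2x2 single-channel "feature map"
x = tf.Variable([[1.0, 3.0], [2.0, 0.5]])

with tf.GradientTape(persistent=True) as tape:
    y_avg = tf.reduce_mean(x)  # what global average pooling computes per channel
    y_max = tf.reduce_max(x)   # what global max pooling computes per channel

print(tape.gradient(y_avg, x))  # [[0.25, 0.25], [0.25, 0.25]]: every pixel gets feedback
print(tape.gradient(y_max, x))  # [[0., 1.], [0., 0.]]: only the max location responds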
Spatial attention
This part is the key contribution that distinguishes the paper from SENet: besides generating an attention map over the channels, the authors argue that the network should also learn which spatial parts of the feature map deserve a stronger response. Average pooling and max pooling are again used to squeeze the input feature map, but this time the squeeze happens along the channel dimension: mean and max are taken over the channels of the input features. This produces two 2-D maps, which are concatenated along the channel dimension into a feature map with two channels; a hidden layer containing a single convolution kernel is then applied, keeping the resulting feature spatially consistent with the input feature map. The formula is as follows:
$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7\times 7}([F^s_{avg}; F^s_{max}])\big)$$
How the two mechanisms are connected
The paper verifies this with experiments: the authors compare three arrangements (channel-first, spatial-first, and parallel), and channel-first yields the best classification results.
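To make the three arrangements concrete, here is a minimal sketch (my illustration, not the paper's code) in terms of the channel_attention and spatial_attention functions implemented below; x is any 4-D feature tensor, and the parallel variant shown here (adding the two refined maps) is just one plausible interpretation:

# channel-first: the ordering CBAM adopts
y = spatial_attention(channel_attention(x))

# spatial-first
y = channel_attention(spatial_attention(x))

# parallel: one plausible variant, combining the two refined maps by addition
y = Add()([channel_attention(x), spatial_attention(x)])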
Implementation code:
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (GlobalAveragePooling2D, GlobalMaxPooling2D,
                                     Reshape, Dense, multiply, Permute,
                                     Concatenate, Conv2D, Add, Activation, Lambda)

''' Channel attention:
When squeezing the input feature map along the spatial dimensions, the authors
do not rely on average pooling alone but add max pooling as a complement, so
the two pooling functions yield two one-dimensional vectors in total.
Global average pooling receives feedback from every pixel of the feature map,
while global max pooling back-propagates a gradient only at the location of
the strongest response, making it a useful complement to GAP.
'''
def channel_attention(input_feature, ratio=8):
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature.shape[channel_axis]

    # The two pooled vectors share one MLP (two Dense layers)
    shared_layer_one = Dense(channel // ratio,
                             kernel_initializer='he_normal',
                             activation='relu',
                             use_bias=True,
                             bias_initializer='zeros')
    shared_layer_two = Dense(channel,
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')

    avg_pool = GlobalAveragePooling2D()(input_feature)
    avg_pool = Reshape((1, 1, channel))(avg_pool)
    assert avg_pool.shape[1:] == (1, 1, channel)
    avg_pool = shared_layer_one(avg_pool)
    assert avg_pool.shape[1:] == (1, 1, channel // ratio)
    avg_pool = shared_layer_two(avg_pool)
    assert avg_pool.shape[1:] == (1, 1, channel)

    max_pool = GlobalMaxPooling2D()(input_feature)
    max_pool = Reshape((1, 1, channel))(max_pool)
    assert max_pool.shape[1:] == (1, 1, channel)
    max_pool = shared_layer_one(max_pool)
    assert max_pool.shape[1:] == (1, 1, channel // ratio)
    max_pool = shared_layer_two(max_pool)
    assert max_pool.shape[1:] == (1, 1, channel)

    cbam_feature = Add()([avg_pool, max_pool])
    cbam_feature = Activation('hard_sigmoid')(cbam_feature)

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)

    return multiply([input_feature, cbam_feature])
''' Spatial attention:
Average pooling and max pooling are again used to squeeze the input feature
map, but this time along the channel dimension: mean and max are taken over
the channels of the input features. The two resulting 2-D maps are
concatenated along the channel axis into a feature map with 2 channels, and
a hidden layer containing a single convolution kernel is applied so that the
output feature stays spatially consistent with the input feature map.
'''
def spatial_attention(input_feature):
    kernel_size = 7

    if K.image_data_format() == "channels_first":
        channel = input_feature.shape[1]
        cbam_feature = Permute((2, 3, 1))(input_feature)
    else:
        channel = input_feature.shape[-1]
        cbam_feature = input_feature

    # Squeeze along the channel axis: per-pixel mean and max over channels
    avg_pool = Lambda(lambda x: K.mean(x, axis=3, keepdims=True))(cbam_feature)
    assert avg_pool.shape[-1] == 1
    max_pool = Lambda(lambda x: K.max(x, axis=3, keepdims=True))(cbam_feature)
    assert max_pool.shape[-1] == 1
    concat = Concatenate(axis=3)([avg_pool, max_pool])
    assert concat.shape[-1] == 2

    cbam_feature = Conv2D(filters=1,
                          kernel_size=kernel_size,
                          activation='hard_sigmoid',
                          strides=1,
                          padding='same',
                          kernel_initializer='he_normal',
                          use_bias=False)(concat)
    assert cbam_feature.shape[-1] == 1

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)

    return multiply([input_feature, cbam_feature])
def cbam_block(cbam_feature, ratio=8):
    """Contains the implementation of Convolutional Block Attention Module(CBAM) block.
    As described in https://arxiv.org/abs/1807.06521.
    """
    # Ablations show channel-first works better than spatial-first or parallel
    cbam_feature = channel_attention(cbam_feature, ratio)
    cbam_feature = spatial_attention(cbam_feature)
    return cbam_feature
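A quick sanity check (my addition; the 32x32x16 input shape is an arbitrary assumption) confirms that cbam_block preserves the shape of its input, which is why it can be dropped between any two layers of an existing network:

from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

inputs = Input((32, 32, 16))
outputs = cbam_block(inputs, ratio=8)  # 16 channels, ratio 8 -> 2-unit bottleneck
model = Model(inputs, outputs)
print(model.output_shape)              # (None, 32, 32, 16): same as the input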
Insert the block wherever you want in the network, for example at the second residual module (the paper provides a schematic of this kind of placement). Take module 2 of mini_XCEPTION:
# module 2
residual = Conv2D(32, (1, 1), strides=(2, 2),
                  padding='same', use_bias=False)(x)
residual = BatchNormalization()(residual)
After these lines, add:
cbam = cbam_block(residual)
and change
x = layers.add([x, residual])
to:
x = layers.add([x, residual, cbam])
Of course, the add() method is not the only option; concatenate() and other merge methods can also be used (keep in mind that concatenate changes the channel count, which downstream layers must accommodate). The full modified module is sketched below.
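Putting it together, this is what module 2 inside mini_XCEPTION would look like after the change (my assembly of the edits above into one piece; regularization and x come from the surrounding function):

# module 2 with CBAM applied to the residual branch
residual = Conv2D(32, (1, 1), strides=(2, 2),
                  padding='same', use_bias=False)(x)
residual = BatchNormalization()(residual)
cbam = cbam_block(residual)  # attention-refined copy of the residual
x = SeparableConv2D(32, (3, 3), padding='same',
                    kernel_regularizer=regularization,
                    use_bias=False)(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = SeparableConv2D(32, (3, 3), padding='same',
                    kernel_regularizer=regularization,
                    use_bias=False)(x)
x = BatchNormalization()(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = layers.add([x, residual, cbam])  # merge main path, residual, and CBAM output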
Like the SE module, CBAM can be embedded into most mainstream networks and improves a model's feature-extraction ability without significantly increasing computation or parameter count. In short, adding attention to a network structure is a worthwhile option.