Implementing MobileNet with Python 3 & Keras

2018-02-05 Daisy丶

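MobileNet is a classification network for mobile devices proposed by Google. V1 uses depthwise separable convolution and introduces two hyperparameters to control network capacity. The assumption behind this kind of convolution is that cross-channel correlations and spatial correlations can be decoupled. Depthwise separable convolution reduces the number of parameters, so the model reaches fairly high accuracy while keeping its complexity acceptable for mobile devices. V2 introduces a new unit, the inverted residual with linear bottleneck. The main changes are that the bottleneck output uses a linear activation, and that the skip connection of the residual network is moved to the low-dimensional bottleneck layers.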
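To make the parameter saving concrete, here is a minimal sketch that counts the weights of a standard convolution and of a depthwise separable one. The function names are hypothetical (they are not part of the repository), and biases and BatchNormalization parameters are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights of a depthwise separable convolution (no bias).

    The depthwise step filters each input channel on its own, and the
    1 x 1 pointwise step mixes the channels.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == '__main__':
    full = standard_conv_params(3, 32, 64)
    separable = separable_conv_params(3, 32, 64)
    print(full, separable, separable / full)
```

For a 3 * 3 kernel with 32 input and 64 output channels, the separable form needs 2336 weights instead of 18432, roughly 1/8 of the original.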

Paper: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
Github：https://github.com/xiaochus/MobileNetV2

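Network structure

The overall structure of MobileNetV2 is shown in the figure below. Each row describes a sequence of one or more identical (apart from stride) layers, and each bottleneck is repeated n times. All layers in the same sequence have the same number of output channels. The first layer of each sequence uses stride s, and all other layers use stride 1. All spatial convolutions use 3 * 3 kernels. The expansion factor t is always applied to the input size: if the tensor entering a layer has k channels, the number of filters applied in that layer is k * t.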

net.jpg
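The rules above can be traced with a few lines of plain Python. This is only a sketch: the (t, c, n, s) rows are copied from the MobileNetv2() function later in this post, the helper name trace_bottlenecks is hypothetical, and the spatial size assumes 'same' padding, so a stride-s layer maps a size to ceil(size / s).

```python
import math

# (t, c, n, s): expansion factor, output channels, repeats, first-layer stride.
CONFIG = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def trace_bottlenecks(input_size=224, stem_channels=32):
    """Return (in_channels, expanded, out_channels, stride, size) per bottleneck."""
    size = math.ceil(input_size / 2)  # the first 3 x 3 convolution uses stride 2
    in_channels = stem_channels
    rows = []
    for t, c, n, s in CONFIG:
        for i in range(n):
            stride = s if i == 0 else 1   # only the first layer uses stride s
            expanded = in_channels * t    # t is applied to the input channels
            size = math.ceil(size / stride)
            rows.append((in_channels, expanded, c, stride, size))
            in_channels = c
    return rows


if __name__ == '__main__':
    for row in trace_bottlenecks():
        print(row)
```

With a 224 * 224 input this lists 17 bottlenecks, and the feature map shrinks from 112 * 112 after the first convolution to 7 * 7 before the final 1 * 1 convolution.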

The structure of the bottleneck is shown below. Whether the skip connection is used depends on the stride.

stru.jpg
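As a small illustration of that decision, the sketch below checks whether a bottleneck can add its input to its output. The channel check is an assumption added here: an element-wise add needs matching shapes, so a stride-1 block whose channel count changes cannot use the shortcut either. The helper name is hypothetical.

```python
def can_use_residual(stride, in_channels, out_channels):
    """Return True when input and output tensors have the same shape.

    A stride greater than 1 changes the spatial size, and a different
    channel count changes the depth, so both rule out the shortcut.
    """
    return stride == 1 and in_channels == out_channels


if __name__ == '__main__':
    print(can_use_residual(1, 24, 24))  # repeated layer inside a sequence
    print(can_use_residual(2, 16, 24))  # first layer of a stride-2 sequence
    print(can_use_residual(1, 64, 96))  # stride 1, but the channels change
```

This matches the implementation in this post, where only the repeated layers of a sequence (stride 1, unchanged channel count) are called with r=True.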

Environment

OpenCV 3.4
Python 3.5
Tensorflow-gpu 1.2.0
Keras 2.1.3

Implementation

Based on the parameters given in the paper, I implemented the network structure in Keras 2, as shown below:

from keras.models import Model
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dropout
from keras.layers import Activation, BatchNormalization, add, Reshape
from keras.applications.mobilenet import relu6, DepthwiseConv2D
from keras.utils.vis_utils import plot_model

from keras import backend as K


def _conv_block(inputs, filters, kernel, strides):
    """Convolution Block
    This function defines a 2D convolution operation with BN and relu6.
    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        strides: An integer or tuple/list of 2 integers,
            specifying the strides of the convolution along the width and height.
            Can be a single integer to specify the same value for
            all spatial dimensions.
    # Returns
        Output tensor.
    """

    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1

    x = Conv2D(filters, kernel, padding='same', strides=strides)(inputs)
    x = BatchNormalization(axis=channel_axis)(x)
    return Activation(relu6)(x)


def _bottleneck(inputs, filters, kernel, t, s, r=False):
    """Bottleneck
    This function defines a basic bottleneck structure.
    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        t: Integer, expansion factor.
            t is always applied to the input size.
        s: Integer, stride of the depthwise convolution. It is applied
            to both spatial dimensions as (s, s).
        r: Boolean, Whether to use the residuals.
    # Returns
        Output tensor.
    """

    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    tchannel = K.int_shape(inputs)[channel_axis] * t

    x = _conv_block(inputs, tchannel, (1, 1), (1, 1))

    x = DepthwiseConv2D(kernel, strides=(s, s), depth_multiplier=1, padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)
    x = Activation(relu6)(x)

    x = Conv2D(filters, (1, 1), strides=(1, 1), padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)

    if r:
        x = add([x, inputs])
    return x


def _inverted_residual_block(inputs, filters, kernel, t, strides, n):
    """Inverted Residual Block
    This function defines a sequence of 1 or more identical layers.
    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        t: Integer, expansion factor.
            t is always applied to the input size.
        strides: Integer, stride of the first bottleneck in the sequence.
            All remaining bottlenecks in the sequence use stride 1.
        n: Integer, layer repeat times.
    # Returns
        Output tensor.
    """

    x = _bottleneck(inputs, filters, kernel, t, strides)

    for i in range(1, n):
        x = _bottleneck(x, filters, kernel, t, 1, True)

    return x


def MobileNetv2(input_shape, k):
    """MobileNetv2
    This function defines a MobileNetv2 architectures.
    # Arguments
        input_shape: An integer or tuple/list of 3 integers, shape
            of input tensor.
        k: Integer, number of output classes.
    # Returns
        MobileNetv2 model.
    """

    inputs = Input(shape=input_shape)
    x = _conv_block(inputs, 32, (3, 3), strides=(2, 2))

    x = _inverted_residual_block(x, 16, (3, 3), t=1, strides=1, n=1)
    x = _inverted_residual_block(x, 24, (3, 3), t=6, strides=2, n=2)
    x = _inverted_residual_block(x, 32, (3, 3), t=6, strides=2, n=3)
    x = _inverted_residual_block(x, 64, (3, 3), t=6, strides=2, n=4)
    x = _inverted_residual_block(x, 96, (3, 3), t=6, strides=1, n=3)
    x = _inverted_residual_block(x, 160, (3, 3), t=6, strides=2, n=3)
    x = _inverted_residual_block(x, 320, (3, 3), t=6, strides=1, n=1)

    x = _conv_block(x, 1280, (1, 1), strides=(1, 1))
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, 1280))(x)
    x = Dropout(0.3, name='Dropout')(x)
    x = Conv2D(k, (1, 1), padding='same')(x)

    x = Activation('softmax', name='softmax')(x)
    output = Reshape((k,))(x)

    model = Model(inputs, output)
    plot_model(model, to_file='images/MobileNetv2.png', show_shapes=True)

    return model


if __name__ == '__main__':
    MobileNetv2((224, 224, 3), 1000)
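The relu6 activation imported in the code above clips its input to the range [0, 6]. A minimal scalar sketch in plain Python (not the Keras implementation, which works on tensors) looks like this:

```python
def relu6(x):
    """ReLU capped at 6: negative values become 0, values above 6 become 6."""
    return min(max(x, 0.0), 6.0)


if __name__ == '__main__':
    for value in (-3.0, 2.5, 10.0):
        print(value, relu6(value))
```

Note that the last 1 * 1 convolution in _bottleneck is followed only by BatchNormalization, with no relu6. That is the linear bottleneck described in the introduction.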

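Training

The input size recommended in the paper is 224 * 224, so the training set should preferably use the same size. The file data\convert.py provides an example of upscaling the cifar-100 data to 224.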
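Since 224 = 32 * 7, a cifar-100 image can be enlarged by repeating every pixel 7 times in each direction. The sketch below shows this nearest-neighbour idea in plain Python on nested lists. It is an illustration only: the function name is hypothetical, and data\convert.py may well use a different method, such as an image library with interpolation.

```python
def upscale_nearest(image, factor):
    """Upscale a 2D image (list of rows) by an integer factor.

    Every pixel is repeated `factor` times horizontally, and every
    resulting row is repeated `factor` times vertically.
    """
    result = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        result.extend(list(wide_row) for _ in range(factor))
    return result


if __name__ == '__main__':
    small = [[(x + y) % 256 for x in range(32)] for y in range(32)]
    large = upscale_nearest(small, 7)
    print(len(large), len(large[0]))
```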

The training dataset should be organized in the following layout:

| - data/
    | - train/
        | - class 0/
            | - image.jpg
                ....
        | - class 1/
          ....
        | - class n/
    | - validation/
        | - class 0/
        | - class 1/
          ....
        | - class n/
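Before training, it can be useful to check that the folders really follow this layout. The sketch below counts the images in each class folder using only the standard library. The helper name and the set of accepted file extensions are assumptions, not part of the repository.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the actual dataset.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}


def count_images(split_dir):
    """Return {class_name: image_count} for one split, e.g. data/train."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == '__main__':
    for split in ('data/train', 'data/validation'):
        if Path(split).is_dir():
            print(split, count_images(split))
```

The number of keys returned for data/train is the value to pass as --classes.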

Run the following command to train the model:

python train.py --classes num_classes --batch batch_size --epochs epochs --size image_size

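The trained .h5 weight file is saved in the model folder. To fine-tune an existing model, use the command below. Note that only the number of output classes in the last layer can be changed; the structure of all other layers must stay the same.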

python train.py --classes num_classes --batch batch_size --epochs epochs --size image_size --weights weights_path --tclasses pre_classes

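Parameters

--classes, number of classes in the current training set.
--size, image size.
--batch, batch size.
--epochs, number of epochs.
--weights, the model to fine-tune.
--tclasses, number of output classes in the pretrained model.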
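For readers who want to see how such flags are usually parsed, here is a sketch using argparse. The flag names come from the list above, but the defaults, types, and help strings are assumptions; the actual train.py in the repository may differ.

```python
import argparse


def build_parser():
    """Build a parser for the flags listed above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description='Train MobileNetV2.')
    parser.add_argument('--classes', type=int, required=True,
                        help='number of classes in the current training set')
    parser.add_argument('--size', type=int, default=224,
                        help='image size')
    parser.add_argument('--batch', type=int, default=32,
                        help='batch size')
    parser.add_argument('--epochs', type=int, default=10,
                        help='number of epochs')
    parser.add_argument('--weights', type=str, default=None,
                        help='path of the model to fine-tune')
    parser.add_argument('--tclasses', type=int, default=0,
                        help='number of output classes in the pretrained model')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(
        ['--classes', '100', '--batch', '128', '--epochs', '10', '--size', '224'])
    print(args)
```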

Experiment

Due to limited resources, we ran the experiment on the cifar-100 dataset for a limited number of epochs.

device: Tesla K80
dataset: cifar-100
optimizer: Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
batch_size: 128
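To show what the four Adam hyperparameters above control, here is one update step for a single scalar parameter, written in plain Python. It follows the textbook form of the algorithm; the exact placement of epsilon in Keras may differ slightly, and the function name is hypothetical.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08):
    """Apply one Adam update and return (param, m, v).

    m and v are the running averages of the gradient and of its square,
    and t is the 1-based step count used for bias correction.
    """
    m = beta_1 * m + (1 - beta_1) * grad
    v = beta_2 * v + (1 - beta_2) * grad * grad
    m_hat = m / (1 - beta_1 ** t)  # correct the bias towards zero
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


if __name__ == '__main__':
    param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
    print(param, m, v)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by about lr regardless of the gradient's magnitude.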

The experimental results are as follows. Although the network has not fully converged, it still achieves good accuracy.

Dataset	Loss	Top-1 Accuracy	Top-5 Accuracy
cifar-100	0.195	94.42%	99.82%
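For reference, top-1 and top-5 accuracy count a prediction as correct when the true label is among the 1 or 5 highest-scoring classes. The sketch below computes this in plain Python on made-up scores. It is not the evaluation code that produced the numbers above.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scoring classes.

    scores: list of per-class score lists, one per sample.
    labels: list of true class indices.
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == '__main__':
    # Toy scores for 3 samples and 3 classes (illustrative values only).
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, 1))
    print(top_k_accuracy(scores, labels, 2))
```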

eva.png
