EfficientNet Paper Reading Notes

2019-10-17 FantDing

Original paper

Abstract

Introduction

scale up:
- conventional: increase depth, width, or resolution.
- new method: compound scaling method
baseline network:
- architecture search: EfficientNet-B0

Related Work

ConvNet Accuracy

State of the art in accuracy:

GoogLeNet
SENet
GPipe

ConvNet Efficiency

Several approaches to efficiency:

model compression
hand-crafted mobile-size ConvNets
mobile-size ConvNets found by architecture search

It is unclear how to apply these techniques to large models, which have a larger design space and a more expensive tuning cost.

Model scaling

depth: ResNets
width: WideResNet
- width: refers to the number of channels
resolution: input image size

Although these three model scaling methods can each improve accuracy, none of them explains how to scale effectively to trade off efficiency and accuracy.

Compound Model Scaling

Problem Formulation

However I look at it, this subsection does not seem very useful.

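A convolutional layer can be formalized as $Y_i=F_i(X_i)$, where $Y_i$ is the output tensor and $X_i$ is the input tensor with shape $\langle H_i,W_i,C_i \rangle$.
A model can then be written as $N=F_k \odot \cdots \odot F_2 \odot F_1(X_1)=\bigodot_{j=1 \ldots k}F_j(X_1)$.
In practice, a model is divided into multiple stages, and all layers within a stage share the same convolution type, so the network can also be defined as $N=\bigodot_{i=1 \ldots s}F_i^{L_i}(X_{\langle H_i,W_i,C_i \rangle})$, where $F_i^{L_i}$ means that layer $F_i$ is repeated $L_i$ times in stage $i$.
Simplifying the problem:
- step1: instead of searching for the best layer architecture $F_i$, search over $L_i, C_i, H_i, W_i$ on a predefined baseline network.
- step2: the search space over $L_i, C_i, H_i, W_i$ is still large, so all layers are constrained to scale by the same constant ratio (the ratio differs between dimensions but is the same for every layer).
- this yields the following optimization problem:

Optimization problem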
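
The figure that held the optimization problem did not survive, so here is a reconstruction in the notation used above. It follows the paper's Eq. (2) as best I can restate it; hatted symbols denote the fixed values of the baseline network, and $d, w, r$ are the depth, width, and resolution scaling coefficients.

```latex
\begin{aligned}
\max_{d,\,w,\,r}\quad & \mathrm{Accuracy}\big(N(d, w, r)\big) \\
\text{s.t.}\quad & N(d, w, r) = \bigodot_{i=1 \ldots s} \hat{F}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\; r \cdot \hat{W}_i,\; w \cdot \hat{C}_i \rangle}\big) \\
& \mathrm{Memory}(N) \le \text{target\_memory} \\
& \mathrm{FLOPS}(N) \le \text{target\_flops}
\end{aligned}
```

Only three scalars remain to be searched, which is what makes the problem tractable compared with tuning every $L_i, C_i, H_i, W_i$ separately.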

Scaling Dimensions

Scaling a single dimension

Depth(d):

Pros:
- a deeper net can capture richer and more complex features
- generalizes well
Cons:
- the deeper the network, the harder it is to train
- beyond a certain depth, the accuracy gain is limited

Width(w):

Pros:
- captures more fine-grained features
- easier to train
Cons:
- a wide but shallow net has difficulty capturing high-level features

Resolution(r):

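Pros:
- does improve accuracy
Cons:
- at very high resolutions the accuracy gain is small

Takeaway from the figure:
Scaling up any single dimension improves accuracy, but the accuracy gain diminishes for bigger models.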

image

Compound Scaling

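Scale up all three dimensions together.

Empirically, the scaling dimensions are not independent of each other. For example, with higher-resolution input images, the network needs more depth to enlarge the receptive field and more width to capture fine-grained patterns.

The figure below compares two settings on the same baseline network:

Blue: keep depth and resolution unchanged and keep increasing w
Red: double the depth, scale the resolution by 1.3, then keep increasing w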

image

Takeaway from the figure:
At the same FLOPS, it is critical to balance all dimensions in order to pursue better accuracy and efficiency.

compound scaling method

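The authors propose the following scaling rule (Eq. (3) in the paper):
$d=\alpha^\phi$, $w=\beta^\phi$, $r=\gamma^\phi$, subject to $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ and $\alpha \ge 1, \beta \ge 1, \gamma \ge 1$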

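$\alpha, \beta, \gamma$: constants found by grid search
$\phi$: the compound coefficient, set manually by the user according to the amount of available resources
Together these two steps determine the scaling factors for depth, width, and resolution.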
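
As a rough sketch of how the two ingredients combine, the snippet below turns a chosen $\phi$ into depth, width, and resolution multipliers. The function name `compound_scale` is made up for illustration, and the default $\alpha, \beta, \gamma$ are the values these notes report for EfficientNet. The released B1 to B7 configurations may round or adjust the raw numbers.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (d, w, r): depth, width and resolution multipliers for coefficient phi.

    The defaults are the grid-search values quoted in these notes; the function
    itself is an illustrative helper, not code from the paper.
    """
    d = alpha ** phi   # depth multiplier
    w = beta ** phi    # width (channel) multiplier
    r = gamma ** phi   # input resolution multiplier
    return d, w, r


# Print the multipliers for the first few compound coefficients.
for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: d={d:.3f}, w={w:.3f}, r={r:.3f}")
```

With $\phi=0$ every multiplier is 1, which corresponds to the unscaled baseline network.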

FLOPS

FLOPS: floating-point operations. In the paper this is an operation count for the network, not a per-second rate.

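For a convolution, FLOPS is proportional to $d$, $w^2$, and $r^2$ ^[1]. For example, $d=2$ (twice the depth) doubles the FLOPS, whereas doubling $w$ or $r$ quadruples the FLOPS. So $\text{FLOPS multiplier}=d \cdot w^2 \cdot r^2$, and in terms of $\phi$: $\text{FLOPS multiplier}=\alpha^\phi \cdot (\beta^\phi)^2 \cdot (\gamma^\phi)^2=(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi \approx 2^\phi$

Some of these relations are only approximate:

the network's total FLOPS is approximately the total FLOPS of its convolutions
the equality constraint in Eq. (3) holds only approximately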
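
To make the quadratic dependence concrete, here is a small sketch that counts multiply-adds for a toy network of identical convolution layers. The helper names and the toy sizes are assumptions for illustration only. Width enters twice (input and output channels) and resolution enters twice (height and width of the feature map), which is where the squares come from.

```python
def conv_flops(height, width_px, c_in, c_out, kernel=3):
    """Multiply-add count of one stride-1, same-padded 2D convolution (bias ignored)."""
    return height * width_px * c_in * c_out * kernel * kernel


def network_flops(depth, height, width_px, channels, kernel=3):
    """Toy network: `depth` identical conv layers, each with `channels` inputs and outputs."""
    return depth * conv_flops(height, width_px, channels, channels, kernel)


base = network_flops(depth=4, height=32, width_px=32, channels=16)

depth_ratio = network_flops(8, 32, 32, 16) / base       # depth x2      -> 2.0
width_ratio = network_flops(4, 32, 32, 32) / base       # channels x2   -> 4.0
resolution_ratio = network_flops(4, 64, 64, 16) / base  # resolution x2 -> 4.0

# With the coefficients quoted in these notes the constraint is only roughly 2.
constraint = 1.2 * 1.1 ** 2 * 1.15 ** 2
print(depth_ratio, width_ratio, resolution_ratio, round(constraint, 4))
```

The last value comes out near 1.92 rather than exactly 2, which is one reason the FLOPS multiplier is only approximately $2^\phi$.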

EfficientNet architecture

EfficientNet-B0

B0 is the baseline network:

found by neural architecture search ^[2]
mobile-sized

EfficientNet-B0 network architecture

image

How to scale up

step1: fix $\phi=1$ and use a small grid search to find the best $\alpha, \beta, \gamma$, where "best" means the highest $ACC(model)$. The search yields $\alpha=1.2, \beta=1.1, \gamma=1.15$.
step2: fix $\alpha, \beta, \gamma$ and increase $\phi$ to obtain new $d, w, r$, which gives EfficientNet-B1 through B7.

Strictly speaking, the proper procedure would be to search for $\alpha, \beta, \gamma$ at $\phi=1$, search again at $\phi=2$, and so on. To reduce the search cost, the authors use the shortcut above.
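
Step 1 can be pictured with the following minimal sketch. The grid step, upper bound, and tolerance are my own assumptions, not values from the paper, and `placeholder_score` is a stand-in for demonstration only. A real search would train each scaled network and return its accuracy.

```python
from itertools import product


def candidate_coefficients(step=0.05, upper=1.5, target=2.0, tol=0.1):
    """Enumerate (alpha, beta, gamma) >= 1 on a grid with alpha * beta^2 * gamma^2 near target."""
    n = int(round((upper - 1.0) / step))
    values = [round(1.0 + step * k, 2) for k in range(n + 1)]
    return [
        (a, b, g)
        for a, b, g in product(values, repeat=3)
        if abs(a * b ** 2 * g ** 2 - target) <= tol
    ]


def grid_search(evaluate, candidates):
    """Return the candidate with the highest score from the user-supplied `evaluate`."""
    return max(candidates, key=evaluate)


def placeholder_score(coeffs):
    """Stand-in for ACC(model): NOT a real accuracy, just closeness to the FLOPS target."""
    a, b, g = coeffs
    return -abs(a * b ** 2 * g ** 2 - 2.0)


candidates = candidate_coefficients()
print(len(candidates), (1.2, 1.1, 1.15) in candidates)
print(grid_search(placeholder_score, candidates))
```

The constraint prunes most of the grid before any model is trained, which is what keeps the search small at $\phi=1$.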

Experiments

Scaling up MobileNets and ResNets

Both scaling approaches are compared on MobileNets and ResNets, showing that compound scaling beats single-dimension scaling ^[3]

image

ImageNet Results for EfficientNet

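Training details

Bigger models need more regularization, so the dropout ratio is increased for larger models (the paper raises it linearly from 0.2 for B0 to 0.5 for B7).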
what
- batch norm momentum
- swish activation
- fixed AutoAugment policy
- stochastic depth
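
Two of these items are easy to write down. The sketch below shows the swish activation, $x \cdot \mathrm{sigmoid}(x)$, and a linear dropout schedule between the endpoints the paper reports (0.2 for B0, 0.5 for B7). The function names are illustrative, and the exact per-model dropout values are an assumption of plain linear interpolation, which released configurations need not match.

```python
import math


def swish(x):
    """Swish activation x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so e is in (0, 1)
    return x * e / (1.0 + e)


def dropout_rate(model_index, low=0.2, high=0.5, last_index=7):
    """Linearly interpolated dropout ratio for EfficientNet-B{model_index} (assumed schedule)."""
    return low + (high - low) * model_index / last_index


print([round(swish(x), 4) for x in (-2.0, 0.0, 2.0)])
print([round(dropout_rate(i), 3) for i in range(8)])
```

Unlike ReLU, swish is smooth and slightly negative for small negative inputs.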

Performance comparison

Top-1 accuracy
Top-5 accuracy
parameters
FLOPS

Latency

To show that the gains hold on real hardware, the authors also compare inference latency.
image

Transfer Learning Results for EfficientNet

Compared on 8 other datasets, EfficientNet reaches state-of-the-art accuracy on 5 of them, with an order of magnitude fewer parameters.

Discussion

To show that compound scaling beats single-dimension scaling, the authors also compare different scaling methods on B0. Compound scaling improves accuracy by up to 2.5%.

scaling up EfficientNet-B0

why better

Activation map visualization ^[4] shows that the compound scaling method lets the model focus on more relevant regions with more object details.
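
For readers unfamiliar with class activation maps (CAM), here is a minimal pure-Python sketch of the core computation from the cited CAM paper: a weighted sum of the last convolutional feature maps, using the classifier weights of one class. The function name and the tiny inputs are illustrative, and it assumes a network that ends in global average pooling followed by a linear classifier. These notes do not record how the paper's figures were rendered beyond that.

```python
def class_activation_map(feature_maps, class_weights):
    """CAM for one class: sum over channels k of class_weights[k] * feature_maps[k].

    feature_maps: list of K maps, each an H x W list of lists (last conv layer output).
    class_weights: list of K classifier weights for the class of interest.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Two 2x2 feature maps; the second channel matters more for this class.
maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
cam = class_activation_map(maps, [0.5, 1.5])
print(cam)  # [[0.5, 0.0], [0.0, 3.0]]
```

In practice the map is upsampled to the input image size and overlaid as a heatmap, so larger values mark the regions the class score depends on most.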

different scaling method at the same baseline model

References

Official Google blog post

Footnotes

[1] Why FLOPS is quadratic in width and resolution
- for the FLOPS calculation, see this article
[2] Neural architecture search
- Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q. V. MnasNet: Platform-aware neural architecture search for mobile. CVPR, 2019.
- MnasNet
- MBConv
- squeeze-and-excitation optimization
[3] Why not search for the best w or d to compare against, instead of arbitrarily picking values such as 2 or 4?
[4] Activation map visualization: "Learning Deep Features for Discriminative Localization"
- how CAMs are generated
