NEURAL ARCHITECTURE SEARCH WITH

2019-10-21 FantDing

Original paper

Abstract

idea

An RNN generates model descriptions
This RNN is trained with reinforcement learning

achievements

CIFAR-10: state of the art
Composes a novel recurrent cell, and this cell is transferable

Introduction

idea

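The paper builds on the observation that a network's structure and connectivity can be written as a variable-length string, so an RNN can be used to generate it
Within a reinforcement learning system:
- The RNN acts as the controller and generates the string, i.e. the model
- The model is trained and its accuracy on the validation set is taken as the reward
- A policy gradient algorithm updates the controller RNN to obtain a higher reward, i.e. a higher validation accuracy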
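
The loop above can be sketched on a toy problem. This is only an illustration under loud assumptions: the controller is a set of independent softmax logits rather than an RNN, `toy_validation_accuracy` is a made-up stand-in for training a child model, and the moving-average baseline and all names are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: one categorical choice per decision step.
CHOICES = [
    [1, 3, 5, 7],       # filter height
    [1, 3, 5, 7],       # filter width
    [24, 36, 48, 64],   # number of filters
]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_architecture(logits):
    """Sample one index per decision step from the controller's softmaxes."""
    return [int(rng.choice(len(l), p=softmax(l))) for l in logits]

def toy_validation_accuracy(actions):
    """Stand-in for 'train the child model, then measure validation accuracy'.

    Assumption: a made-up score that peaks at 3x3 filters with 64 channels.
    """
    h, w, n = (CHOICES[i][a] for i, a in enumerate(actions))
    return 0.4 + 0.2 * (h == 3) + 0.2 * (w == 3) + 0.2 * (n == 64)

def train_controller(steps=3000, lr=0.1):
    logits = [np.zeros(len(c)) for c in CHOICES]   # controller parameters
    baseline = 0.0                                  # moving average of rewards
    for _ in range(steps):
        actions = sample_architecture(logits)       # controller emits a "string"
        reward = toy_validation_accuracy(actions)   # child model gives the reward
        baseline = 0.9 * baseline + 0.1 * reward
        for l, a in zip(logits, actions):           # policy gradient update
            grad_logp = -softmax(l)
            grad_logp[a] += 1.0                     # d log softmax(l)[a] / d l
            l += lr * (reward - baseline) * grad_logp
    return logits

logits = train_controller()
best = [CHOICES[i][int(np.argmax(l))] for i, l in enumerate(logits)]
print(best)
```

In the real method each reward costs a full training run of a child network, which is why the number of sampled architectures matters so much.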


achievements

CV: CIFAR-10 SOTA (3.65% test error) and 1.05x faster than the previous state-of-the-art model
NLP: Penn Treebank dataset SOTA
The search finds a novel recurrent cell that is better than the standard RNN and LSTM cells.

Related Work

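Shortcomings of prior work

Most hyperparameter optimization methods can only search over fixed-length models
Bayesian optimization methods can search over non-fixed-length architectures, but they are less general and less flexible
Modern neuro-evolution algorithms are much more flexible for architecture search, but they are impractical at large scale, mainly because they are search-based methods and are therefore slow or need many heuristics to work well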

Similar work

Program synthesis, i.e. automatically generating programs
- program synthesizers typically perform some form of search over the space of programs to generate a
  program that is consistent with a variety of constraints (e.g. input-output examples, demonstrations, natural language, partial programs,
  and assertions).
End-to-end sequence-to-sequence learning. The similarity: both are auto-regressive
- Auto-regressive: rather than using x to predict a separate y, the model predicts each element of x from the elements of x generated before it
Meta-learning. The similarities:
- using a neural network to learn the gradient descent updates for another network
- using reinforcement learning to find update policies for another network
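
To make the auto-regressive point concrete, here is a minimal sketch of sampling a token sequence where each sampled token is fed back as the next input. The tiny RNN, its random weights, and the function name `sample_autoregressive` are all assumptions for illustration; a trained controller would learn its weights.

```python
import numpy as np

def sample_autoregressive(num_steps, vocab_size, hidden_size=8, seed=0):
    """Sample tokens one at a time, conditioning each on the previous ones."""
    rng = np.random.default_rng(seed)
    # Randomly initialised weights (hypothetical; not learned here).
    W_in = rng.normal(scale=0.5, size=(hidden_size, vocab_size))
    W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
    W_out = rng.normal(scale=0.5, size=(vocab_size, hidden_size))

    h = np.zeros(hidden_size)
    x = np.zeros(vocab_size)           # all-zero "start" input
    tokens = []
    for _ in range(num_steps):
        h = np.tanh(W_in @ x + W_h @ h)    # state summarises all earlier tokens
        logits = W_out @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()                        # softmax over the next token
        tok = int(rng.choice(vocab_size, p=p))
        tokens.append(tok)
        x = np.zeros(vocab_size)            # feed the sampled token back in
        x[tok] = 1.0
    return tokens

tokens = sample_autoregressive(num_steps=6, vocab_size=4)
print(tokens)
```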

Methods

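Generating model descriptions with an RNN controller

A simple RNN controller example

When does generation stop?
- Generation stops once the number of layers exceeds a certain value. This value is increased as training progresses
How are the RNN controller's parameters updated?
- The validation accuracy is treated as the reward, and a policy gradient method updates the parameters $\theta_c$ to maximize the expected reward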

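Training with REINFORCE

This section covers how the RNN controller's parameters are updated

Derivation

$J(\theta_c)=E_{P(a_{1:T};\theta_c)}[R] \tag{1.1}$
On (1.1): for the same controller parameters $\theta_c$, the action $a_t$ taken at each time step t varies (it is sampled from a softmax), so the resulting architecture $a_{1:T}$ varies, and so does the accuracy R. We therefore take the expectation of R, which is (1.1). From here on let $a_{1:T}=\tau$, so that:

$J(\theta_c)=E_{\tau \sim P_{\theta_c}}[R] =\int P_{\theta_c}(\tau)\,R\,d\tau \tag{1.2}$
(1.2) rewrites (1.1). Next, differentiate with respect to $\theta_c$: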

image

Next, compute the term $\nabla_{\theta_c}\log P_{\theta_c}(\tau)$ in the expression above.
Because:

image
Therefore:

image

Substituting this into the formula highlighted in green gives the update rule in the paper:
image
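
The equation images in this derivation did not survive, so here is the standard log-derivative derivation written out in the same notation. It is a reconstruction under that assumption, not a copy of the lost images. The first step uses $\nabla_{\theta_c}P_{\theta_c}(\tau)=P_{\theta_c}(\tau)\,\nabla_{\theta_c}\log P_{\theta_c}(\tau)$:

$$
\begin{aligned}
\nabla_{\theta_c} J(\theta_c)
&= \int \nabla_{\theta_c} P_{\theta_c}(\tau)\,R\,d\tau
 = \int P_{\theta_c}(\tau)\,\nabla_{\theta_c}\log P_{\theta_c}(\tau)\,R\,d\tau \\
&= E_{\tau \sim P_{\theta_c}}\big[\nabla_{\theta_c}\log P_{\theta_c}(\tau)\,R\big]
\end{aligned}
$$

The controller generates actions one at a time, so the probability of the whole sequence factorizes and its log becomes a sum:

$$
P_{\theta_c}(\tau)=\prod_{t=1}^{T} P(a_t \mid a_{(t-1):1};\theta_c)
\quad\Longrightarrow\quad
\nabla_{\theta_c}\log P_{\theta_c}(\tau)=\sum_{t=1}^{T}\nabla_{\theta_c}\log P(a_t \mid a_{(t-1):1};\theta_c)
$$

Substituting the sum back in gives

$$
\nabla_{\theta_c} J(\theta_c)=\sum_{t=1}^{T} E_{P(a_{1:T};\theta_c)}\big[\nabla_{\theta_c}\log P(a_t \mid a_{(t-1):1};\theta_c)\,R\big]
$$

In practice the expectation is estimated by averaging over m sampled architectures, with $R_k$ the validation accuracy of the k-th one:

$$
\nabla_{\theta_c} J(\theta_c)\approx\frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T}\nabla_{\theta_c}\log P(a_t \mid a_{(t-1):1};\theta_c)\,R_k
$$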

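Parallel asynchronous updates

Used to speed up training of the RNN controller. I did not fully understand this part, so I am leaving it for now^[1]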

Adding Skip Connections and Other Layer Types

Adding skip connections and branching layers

Method:
- Each layer gets an anchor point; at the anchor point of layer i the RNN controller has a hidden state $h_i$
- Whether layer j is connected to layer i is sampled with probability $P(\text{Layer } j \text{ is an input to layer } i)=\mathrm{sigmoid}(v^T\tanh(W_{prev}h_j+W_{curr}h_i))$
- Here $W_{curr}$, $W_{prev}$ and $v$ are learnable parameters
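
A minimal NumPy sketch of the connection probability above, computed for all earlier layers j at once and then sampled. The shapes, the random weights, and the function names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skip_probabilities(h_prev, h_curr, W_prev, W_curr, v):
    """P(layer j is an input to layer i) for every earlier anchor point j.

    h_prev: (num_prev, d) hidden states h_j at earlier anchor points
    h_curr: (d,)          hidden state h_i at the current anchor point
    W_prev, W_curr: (k, d); v: (k,)
    """
    hidden = np.tanh(h_prev @ W_prev.T + W_curr @ h_curr)  # (num_prev, k)
    return sigmoid(hidden @ v)                              # (num_prev,)

def sample_skip_connections(h_prev, h_curr, W_prev, W_curr, v, rng):
    """Draw one independent yes/no decision per earlier layer."""
    probs = skip_probabilities(h_prev, h_curr, W_prev, W_curr, v)
    return rng.random(probs.shape) < probs

rng = np.random.default_rng(0)
d, k = 5, 4
W_prev = rng.normal(size=(k, d))
W_curr = rng.normal(size=(k, d))
v = rng.normal(size=k)
h_prev = rng.normal(size=(3, d))   # three earlier layers
h_curr = rng.normal(size=d)

probs = skip_probabilities(h_prev, h_curr, W_prev, W_curr, v)
mask = sample_skip_connections(h_prev, h_curr, W_prev, W_curr, v, rng)
print(probs, mask)
```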
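Problems:
- A layer may have no input: it is treated as an input layer and takes the image as input
- A layer's output may not be fed to any other layer: all such outputs are concatenated and sent to the classifier
- A layer may have several inputs of different sizes that cannot be concatenated directly: pad the smaller ones with zeros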
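
The zero-padding fix for mismatched inputs can be sketched as follows. The notes only say "pad zeros", so padding on the bottom and right, the (H, W, C) layout, and the name `pad_and_concat` are assumptions of this sketch.

```python
import numpy as np

def pad_and_concat(feature_maps):
    """Zero-pad (H, W, C) arrays to a common H x W, then concatenate on C."""
    max_h = max(f.shape[0] for f in feature_maps)
    max_w = max(f.shape[1] for f in feature_maps)
    padded = [
        # np.pad defaults to constant padding with zeros
        np.pad(f, ((0, max_h - f.shape[0]), (0, max_w - f.shape[1]), (0, 0)))
        for f in feature_maps
    ]
    return np.concatenate(padded, axis=-1)

a = np.ones((4, 4, 2))
b = np.ones((2, 2, 3))
out = pad_and_concat([a, b])
print(out.shape)
```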

Adding other layer types

Pooling, batchnorm, and even the learning rate

The RNN controller first predicts the layer type, then predicts the associated hyperparameters

Generate RNN Cell Architectures

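This section describes how to generate an RNN cell with the same method: the approach works not only for CNN architecture search but also for searching RNN cells

Take a “base 2” tree as an example of generating an RNN cell ^[2]
- “base 2”: the tree has two leaf nodes

The right side shows the computation graph for the example

In practice, “base 8” is used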
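
Since the figure is missing, here is a sketch of what one “base 2” cell can look like in code. The specific operation choices (add + tanh and elementwise multiply + ReLU at the two leaves, add + ReLU to inject the old cell state, elementwise multiply + sigmoid at the root) follow my reading of the paper's worked example and should be checked against its figure; the function and weight names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def base2_cell(x_t, h_prev, c_prev, W1, W2, W3, W4):
    """One step of an example 'base 2' cell: two leaves and one root node."""
    a0 = np.tanh(W1 @ x_t + W2 @ h_prev)   # leaf 0: add, then tanh
    pre1 = (W3 @ x_t) * (W4 @ h_prev)      # leaf 1: elementwise multiply
    a1 = relu(pre1)                        # leaf 1 activation: ReLU
    a0_new = relu(a0 + c_prev)             # inject previous cell state into leaf 0
    h_t = sigmoid(a0_new * a1)             # root: elementwise multiply, then sigmoid
    c_t = pre1                             # new cell state: leaf 1 before activation
    return h_t, c_t

rng = np.random.default_rng(0)
d = 4
W1, W2, W3, W4 = (rng.normal(size=(d, d)) for _ in range(4))
x_t = rng.normal(size=d)
h_t, c_t = base2_cell(x_t, np.zeros(d), np.zeros(d), W1, W2, W3, W4)
print(h_t, c_t)
```

With a zero previous hidden state the multiplicative leaf is zero, so the output is sigmoid(0) = 0.5 everywhere, which is a quick sanity check on the wiring.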

Experiments and Results

CNN for CIFAR-10

Dataset:
- whitening
- upsample, then take a random 32x32 crop
- random horizontal flip
Search space:
- layer type: ReLU, BN, skip connections
- filter height in [1,3,5,7]
- filter width in [1,3,5,7]
- number of filters in [24,36,48,64]
- stride: two sets of experiments, one with stride fixed to 1 and one searching over [1,2,3]
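
To get a feel for the size of this search space, the sketch below enumerates the per-layer choices listed above. It treats stride as a single value and ignores skip connections, both simplifications assumed here; `layer_choices` is a hypothetical helper.

```python
import itertools

SEARCH_SPACE = {
    "filter_height": [1, 3, 5, 7],
    "filter_width": [1, 3, 5, 7],
    "num_filters": [24, 36, 48, 64],
}
STRIDES = [1, 2, 3]

def layer_choices(search_stride):
    """Enumerate every per-layer configuration in the search space."""
    space = dict(SEARCH_SPACE)
    space["stride"] = STRIDES if search_stride else [1]
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]

fixed = layer_choices(search_stride=False)
searched = layer_choices(search_stride=True)
print(len(fixed), len(searched))   # 64 192
print(len(fixed) ** 6)             # configurations for a 6-layer network
```

Even with the stride fixed, a 6-layer network already has 64^6 (about 6.9e10) combinations before skip connections are counted, which is why exhaustive search is not an option.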
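Training details (see the paper for more)
- child model:
  - trained for 50 epochs
  - the reward is the maximum validation accuracy over the last 5 epochs, cubed
- RNN controller
  - 800 child models sampled by the controller are trained concurrently in a distributed setup
  - after every 1,600 sampled child models, the child model depth is increased by 2 (starting from depth 6)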
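
The child model's reward described above is simple to write down. A minimal sketch, assuming accuracies are given as fractions in [0, 1] with one entry per epoch; the function name is hypothetical.

```python
def reward_from_history(val_acc_per_epoch, last_k=5):
    """Max validation accuracy over the last `last_k` epochs, cubed."""
    return max(val_acc_per_epoch[-last_k:]) ** 3

history = [0.10, 0.50, 0.90, 0.80, 0.95, 0.93, 0.94]
reward = reward_from_history(history)
print(reward)

# An early peak outside the last 5 epochs does not count.
late_drop = reward_from_history([0.99, 0.5, 0.5, 0.5, 0.5, 0.6])
print(late_drop)
```

Cubing widens the gap between good models at the high end: 0.90 and 0.95 differ by 0.05 in accuracy but by about 0.13 in reward.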
Results: run a small grid search^[3] over learning rate, weight decay, batchnorm epsilon and what epoch to decay the learning rate. The best model from this grid search is then run until convergence and we then compute the test accuracy of such model
- v1: stride=1 and no pooling
- v2: search stride in [1,2,3]; accuracy drops slightly because the search space is larger
- v3: max pooling at layer 13 and layer 24
- v4:
  - To limit the search space complexity we have our model predict 13 layers where each layer prediction is a fully connected block of 3 layers^[4]
  - filter number search space changed from [24, 36, 48, 64] to [6, 12, 24, 36]
  - adding 40 more filters to each layer

The four networks found by the search

RNN for PENN TREEBANK

Skipping the RNN part for now

Footnotes

[1] To be read later
[2] To be read later
[3] How is the grid search done?
[4] What?
