2018-08-10 CNN-convolutional

2019-02-18  镜中无我

Compared to fully-connected neural networks, convolutional ones perform better in image recognition because of the different connection structure between adjacent layers.

structure:

input: the raw pixels of the image, a 3-D tensor (height × width × channels)
output: the confidence scores for each class

convolutional layer (filter, also called kernel)

each processed block has the same spatial size as the filter
filter depth = the number of output channels
output dimensions (with 'VALID' padding, i.e. no zero padding):
out_length = ceil((in_length - fil_length + 1) / stride_length)
out_width = ceil((in_width - fil_width + 1) / stride_width)
filter_parameter_amount = fil_length × fil_width × in_depth × fil_depth (plus fil_depth biases)
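As a quick sanity check, the size and parameter formulas above can be evaluated directly. A minimal sketch; the concrete numbers (32×32×3 input, 5×5 filter, 16 output channels) are made-up examples:

```python
import math

def conv_output_size(in_size, fil_size, stride, padding='VALID'):
    # 'VALID': no zero padding, exactly the formula above
    if padding == 'VALID':
        return math.ceil((in_size - fil_size + 1) / stride)
    # 'SAME': zero-padded so only the stride shrinks the output
    return math.ceil(in_size / stride)

# 32x32 input, 5x5 filter, stride 1, 'VALID' -> 28x28 output
print(conv_output_size(32, 5, 1))  # 28

# parameters: fil_length * fil_width * in_depth * fil_depth, plus one bias per output channel
params = 5 * 5 * 3 * 16 + 16
print(params)  # 1216
```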

fil_weights=tf.get_variable('weights',[fil_length,fil_width,in_depth,fil_depth],initializer=tf...)
biases=tf.get_variable('biases',[fil_depth],initializer=tf...)
# conv2d performs the forward propagation
conv=tf.nn.conv2d(input,fil_weights,strides=[1,len_stride,wid_stride,1],padding='SAME') # 'VALID' means no zero padding
# bias_add adds one bias per output channel; do not add the bias tensor directly
bias=tf.nn.bias_add(conv,biases)
activation=tf.nn.relu(bias)
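What conv2d computes can be spelled out in plain numpy. A naive sketch of a single-image forward convolution with 'VALID' padding (the TF op additionally handles a batch dimension and is heavily optimized); all shapes here are illustrative:

```python
import numpy as np

def conv2d_valid(x, w, b, stride=1):
    """Naive forward convolution, 'VALID' padding.
    x: (H, W, in_depth), w: (fh, fw, in_depth, out_depth), b: (out_depth,)"""
    H, W, _ = x.shape
    fh, fw, _, out_depth = w.shape
    out_h = (H - fh) // stride + 1
    out_w = (W - fw) // stride + 1
    out = np.zeros((out_h, out_w, out_depth))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride+fh, j*stride:j*stride+fw, :]
            # contract the patch against the filter over (fh, fw, in_depth),
            # giving one value per output channel, then add the bias
            out[i, j, :] = np.tensordot(patch, w, axes=3) + b
    return out

x = np.ones((4, 4, 1))          # toy 4x4 single-channel input
w = np.ones((3, 3, 1, 2))       # 3x3 filter, depth 2
b = np.zeros(2)
y = conv2d_valid(x, w, b)
print(y.shape)      # (2, 2, 2)
print(y[0, 0, 0])   # 9.0 -- the sum of a 3x3 all-ones patch
```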

pooling layer

usage: shrink the spatial size of the feature map to speed up computation and help avoid over-fitting

max pooling

takes the maximum over each window

# similar to the convolution op, you set strides and padding; ksize is the size of the pooling window
# unlike a convolutional filter, the pooling window cannot span the depth dimension, so ksize[0] and ksize[3] must be 1
pool=tf.nn.max_pool(activation,ksize=[1,fil_len,fil_wid,1],strides=[1,len_stride,wid_stride,1],padding='SAME')

average pooling
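Both pooling variants differ only in the reduction applied to each window. A per-channel numpy sketch with 'VALID'-style windows (the 2×2 window and stride are illustrative defaults):

```python
import numpy as np

def pool2d(x, ksize=2, stride=2, mode='max'):
    """Per-channel pooling over (H, W); x has shape (H, W, depth)."""
    H, W, D = x.shape
    out_h = (H - ksize) // stride + 1
    out_w = (W - ksize) // stride + 1
    out = np.zeros((out_h, out_w, D))
    # max pooling keeps the strongest response; average pooling smooths
    reduce_fn = np.max if mode == 'max' else np.mean
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+ksize, j*stride:j*stride+ksize, :]
            out[i, j, :] = reduce_fn(window, axis=(0, 1))
    return out

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(pool2d(x, mode='max')[:, :, 0])   # [[ 5.  7.] [13. 15.]]
print(pool2d(x, mode='avg')[:, :, 0])   # [[ 2.5  4.5] [10.5 12.5]]
```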

classical models

LeNet-5
xs=tf.placeholder(tf.float32,[Batch_size,mnist_inference.IMAGE_SIZE,mnist_inference.IMAGE_SIZE,mnist_inference.NUM_CHANNELS],name='x-input')
reshaped_xs=np.reshape(xs,(Batch_size,mnist_inference.IMAGE_SIZE,mnist_inference.IMAGE_SIZE,mnist_inference.NUM_CHANNELS))
def inference(tensor,train,regularizer):
     with tf.variable_scope('layer1-conv1'):
            conv1_weights=tf.get_variable('weights',[CONV1_SIZE,CONV1_SIZE,NUM_CHANNELS,CONV1_DEEP],initializer=...)
            conv1_biases=tf...
            conv1=tf.nn.conv2d(...)
            relu1=tf.nn.relu(tf.nn.bias_add(...))
     with tf.name_scope('layer2-pool1'):
            pool1=tf.nn.max_pool(relu1,ksize=...,strides=...,padding=...)
     with tf.variable_scope('layer3-conv2'):
            ...
     with tf.name_scope(...):
            pool2=...
     # flatten the pooled output to feed the next fully-connected layer
     pool_shape=pool2.get_shape().as_list()
     nodes=pool_shape[1]*pool_shape[2]*pool_shape[3]
     reshaped=tf.reshape(pool2,[pool_shape[0],nodes])
     with tf.variable_scope('layer5-fc1'):
            fc1_weights=tf.get_variable('weights',[nodes,FC_SIZE],initializer=...)
            if regularizer is not None:
                tf.add_to_collection('losses',regularizer(fc1_weights))
            fc1_biases=tf.get_variable('bias',[FC_SIZE],initializer=...)
            fc1=tf.nn.relu(...)
            if train:fc1=tf.nn.dropout(fc1,0.5)
     with ...
            ...
            logit=tf.matmul(fc1,fc2_weights)+fc2_biases
     return logit

note: input -> (convolution+ -> pooling?) -> fully-connected -> softmax -> output
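The `nodes` value computed before the flatten step can be traced by hand through this pipeline. A sketch of the shape arithmetic, assuming typical LeNet-5-on-MNIST sizes (28×28×1 input, 'SAME' convolutions with stride 1, 2×2 pooling with stride 2, conv depths 32 and 64 -- these concrete numbers are assumptions, not fixed by the code above):

```python
# trace the spatial size and depth through conv ('SAME', stride 1) and 2x2 pooling
size, depth = 28, 1       # assumed MNIST input
depth = 32                # layer1-conv1: 'SAME' padding keeps 28x28
size //= 2                # layer2-pool1: 28 -> 14
depth = 64                # layer3-conv2: 'SAME' padding keeps 14x14
size //= 2                # layer4-pool2: 14 -> 7

# flattened length consumed by the first fully-connected layer
nodes = size * size * depth
print(nodes)  # 3136
```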

Inception-v3

core method
in the convolutional layer, three different kernels process the input simultaneously and their outputs are then concatenated together
to keep the spatial sizes compatible, the stride is set to 1 and padding to 'SAME'

# set default arguments for the listed ops
with slim.arg_scope([slim.conv2d,slim.max_pool2d,slim.avg_pool2d],stride=1,padding='SAME'):
       # inception module namespace
       with tf.variable_scope('...'):
              #for every path
              with tf.variable_scope('...1'):
              with tf.variable_scope('...2'):
              with tf.variable_scope('...3'):
       net=tf.concat(3,[...1,...2,...3]) # concatenate the branch outputs along the depth axis (axis 3)
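Since every branch uses stride 1 and 'SAME' padding, the branch outputs share the same batch and spatial dimensions and differ only in depth, which is what makes the concat valid. A numpy sketch of the same merge; the 8×8 spatial size and branch depths (32, 64, 16) are made-up examples (note that older TF took the axis as the first argument of tf.concat, as above, while newer versions take it as a keyword: tf.concat([...], axis=3)):

```python
import numpy as np

# three branch outputs: same batch and spatial size, different depths
branch1 = np.zeros((1, 8, 8, 32))
branch2 = np.zeros((1, 8, 8, 64))
branch3 = np.zeros((1, 8, 8, 16))

# merge along the depth (channel) axis; depths simply add up
net = np.concatenate([branch1, branch2, branch3], axis=3)
print(net.shape)  # (1, 8, 8, 112)
```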