Decoding "Residual Network"
When I first studied residual networks, they caused me enormous confusion, so I am writing this memo post for my own later review and as a beginner's introduction to skip connections.
Basic idea
(Wikipedia quote) A residual neural network (ResNet) is an artificial neural network of a kind that builds on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks do this by utilizing so-called skip connections to jump over some layers, in order to avoid the problems of vanishing gradients and training degradation.
There are two types of neural networks to compare here: plain networks and deeper networks.
Plain networks usually have fewer than about 25 layers, and their accuracy is roughly as good as expected, i.e. the result is not bad and there is still room for improvement.
Deeper networks have more than 25 layers, and people often assume that the deeper the network, the better the accuracy. In reality this is often wrong, because the deeper the network, the higher the risk of vanishing or exploding gradients. Even if you add regularization to save the network from that, there is another problem called degradation.
Vanishing or exploding gradients - Explaining this is straightforward, so we omit this part.
Degradation - This problem has been observed while training deeper neural networks. As we increase the network depth, accuracy first saturates, which is expected, since the extra layers eventually give the network enough capacity to model the intricacies of the data. But if we keep adding layers beyond the saturation region, the accuracy of the network starts to drop. We might blame overfitting, but that is not the cause: the additional layers in the deep model lead to higher training error (training, not testing).
This is hard to grasp at first, because the common sense of training a neural network is that more layers bring higher accuracy; we tend to think that more input gives a better result.
So, to avoid these problems, researchers came up with the residual network, which has proved to be a good solution for deeper neural networks (25+ layers).
Figure: comparison between a plain network and a ResNet
- Intuition behind ResNet
- What is residual - A residual is the error in a result. For example, suppose you are guessing someone's age: if the actual age is 20 and you guessed 18, you are off by 2, and that 2 is the residual. In essence, the residual is what you should have added to your prediction to match the actual data. It is important to realize that when the residual is 0, we don't do anything, since the prediction already matches the actual data.
In the diagram, x is our prediction and we want it to be equal to Actual. However, if it is off by a margin, our residual function residual() kicks in and produces the residual of the operation so as to correct our prediction to match the actual value. If x == Actual, residual(x) will be 0. The Identity function just copies x.
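The age example can be written out as a tiny sketch. The function names residual() and identity() mirror the names used in the text, but the bodies below are only an illustration of the idea, not code from any library.

```python
def residual(prediction, actual):
    """Return what must be added to the prediction to match the actual value."""
    return actual - prediction


def identity(x):
    """The identity function just copies x."""
    return x


actual_age = 20
guess = 18

correction = residual(guess, actual_age)
corrected = identity(guess) + correction

print(correction)                # 2
print(corrected)                 # 20
print(residual(20, actual_age))  # 0, the prediction already matches
```

When the guess is already right, the residual is 0 and the identity path alone carries the answer through.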
- How
We want to go deeper without degradation in accuracy and error rate. We can do this by injecting identity mappings.
We want to be able to learn the residuals so that our predictions are close to the actuals.
Shortcut connections are those skipping one or more layers. In our case, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers. Identity shortcut connections add neither extra parameter nor computational complexity. The entire network can still be trained end-to-end by SGD with backpropagation, and can be easily implemented using common libraries without modifying the solvers.
H(x) = F(x) + x, where F(x) = W2 * relu(W1 * x + b1) + b2
During training, the residual network learns the weights of its layers such that, if the identity mapping were optimal, all the weights of F get driven to 0. In effect F(x) becomes 0, so x is mapped directly to H(x) and no corrections need to be made. These blocks then act as identity mappings, which is what lets the network grow deep. And if there is a deviation from the optimal identity mapping, the weights and biases of F(x) are learned to adjust for it. Think of F(x) as learning how to adjust our predictions to match the actuals.
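To make the formula H(x) = F(x) + x concrete, here is a minimal pure-Python sketch of one residual block on a small vector. The helper names (matvec, add, residual_branch, residual_block) and the example weights are made up for illustration; a real network would use a tensor library and convolutions.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def residual_branch(x, W1, b1, W2, b2):
    """F(x) = W2 * relu(W1 * x + b1) + b2"""
    hidden = relu(add(matvec(W1, x), b1))
    return add(matvec(W2, hidden), b2)


def residual_block(x, W1, b1, W2, b2):
    """H(x) = F(x) + x, where the '+ x' is the identity shortcut."""
    return add(residual_branch(x, W1, b1, W2, b2), x)


x = [1.0, -2.0, 3.0]
zero_W = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# All weights and biases at 0: F(x) is 0 and the block is an identity mapping.
print(residual_block(x, zero_W, zero_b, zero_W, zero_b))  # [1.0, -2.0, 3.0]

# Non-zero weights: F(x) adds a correction on top of x.
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
half = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
print(residual_block(x, eye, zero_b, half, zero_b))  # [1.5, -2.0, 4.5]
```

Note that the shortcut itself has no weights; every parameter in the sketch belongs to F.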
- Intuition behind ResNet
Deep residual networks work well because information can flow from the very first layer to the last layer of the network. With identity mappings on the shortcut connections, information is able to flow unimpeded throughout the entire network. This allows any layer to be represented as a function of the original input. In pre-activation ResNets, batch normalization and ReLU are placed before each convolution, so the output of the addition becomes the output of the layer with nothing applied after it, which achieves the identity effect we desire.


- Take a close look at the residual block
The main takeaway here is to make a[l+2] == relu(a[l]). In the block, a[l+2] = relu(z[l+2] + a[l]), so when the weights and bias that produce z[l+2] are zero, the block simply passes a[l] through the ReLU. Therefore, the gradient at every single layer can be computed with the original input taken into consideration. Given the above equation, when G and H are identity functions, information always flows unimpeded and gradients do not vanish no matter how deep we go.
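A toy calculation shows why the shortcut helps gradients survive. The sketch below assumes scalar, purely linear layers with the same small slope everywhere, which is a deliberate simplification (real layers are matrices with nonlinearities). It only uses the chain rule: a plain layer y = s * x contributes a factor s, while a residual layer y = x + s * x contributes 1 + s.

```python
def plain_gradient(layer_slopes):
    # Chain rule through stacked plain layers y = s * x.
    grad = 1.0
    for s in layer_slopes:
        grad *= s
    return grad


def residual_gradient(branch_slopes):
    # Chain rule through stacked residual layers y = x + s * x.
    # The "1.0 +" term comes from the identity shortcut.
    grad = 1.0
    for s in branch_slopes:
        grad *= 1.0 + s
    return grad


depth = 50
slopes = [0.01] * depth  # hypothetical small per-layer slope

print(plain_gradient(slopes))     # shrinks toward zero as depth grows
print(residual_gradient(slopes))  # stays on the order of 1
```

With the shortcut, each layer's factor stays near 1 even when the learned branch contributes almost nothing, so the product over many layers does not collapse.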
PS-3 ResNet code example
#import needed classes
import keras
from keras.datasets import cifar10
from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,AveragePooling2D,Dropout,BatchNormalization,Activation
from keras.models import Model,Input
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler
from keras.callbacks import ModelCheckpoint
from math import ceil
import os
from keras.preprocessing.image import ImageDataGenerator
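def Unit(x,filters,pool=False):
    #Pre-activation residual unit: (BN -> ReLU -> Conv) twice, plus a shortcut
    res = x
    if pool:
        #Halve the spatial size on the main path; the strided 1x1 convolution
        #gives the shortcut the same spatial size and number of filters
        x = MaxPooling2D(pool_size=(2, 2))(x)
        res = Conv2D(filters=filters,kernel_size=[1,1],strides=(2,2),padding="same")(res)
    out = BatchNormalization()(x)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    out = BatchNormalization()(out)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    #Add the shortcut to the output of the stacked layers
    out = keras.layers.add([res,out])
    return out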
#Define the model
def MiniModel(input_shape):
    images = Input(input_shape)
    net = Conv2D(filters=32, kernel_size=[3, 3], strides=[1, 1], padding="same")(images)
    net = Unit(net,32)
    net = Unit(net,32)
    net = Unit(net,32)
    net = Unit(net,64,pool=True)
    net = Unit(net,64)
    net = Unit(net,64)
    net = Unit(net,128,pool=True)
    net = Unit(net,128)
    net = Unit(net,128)
    net = Unit(net, 256,pool=True)
    net = Unit(net, 256)
    net = Unit(net, 256)
    net = BatchNormalization()(net)
    net = Activation("relu")(net)
    net = Dropout(0.25)(net)
    net = AveragePooling2D(pool_size=(4,4))(net)
    net = Flatten()(net)
    net = Dense(units=10,activation="softmax")(net)
    model = Model(inputs=images,outputs=net)
    return model
#load the cifar10 dataset
(train_x, train_y) , (test_x, test_y) = cifar10.load_data()
#normalize the data
train_x = train_x.astype('float32') / 255
test_x = test_x.astype('float32') / 255
#Subtract the mean image from both train and test set
train_x = train_x - train_x.mean()
test_x = test_x - test_x.mean()
#Divide by the standard deviation
train_x = train_x / train_x.std(axis=0)
test_x = test_x / test_x.std(axis=0)
# Generate batches of tensor image data with real-time data augmentation.
# The data will be looped over (in batches).
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(train_x)
#Encode the labels to vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)
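#Build the model for 32x32 RGB CIFAR-10 images
input_shape = (32,32,3)
model = MiniModel(input_shape)
#Print a Summary of the model
model.summary()
#Specify the training components
#(the compile call is missing from this post; Adam with default settings
#and categorical cross-entropy are assumptions)
model.compile(optimizer=Adam(),loss="categorical_crossentropy",metrics=["accuracy"])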
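epochs = 50
steps_per_epoch = ceil(50000/128)
# Fit the model on the batches generated by datagen.flow().
# validation_data is inferred from the val_loss/val_acc columns in the log below.
# Note: newer Keras releases accept generators in model.fit() instead of fit_generator().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=(test_x, test_y),
                    epochs=epochs,steps_per_epoch=steps_per_epoch, verbose=1, workers=4)
#Evaluate the accuracy of the test dataset
accuracy = model.evaluate(x=test_x,y=test_y,batch_size=128)
#Save the trained model
model.save("cifar10model.h5")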
running result
Using TensorFlow backend.
Downloading data from
170500096/170498071 [==============================] - 42s 0us/step
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) (None, 32, 32, 3) 0
conv2d_1 (Conv2D) (None, 32, 32, 32) 896 input_1[0][0]
batch_normalization_1 (BatchNor (None, 32, 32, 32) 128 conv2d_1[0][0]
activation_1 (Activation) (None, 32, 32, 32) 0 batch_normalization_1[0][0]
conv2d_2 (Conv2D) (None, 32, 32, 32) 9248 activation_1[0][0]
batch_normalization_2 (BatchNor (None, 32, 32, 32) 128 conv2d_2[0][0]
activation_2 (Activation) (None, 32, 32, 32) 0 batch_normalization_2[0][0]
conv2d_3 (Conv2D) (None, 32, 32, 32) 9248 activation_2[0][0]
add_1 (Add) (None, 32, 32, 32) 0 conv2d_1[0][0]
batch_normalization_3 (BatchNor (None, 32, 32, 32) 128 add_1[0][0]
activation_3 (Activation) (None, 32, 32, 32) 0 batch_normalization_3[0][0]
conv2d_4 (Conv2D) (None, 32, 32, 32) 9248 activation_3[0][0]
batch_normalization_4 (BatchNor (None, 32, 32, 32) 128 conv2d_4[0][0]
activation_4 (Activation) (None, 32, 32, 32) 0 batch_normalization_4[0][0]
conv2d_5 (Conv2D) (None, 32, 32, 32) 9248 activation_4[0][0]
add_2 (Add) (None, 32, 32, 32) 0 add_1[0][0]
batch_normalization_5 (BatchNor (None, 32, 32, 32) 128 add_2[0][0]
activation_5 (Activation) (None, 32, 32, 32) 0 batch_normalization_5[0][0]
conv2d_6 (Conv2D) (None, 32, 32, 32) 9248 activation_5[0][0]
batch_normalization_6 (BatchNor (None, 32, 32, 32) 128 conv2d_6[0][0]
activation_6 (Activation) (None, 32, 32, 32) 0 batch_normalization_6[0][0]
conv2d_7 (Conv2D) (None, 32, 32, 32) 9248 activation_6[0][0]
add_3 (Add) (None, 32, 32, 32) 0 add_2[0][0]
max_pooling2d_1 (MaxPooling2D) (None, 16, 16, 32) 0 add_3[0][0]
batch_normalization_7 (BatchNor (None, 16, 16, 32) 128 max_pooling2d_1[0][0]
activation_7 (Activation) (None, 16, 16, 32) 0 batch_normalization_7[0][0]
conv2d_9 (Conv2D) (None, 16, 16, 64) 18496 activation_7[0][0]
batch_normalization_8 (BatchNor (None, 16, 16, 64) 256 conv2d_9[0][0]
activation_8 (Activation) (None, 16, 16, 64) 0 batch_normalization_8[0][0]
conv2d_8 (Conv2D) (None, 16, 16, 64) 2112 add_3[0][0]
conv2d_10 (Conv2D) (None, 16, 16, 64) 36928 activation_8[0][0]
add_4 (Add) (None, 16, 16, 64) 0 conv2d_8[0][0]
batch_normalization_9 (BatchNor (None, 16, 16, 64) 256 add_4[0][0]
activation_9 (Activation) (None, 16, 16, 64) 0 batch_normalization_9[0][0]
conv2d_11 (Conv2D) (None, 16, 16, 64) 36928 activation_9[0][0]
batch_normalization_10 (BatchNo (None, 16, 16, 64) 256 conv2d_11[0][0]
activation_10 (Activation) (None, 16, 16, 64) 0 batch_normalization_10[0][0]
conv2d_12 (Conv2D) (None, 16, 16, 64) 36928 activation_10[0][0]
add_5 (Add) (None, 16, 16, 64) 0 add_4[0][0]
batch_normalization_11 (BatchNo (None, 16, 16, 64) 256 add_5[0][0]
activation_11 (Activation) (None, 16, 16, 64) 0 batch_normalization_11[0][0]
conv2d_13 (Conv2D) (None, 16, 16, 64) 36928 activation_11[0][0]
batch_normalization_12 (BatchNo (None, 16, 16, 64) 256 conv2d_13[0][0]
activation_12 (Activation) (None, 16, 16, 64) 0 batch_normalization_12[0][0]
conv2d_14 (Conv2D) (None, 16, 16, 64) 36928 activation_12[0][0]
add_6 (Add) (None, 16, 16, 64) 0 add_5[0][0]
max_pooling2d_2 (MaxPooling2D) (None, 8, 8, 64) 0 add_6[0][0]
batch_normalization_13 (BatchNo (None, 8, 8, 64) 256 max_pooling2d_2[0][0]
activation_13 (Activation) (None, 8, 8, 64) 0 batch_normalization_13[0][0]
conv2d_16 (Conv2D) (None, 8, 8, 128) 73856 activation_13[0][0]
batch_normalization_14 (BatchNo (None, 8, 8, 128) 512 conv2d_16[0][0]
activation_14 (Activation) (None, 8, 8, 128) 0 batch_normalization_14[0][0]
conv2d_15 (Conv2D) (None, 8, 8, 128) 8320 add_6[0][0]
conv2d_17 (Conv2D) (None, 8, 8, 128) 147584 activation_14[0][0]
add_7 (Add) (None, 8, 8, 128) 0 conv2d_15[0][0]
batch_normalization_15 (BatchNo (None, 8, 8, 128) 512 add_7[0][0]
activation_15 (Activation) (None, 8, 8, 128) 0 batch_normalization_15[0][0]
conv2d_18 (Conv2D) (None, 8, 8, 128) 147584 activation_15[0][0]
batch_normalization_16 (BatchNo (None, 8, 8, 128) 512 conv2d_18[0][0]
activation_16 (Activation) (None, 8, 8, 128) 0 batch_normalization_16[0][0]
conv2d_19 (Conv2D) (None, 8, 8, 128) 147584 activation_16[0][0]
add_8 (Add) (None, 8, 8, 128) 0 add_7[0][0]
batch_normalization_17 (BatchNo (None, 8, 8, 128) 512 add_8[0][0]
activation_17 (Activation) (None, 8, 8, 128) 0 batch_normalization_17[0][0]
conv2d_20 (Conv2D) (None, 8, 8, 128) 147584 activation_17[0][0]
batch_normalization_18 (BatchNo (None, 8, 8, 128) 512 conv2d_20[0][0]
activation_18 (Activation) (None, 8, 8, 128) 0 batch_normalization_18[0][0]
conv2d_21 (Conv2D) (None, 8, 8, 128) 147584 activation_18[0][0]
add_9 (Add) (None, 8, 8, 128) 0 add_8[0][0]
max_pooling2d_3 (MaxPooling2D) (None, 4, 4, 128) 0 add_9[0][0]
batch_normalization_19 (BatchNo (None, 4, 4, 128) 512 max_pooling2d_3[0][0]
activation_19 (Activation) (None, 4, 4, 128) 0 batch_normalization_19[0][0]
conv2d_23 (Conv2D) (None, 4, 4, 256) 295168 activation_19[0][0]
batch_normalization_20 (BatchNo (None, 4, 4, 256) 1024 conv2d_23[0][0]
activation_20 (Activation) (None, 4, 4, 256) 0 batch_normalization_20[0][0]
conv2d_22 (Conv2D) (None, 4, 4, 256) 33024 add_9[0][0]
conv2d_24 (Conv2D) (None, 4, 4, 256) 590080 activation_20[0][0]
add_10 (Add) (None, 4, 4, 256) 0 conv2d_22[0][0]
batch_normalization_21 (BatchNo (None, 4, 4, 256) 1024 add_10[0][0]
activation_21 (Activation) (None, 4, 4, 256) 0 batch_normalization_21[0][0]
conv2d_25 (Conv2D) (None, 4, 4, 256) 590080 activation_21[0][0]
batch_normalization_22 (BatchNo (None, 4, 4, 256) 1024 conv2d_25[0][0]
activation_22 (Activation) (None, 4, 4, 256) 0 batch_normalization_22[0][0]
conv2d_26 (Conv2D) (None, 4, 4, 256) 590080 activation_22[0][0]
add_11 (Add) (None, 4, 4, 256) 0 add_10[0][0]
batch_normalization_23 (BatchNo (None, 4, 4, 256) 1024 add_11[0][0]
activation_23 (Activation) (None, 4, 4, 256) 0 batch_normalization_23[0][0]
conv2d_27 (Conv2D) (None, 4, 4, 256) 590080 activation_23[0][0]
batch_normalization_24 (BatchNo (None, 4, 4, 256) 1024 conv2d_27[0][0]
activation_24 (Activation) (None, 4, 4, 256) 0 batch_normalization_24[0][0]
conv2d_28 (Conv2D) (None, 4, 4, 256) 590080 activation_24[0][0]
add_12 (Add) (None, 4, 4, 256) 0 add_11[0][0]
batch_normalization_25 (BatchNo (None, 4, 4, 256) 1024 add_12[0][0]
activation_25 (Activation) (None, 4, 4, 256) 0 batch_normalization_25[0][0]
dropout_1 (Dropout) (None, 4, 4, 256) 0 activation_25[0][0]
average_pooling2d_1 (AveragePoo (None, 1, 1, 256) 0 dropout_1[0][0]
flatten_1 (Flatten) (None, 256) 0 average_pooling2d_1[0][0]
dense_1 (Dense) (None, 10) 2570 flatten_1[0][0]
Total params: 4,374,538
Trainable params: 4,368,714
Non-trainable params: 5,824
Epoch 1/50
391/391 [==============================] - 27s 68ms/step - loss: 1.2885 - acc: 0.5326 - val_loss: 1.6630 - val_acc: 0.4961
Epoch 2/50
391/391 [==============================] - 21s 53ms/step - loss: 0.8541 - acc: 0.7001 - val_loss: 1.0465 - val_acc: 0.6674
Epoch 3/50
391/391 [==============================] - 21s 54ms/step - loss: 0.6907 - acc: 0.7593 - val_loss: 0.9077 - val_acc: 0.7053
Epoch 4/50
391/391 [==============================] - 22s 56ms/step - loss: 0.6064 - acc: 0.7902 - val_loss: 0.6870 - val_acc: 0.7732
Epoch 5/50
391/391 [==============================] - 21s 53ms/step - loss: 0.5409 - acc: 0.8119 - val_loss: 0.6286 - val_acc: 0.7820
Epoch 6/50
391/391 [==============================] - 20s 52ms/step - loss: 0.4976 - acc: 0.8276 - val_loss: 0.6467 - val_acc: 0.7915
Epoch 7/50
391/391 [==============================] - 21s 53ms/step - loss: 0.4554 - acc: 0.8428 - val_loss: 0.7318 - val_acc: 0.7812
Epoch 8/50
391/391 [==============================] - 21s 54ms/step - loss: 0.4276 - acc: 0.8515 - val_loss: 0.5955 - val_acc: 0.8024
Epoch 9/50
391/391 [==============================] - 20s 51ms/step - loss: 0.4037 - acc: 0.8592 - val_loss: 0.7164 - val_acc: 0.7742
Epoch 10/50
391/391 [==============================] - 20s 52ms/step - loss: 0.3785 - acc: 0.8691 - val_loss: 0.5306 - val_acc: 0.8272
Epoch 11/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3606 - acc: 0.8747 - val_loss: 0.6534 - val_acc: 0.8090
Epoch 12/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3378 - acc: 0.8816 - val_loss: 0.4706 - val_acc: 0.8475
Epoch 13/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3182 - acc: 0.8888 - val_loss: 0.4721 - val_acc: 0.8438
Epoch 14/50
391/391 [==============================] - 21s 54ms/step - loss: 0.3070 - acc: 0.8941 - val_loss: 0.5304 - val_acc: 0.8327
Epoch 15/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2959 - acc: 0.8972 - val_loss: 0.5714 - val_acc: 0.8310
Epoch 16/50
391/391 [==============================] - 22s 56ms/step - loss: 0.2757 - acc: 0.9032 - val_loss: 0.5431 - val_acc: 0.8413
Epoch 17/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2722 - acc: 0.9045 - val_loss: 0.5690 - val_acc: 0.8257
Epoch 18/50
391/391 [==============================] - 21s 54ms/step - loss: 0.2542 - acc: 0.9105 - val_loss: 0.5157 - val_acc: 0.8502
Epoch 19/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2447 - acc: 0.9150 - val_loss: 0.4588 - val_acc: 0.8625
Epoch 20/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2299 - acc: 0.9180 - val_loss: 0.5702 - val_acc: 0.8410
Epoch 21/50
391/391 [==============================] - 20s 51ms/step - loss: 0.2238 - acc: 0.9207 - val_loss: 0.5116 - val_acc: 0.8418
Epoch 22/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2201 - acc: 0.9242 - val_loss: 0.4404 - val_acc: 0.8655
Epoch 23/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2071 - acc: 0.9270 - val_loss: 0.3913 - val_acc: 0.8784
Epoch 24/50
391/391 [==============================] - 21s 55ms/step - loss: 0.2007 - acc: 0.9300 - val_loss: 0.4831 - val_acc: 0.8581
Epoch 25/50
391/391 [==============================] - 20s 52ms/step - loss: 0.1993 - acc: 0.9298 - val_loss: 0.4367 - val_acc: 0.8684
Epoch 26/50
391/391 [==============================] - 24s 61ms/step - loss: 0.1902 - acc: 0.9327 - val_loss: 0.3972 - val_acc: 0.8818
Epoch 27/50
391/391 [==============================] - 25s 64ms/step - loss: 0.1804 - acc: 0.9355 - val_loss: 0.4377 - val_acc: 0.8714
Epoch 28/50
391/391 [==============================] - 24s 62ms/step - loss: 0.1751 - acc: 0.9396 - val_loss: 0.4713 - val_acc: 0.8644
Epoch 29/50
391/391 [==============================] - 23s 60ms/step - loss: 0.1686 - acc: 0.9399 - val_loss: 0.4441 - val_acc: 0.8689
Epoch 30/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1619 - acc: 0.9436 - val_loss: 0.5143 - val_acc: 0.8729
Epoch 31/50
391/391 [==============================] - 21s 55ms/step - loss: 0.1562 - acc: 0.9439 - val_loss: 0.4043 - val_acc: 0.8834
Epoch 32/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1512 - acc: 0.9463 - val_loss: 0.3830 - val_acc: 0.8895
Epoch 33/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1456 - acc: 0.9482 - val_loss: 0.3707 - val_acc: 0.8900
Epoch 34/50
391/391 [==============================] - 23s 58ms/step - loss: 0.1415 - acc: 0.9498 - val_loss: 0.4362 - val_acc: 0.8788
Epoch 35/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1423 - acc: 0.9501 - val_loss: 0.4081 - val_acc: 0.8881
Epoch 36/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1350 - acc: 0.9523 - val_loss: 0.4355 - val_acc: 0.8809
Epoch 37/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1343 - acc: 0.9526 - val_loss: 0.4465 - val_acc: 0.8825
Epoch 38/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1314 - acc: 0.9526 - val_loss: 0.3857 - val_acc: 0.8941
Epoch 39/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1207 - acc: 0.9574 - val_loss: 0.5319 - val_acc: 0.8636
Epoch 40/50
391/391 [==============================] - 21s 55ms/step - loss: 0.1206 - acc: 0.9569 - val_loss: 0.4038 - val_acc: 0.8907
Epoch 41/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1191 - acc: 0.9578 - val_loss: 0.3672 - val_acc: 0.8963
Epoch 42/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1148 - acc: 0.9596 - val_loss: 0.4449 - val_acc: 0.8819
Epoch 43/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1116 - acc: 0.9591 - val_loss: 0.4252 - val_acc: 0.8844
Epoch 44/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1097 - acc: 0.9612 - val_loss: 0.5019 - val_acc: 0.8774
Epoch 45/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1066 - acc: 0.9619 - val_loss: 0.4458 - val_acc: 0.8822
Epoch 46/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1032 - acc: 0.9634 - val_loss: 0.4647 - val_acc: 0.8833
Epoch 47/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1027 - acc: 0.9634 - val_loss: 0.4329 - val_acc: 0.8845
Epoch 48/50
391/391 [==============================] - 22s 56ms/step - loss: 0.0990 - acc: 0.9644 - val_loss: 0.4254 - val_acc: 0.8880
Epoch 49/50
391/391 [==============================] - 22s 57ms/step - loss: 0.0935 - acc: 0.9676 - val_loss: 0.4516 - val_acc: 0.8850
Epoch 50/50
391/391 [==============================] - 22s 55ms/step - loss: 0.0969 - acc: 0.9660 - val_loss: 0.3984 - val_acc: 0.8995
10000/10000 [==============================] - 1s 143us/step
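As a cross-check on the model summary in the running result, the per-layer parameter counts can be reproduced by hand. The helper names below are made up for this sketch; the formulas are the standard ones (a convolution has one weight per kernel position per input channel plus one bias, per filter; batch normalization keeps four values per channel, two of them non-trainable).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Weights plus one bias, for every filter.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def batchnorm_params(channels):
    # gamma, beta (trainable) and moving mean, moving variance (non-trainable).
    return 4 * channels


def dense_params(in_units, out_units):
    return (in_units + 1) * out_units


print(conv2d_params(3, 3, 3, 32))   # 896,  conv2d_1
print(conv2d_params(3, 3, 32, 32))  # 9248, conv2d_2
print(conv2d_params(1, 1, 32, 64))  # 2112, conv2d_8 (the 1x1 shortcut convolution)
print(batchnorm_params(32))         # 128,  batch_normalization_1
print(dense_params(256, 10))        # 2570, dense_1

# Channel counts of the 25 batch normalization layers, read off the summary.
bn_channels = [32] * 7 + [64] * 6 + [128] * 6 + [256] * 6
non_trainable = sum(2 * c for c in bn_channels)
print(non_trainable)                # 5824, matches "Non-trainable params"
```

The only shortcuts that carry parameters are the 1x1 convolutions used when pool=True; the plain identity shortcuts (the Add layers) show 0 parameters in the summary.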