Artificial Intelligence 00034: Deep Learning and Image Recognition, Book Review 34, Neural Network Basics 16
Previously we implemented MNIST recognition by computing gradients with numerical differentiation. As we saw, numerical differentiation is simple to implement but computationally expensive. Error backpropagation computes the same gradients quickly and efficiently, so let's use it to reimplement MNIST recognition.
The data-loading code has already appeared several times in earlier installments, so it is not repeated here. The network code itself needs only light cleanup (the numerical-differentiation logic is removed), and it relies on the Affine, Relu, and SoftmaxWithLoss layer classes built earlier in the series.
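For readers who need a refresher, here is a minimal sketch of those layer classes, consistent with the forward/backward interface the network expects; the exact versions appeared in earlier installments:

import numpy as np

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)  # shift for numerical stability
    return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)

def cross_entropy_error(p, y):
    # y is assumed one-hot; average the loss over the batch
    batch_size = p.shape[0]
    return -np.sum(y * np.log(p + 1e-7)) / batch_size

class Relu:
    def __init__(self):
        self.mask = None
    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out
    def backward(self, dout):
        dout[self.mask] = 0  # gradient is zero where the input was <= 0
        return dout

class Affine:
    def __init__(self, W, b):
        self.W, self.b = W, b
        self.x = None
        self.dW, self.db = None, None
    def forward(self, x):
        self.x = x
        return np.dot(x, self.W) + self.b
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)  # gradient w.r.t. the weights
        self.db = np.sum(dout, axis=0)    # gradient w.r.t. the bias
        return dx

class SoftmaxWithLoss:
    def __init__(self):
        self.loss, self.p, self.y = None, None, None
    def forward(self, x, y):
        self.y = y
        self.p = softmax(x)
        self.loss = cross_entropy_error(self.p, y)
        return self.loss
    def backward(self, dout=1):
        # simple closed form for softmax + cross-entropy with one-hot labels
        batch_size = self.y.shape[0]
        return (self.p - self.y) / batch_size

With those layers in place, the two-layer network looks like this: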
import numpy as np
from collections import OrderedDict

class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # Initialize weights with small Gaussian noise, biases with zeros
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

        # Build the layers in forward order
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Relu1'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.lastLayer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    # x: input data, y: supervised labels (one-hot)
    def loss(self, x, y):
        p = self.predict(x)
        return self.lastLayer.forward(p, y)

    def accuracy(self, x, y):
        p = self.predict(x)
        p = np.argmax(p, axis=1)
        y = np.argmax(y, axis=1)
        accuracy = np.sum(y == p) / float(x.shape[0])
        return accuracy

    def gradient(self, x, y):
        # forward pass (each layer caches what its backward pass needs)
        self.loss(x, y)

        # backward pass: propagate from the loss layer back through the stack
        dout = 1
        dout = self.lastLayer.backward(dout)
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Collect the gradients stored by each Affine layer
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db
        return grads
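Before training, it is worth sanity-checking the backpropagation gradients against numerically computed ones on a tiny batch. Below is a minimal sketch; it assumes the finite-difference numerical_gradient helper from the earlier numerical-differentiation installments (the one we just removed from the class) is still available:

# Sketch of a gradient check; numerical_gradient is the finite-difference
# helper from the earlier installments.
network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)
x_small, y_small = x_train[:3], y_train[:3]

grad_bp = network.gradient(x_small, y_small)
for key in ('W1', 'b1', 'W2', 'b2'):
    # f ignores its argument: numerical_gradient perturbs params[key] in place,
    # and the layers hold references to those same arrays
    f = lambda _: network.loss(x_small, y_small)
    grad_num = numerical_gradient(f, network.params[key])
    print(key, np.average(np.abs(grad_bp[key] - grad_num)))  # should be near 0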
Finally, let's train the network and see how the learned weights and biases perform on the test set. The code is as follows:
train_size = x_train.shape[0]
iters_num = 600        # iterations per epoch
learning_rate = 0.001
epoch = 5
batch_size = 100

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

for i in range(epoch):
    print('current epoch is :', i)
    for num in range(iters_num):
        # Draw a random mini-batch
        batch_mask = np.random.choice(train_size, batch_size)
        x_batch = x_train[batch_mask]
        y_batch = y_train[batch_mask]

        # Gradients via backpropagation, then a plain SGD update
        grad = network.gradient(x_batch, y_batch)
        for key in ('W1', 'b1', 'W2', 'b2'):
            network.params[key] -= learning_rate * grad[key]

        loss = network.loss(x_batch, y_batch)
        if num % 100 == 0:
            print(loss)

print('accuracy: ', network.accuracy(x_test, y_test) * 100, '%')

The output is shown below; with no additional tuning at all, roughly 96% accuracy is quite respectable:

current epoch is : 0
2.2753798478814895
0.6610914122397926
0.3003014145366447
0.25776192989088054
0.17468173033680465
0.12297262305993698
current epoch is : 1
0.14476994572636273
0.16806233003386506
0.10899282838635063
0.1398080642943528
0.0631957790462195
0.14957822424574135
current epoch is : 2
0.1290895688384963
0.09535212679963873
0.18500797494490775
0.057708589923198696
0.05688971712292652
0.0868967341522295
current epoch is : 3
0.06375133753928874
0.11429593125907099
0.11290842006721384
0.04896661977912546
0.20236172555026669
0.06978181342959813
current epoch is : 4
0.05107801847346741
0.07954869456879843
0.04250498953199182
0.06376040515564727
0.025734163371306584
0.035472113296809826
accuracy: 96.49 %
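The speed gap that motivated this rewrite is easy to measure: time one gradient computation of each kind on the same mini-batch. A minimal sketch, again assuming the numerical_gradient helper from the earlier installments:

import time

x_batch, y_batch = x_train[:3], y_train[:3]

start = time.time()
network.gradient(x_batch, y_batch)   # backpropagation: all parameters in one pass
print('backprop : %.4f s' % (time.time() - start))

start = time.time()
for key in ('W1', 'b1', 'W2', 'b2'):
    # finite differences: two forward passes per parameter element
    numerical_gradient(lambda _: network.loss(x_batch, y_batch),
                       network.params[key])
print('numerical: %.4f s' % (time.time() - start))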