Neural Networks in Practice: Gradient Checking
2019-10-20
此间不留白
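Preface
After studying the theory of gradient checking for neural networks as part of the practical aspects of machine learning, I use this post to implement gradient checking in Python and apply it to a neural network model.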
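Gradient checking
The derivative is defined as follows:
$$\frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2\varepsilon} \tag{1}$$
Here $J$ is the loss function of the neural network and $\theta$ denotes the network's weight parameters. Gradient checking approximates the gradient numerically with formula (1), compares it with the gradient computed by backward propagation, and tests whether the difference between the two is below a small threshold. A difference above the threshold helps locate a bug in the network.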
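To see why formula (1) uses a two-sided difference, the following minimal sketch compares it with a one-sided difference on a toy function. The function `f` and the helper names are illustrative assumptions and do not come from the original post.

```python
# Toy function with a known derivative: f(theta) = theta^3, f'(theta) = 3 * theta^2.
# The names f, central_difference and forward_difference are hypothetical.
def f(theta):
    return theta ** 3

def central_difference(func, theta, epsilon=1e-4):
    # Two-sided approximation, as in formula (1)
    return (func(theta + epsilon) - func(theta - epsilon)) / (2 * epsilon)

def forward_difference(func, theta, epsilon=1e-4):
    # One-sided approximation, shown only for comparison
    return (func(theta + epsilon) - func(theta)) / epsilon

theta = 2.0
exact = 3 * theta ** 2

central_error = abs(central_difference(f, theta) - exact)
forward_error = abs(forward_difference(f, theta) - exact)
print("central error:", central_error)
print("forward error:", forward_error)
```

For the same epsilon, the two-sided error shrinks roughly with epsilon squared while the one-sided error shrinks only linearly, which is why the two-sided form is the usual choice for gradient checking.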
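One-dimensional gradient checking
For a one-dimensional model, the loss can be written as $J(\theta) = \theta x$, where $x$ is the input data and $\theta$ is the network parameter, a single real number. The gradient check for the one-dimensional model is illustrated in the following figure:
[Figure: gradient checking for the one-dimensional model]
Based on the formula above, the computation splits into two steps: forward propagation computes $J$, and backward propagation computes the gradient $\frac{\partial J}{\partial \theta}$. The implementation is as follows: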
- Forward propagation
def forward_propagation(x, theta):
    # Loss of the one-dimensional model: J(theta) = theta * x
    J = theta * x
    return J
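- Backward propagation
Backward propagation differentiates $J$ with respect to $\theta$. For the one-dimensional model the derivative is:
$$d\theta = \frac{\partial J}{\partial \theta} = x$$
The code for backward propagation is as follows:
def backward_propagation(x, theta):
    # Derivative of J(theta) = theta * x with respect to theta
    dtheta = x
    return dtheta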
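Gradient checking then takes the following 3 steps:
- Compute gradapprox with formula (1).
- Use formula (3) to compare gradapprox with the gradient grad obtained from backward propagation:
$$difference = \frac{\|grad - gradapprox\|_2}{\|grad\|_2 + \|gradapprox\|_2} \tag{3}$$
- If the difference is smaller than $10^{-7}$, the gradient computation is very likely free of bugs.
The implementation is as follows: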
import numpy as np

def gradient_check(x, theta, epsilon = 1e-7):
    # Step 1: approximate the gradient with the two-sided difference of formula (1)
    thetaplus = theta + epsilon
    thetaminus = theta - epsilon
    J_plus = forward_propagation(x, thetaplus)
    J_minus = forward_propagation(x, thetaminus)
    gradapprox = (J_plus - J_minus) / (2 * epsilon)
    # Step 2: gradient computed by backward propagation
    grad = backward_propagation(x, theta)
    # Step 3: relative difference of formula (3)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    difference = numerator / denominator
    if difference < 1e-7:
        print ("The gradient is correct!")
    else:
        print ("The gradient is wrong!")
    return difference
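As a quick illustration of what the check reports, the sketch below runs the one-dimensional procedure twice: once with the correct derivative and once with a deliberately wrong one. It is self-contained and uses its own hypothetical helper names (`loss`, `correct_grad`, `buggy_grad`, `check`); the values x = 2 and theta = 4 are arbitrary.

```python
import numpy as np

def loss(x, theta):
    # One-dimensional model: J(theta) = theta * x
    return theta * x

def correct_grad(x, theta):
    # True derivative of theta * x with respect to theta
    return x

def buggy_grad(x, theta):
    # Deliberately wrong derivative, used only to show a failing check
    return 2 * x

def check(grad_fn, x, theta, epsilon=1e-7):
    # Two-sided numerical approximation of the derivative
    gradapprox = (loss(x, theta + epsilon) - loss(x, theta - epsilon)) / (2 * epsilon)
    grad = grad_fn(x, theta)
    # Relative difference between the analytic and numerical gradients
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

x, theta = 2.0, 4.0
diff_ok = check(correct_grad, x, theta)
diff_bad = check(buggy_grad, x, theta)
print("correct gradient:", diff_ok)
print("buggy gradient:  ", diff_bad)
```

The correct derivative gives a difference far below 1e-7, while the doubled derivative gives roughly 1/3. Because the difference is normalized by the gradient norms, the same threshold can be used regardless of the gradient's scale.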
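Multi-dimensional gradient checking
The overall flow of multi-dimensional gradient checking is shown in the figure below:
[Figure: 2.PNG]
Following the figure, forward propagation is implemented as follows: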
def forward_propagation_n(X, Y, parameters):
    """
    Implements the forward propagation (and computes the cost) for the network in the figure above.

    Arguments:
    X -- the m training examples
    Y -- the labels of the m examples
    parameters -- dictionary containing:
        W1 -- weight matrix of shape (5, 4)
        b1 -- bias vector of shape (5, 1)
        W2 -- weight matrix of shape (3, 5)
        b2 -- bias vector of shape (3, 1)
        W3 -- weight matrix of shape (1, 3)
        b3 -- bias vector of shape (1, 1)
    """
    m = X.shape[1]
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    # relu and sigmoid are elementwise activation helpers assumed to be defined elsewhere
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)
    # Cost: cross-entropy averaged over the m examples
    logprobs = np.multiply(-np.log(A3),Y) + np.multiply(-np.log(1 - A3), 1 - Y)
    cost = 1./m * np.sum(logprobs)
    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
    return cost, cache
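The forward pass calls `relu` and `sigmoid`, which the post does not define. A minimal sketch of the two activations, assuming the standard elementwise definitions, could look like this:

```python
import numpy as np

def sigmoid(x):
    # Elementwise logistic function: 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Elementwise rectified linear unit: max(0, x)
    return np.maximum(0, x)

z = np.array([[-1.0, 0.0, 2.0]])
relu_out = relu(z)
sigmoid_out = sigmoid(z)
print(relu_out)
print(sigmoid_out)
```

These are assumptions about the helpers the author imported; any equivalent elementwise implementation would work with `forward_propagation_n`.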
Backward propagation is implemented as follows:
def backward_propagation_n(X, Y, cache):
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache
    # Output layer (sigmoid with cross-entropy cost)
    dZ3 = A3 - Y
    dW3 = 1./m * np.dot(dZ3, A2.T)
    db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
    # Second hidden layer (ReLU): the mask A2 > 0 is the ReLU derivative
    dA2 = np.dot(W3.T, dZ3)
    dZ2 = np.multiply(dA2, np.int64(A2 > 0))
    dW2 = 1./m * np.dot(dZ2, A1.T)
    db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)
    # First hidden layer (ReLU)
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = np.multiply(dA1, np.int64(A1 > 0))
    dW1 = 1./m * np.dot(dZ1, X.T)
    db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)
    gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
                 "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
                 "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}
    return gradients
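Unlike the one-dimensional model, the parameter here is no longer a single real number but a vector: all parameters of the network (W1, b1, W2, b2, W3, b3) are flattened and concatenated into one column vector.
The code that converts the parameter dictionary into a vector is as follows: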
def dictionary_to_vector(parameters):
    """
    Roll all our parameters dictionary into a single vector satisfying our specific required shape.
    """
    keys = []
    count = 0
    for key in ["W1", "b1", "W2", "b2", "W3", "b3"]:
        # flatten parameter
        new_vector = np.reshape(parameters[key], (-1,1))
        keys = keys + [key]*new_vector.shape[0]
        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1
    return theta, keys

def gradients_to_vector(gradients):
    """
    Roll all our gradients dictionary into a single vector satisfying our specific required shape.
    """
    count = 0
    for key in ["dW1", "db1", "dW2", "db2", "dW3", "db3"]:
        # flatten parameter
        new_vector = np.reshape(gradients[key], (-1,1))
        if count == 0:
            theta = new_vector
        else:
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1
    return theta
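The gradient check also needs the inverse conversion, `vector_to_dictionary`, which the post calls but does not show. The following is a sketch under the assumption that the shapes are the ones listed in the `forward_propagation_n` docstring and that the order matches `dictionary_to_vector`; the constant `PARAMETER_SHAPES` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Assumed shapes, taken from the forward_propagation_n docstring,
# in the same order that dictionary_to_vector concatenates them.
PARAMETER_SHAPES = [
    ("W1", (5, 4)), ("b1", (5, 1)),
    ("W2", (3, 5)), ("b2", (3, 1)),
    ("W3", (1, 3)), ("b3", (1, 1)),
]

def vector_to_dictionary(theta):
    """Unroll a (47, 1) column vector back into the parameter dictionary."""
    parameters = {}
    start = 0
    for key, shape in PARAMETER_SHAPES:
        size = shape[0] * shape[1]
        # Slice this parameter's entries and restore its matrix shape
        parameters[key] = theta[start:start + size].reshape(shape)
        start += size
    return parameters

# Round-trip check: flatten random parameters, then unroll them again
rng = np.random.default_rng(0)
original = {key: rng.standard_normal(shape) for key, shape in PARAMETER_SHAPES}
theta = np.concatenate(
    [original[key].reshape(-1, 1) for key, _ in PARAMETER_SHAPES], axis=0
)
restored = vector_to_dictionary(theta)
print(theta.shape)
```

Keeping the shapes in one list means the slicing offsets are derived rather than hard-coded, so a change in layer sizes only has to be made in one place.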
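Unlike the one-dimensional case, the multi-dimensional check has to test every component of the parameter vector. For each component $i$, the approximation is:
$$gradapprox_i = \frac{J(\theta_1, \ldots, \theta_i + \varepsilon, \ldots) - J(\theta_1, \ldots, \theta_i - \varepsilon, \ldots)}{2\varepsilon}$$
Based on this formula, gradient checking is implemented as follows: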
def gradient_check_n(parameters, gradients, X, Y, epsilon = 1e-7):
    # Flatten the parameters and the backpropagated gradients into column vectors
    parameters_values, _ = dictionary_to_vector(parameters)
    grad = gradients_to_vector(gradients)
    num_parameters = parameters_values.shape[0]
    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))
    # Perturb one component at a time and approximate its partial derivative.
    # vector_to_dictionary (the inverse of dictionary_to_vector) is assumed to be defined elsewhere.
    for i in range(num_parameters):
        thetaplus = np.copy(parameters_values)
        thetaplus[i][0] = thetaplus[i][0] + epsilon
        J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus))
        thetaminus = np.copy(parameters_values)
        thetaminus[i][0] = thetaminus[i][0] - epsilon
        J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus))
        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)
    # Relative difference between the analytic and numerical gradients
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    difference = numerator / denominator
    if difference > 1e-7:
        print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
    else:
        print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")
    return difference
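The same per-component procedure can be tried on a problem small enough to verify by hand. The sketch below is not the network above: it assumes a toy quadratic cost J(theta) = 0.5 * theta^T A theta, whose gradient is A theta for symmetric A, and the names `gradient_check_vector` and `cost` are hypothetical. One gradient component is then scaled by 2 to mimic a mis-scaled term in backward propagation.

```python
import numpy as np

def gradient_check_vector(cost_fn, grad, theta, epsilon=1e-7):
    """Relative difference between grad and a two-sided numerical gradient."""
    gradapprox = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        # Perturb only component i, in both directions
        thetaplus = np.copy(theta)
        thetaplus[i, 0] += epsilon
        thetaminus = np.copy(theta)
        thetaminus[i, 0] -= epsilon
        gradapprox[i, 0] = (cost_fn(thetaplus) - cost_fn(thetaminus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

# Toy cost (an assumption for illustration): J = 0.5 * theta^T A theta
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def cost(theta):
    return 0.5 * np.sum(theta * (A @ theta))

theta = np.array([[1.0], [2.0]])
correct_grad = A @ theta                              # analytic gradient: A theta
buggy_grad = correct_grad * np.array([[2.0], [1.0]])  # first component wrongly doubled

diff_ok = gradient_check_vector(cost, correct_grad, theta)
diff_bad = gradient_check_vector(cost, buggy_grad, theta)
print("correct gradient:", diff_ok)
print("buggy gradient:  ", diff_bad)
```

A single mis-scaled component is enough to push the difference several orders of magnitude above the 1e-7 threshold. When that happens, comparing `grad` and `gradapprox` entry by entry shows which parameter's gradient is wrong.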