9. Neural Networks: Learning
Cost Function
L = total number of layers in network
s_l = no. of units (not counting the bias unit) in layer l. K = number of units in the output layer, i.e. the number of classes (the full cost function is written out below).
- Binary classification (K = 1)
- Multi-class classification (K classes, K >= 3)
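For reference, the full regularized cost function (the logistic-regression cost generalized to K output units, plus regularization over all non-bias weights), in LaTeX notation:

J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}
  \Big[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big)
      + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \Big]
  + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \big(\Theta_{ji}^{(l)}\big)^2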
Backpropagation algorithm
= "error" of node j in layer l.
For each output unit (layer L = 4)
- ()
Backpropagation algorithm
Vectorized implementation:
- δ^(l) = ((Θ^(l))^T δ^(l+1)) .* g'(z^(l)), where g'(z^(l)) = a^(l) .* (1 - a^(l))
- Accumulate the gradients: Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T
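A minimal Octave sketch of one backpropagation pass for the 4-layer network above, assuming Theta1..Theta3, the gradient accumulators Delta1..Delta3, and a single training example (x, y) are already in scope:

g = @(z) 1 ./ (1 + exp(-z));          % sigmoid activation

% forward propagation
a1 = [1; x];                          % add bias unit
a2 = [1; g(Theta1 * a1)];
a3 = [1; g(Theta2 * a2)];
a4 = g(Theta3 * a3);                  % = h_Theta(x)

% backpropagate the "errors"
delta4 = a4 - y;
delta3 = (Theta3' * delta4) .* a3 .* (1 - a3);  delta3 = delta3(2:end);  % drop bias component
delta2 = (Theta2' * delta3) .* a2 .* (1 - a2);  delta2 = delta2(2:end);

% accumulate gradients (this runs inside the loop over the m training examples)
Delta3 = Delta3 + delta4 * a3';
Delta2 = Delta2 + delta3 * a2';
Delta1 = Delta1 + delta2 * a1';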
Backpropagation intuition
Forward Propagation
The goal here is to understand what backpropagation is doing: it propagates the "error" terms δ backwards through the network, much as forward propagation propagates the activations forwards.
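A worked equation to anchor the intuition (indices follow the lecture's example network; the δ update below omits the g' factor for simplicity). Forward propagation computes, e.g., z_1^(3) = Θ_10^(2)·1 + Θ_11^(2)·a_1^(2) + Θ_12^(2)·a_2^(2); backpropagation computes the δ's the same way but in the reverse direction, e.g. δ_2^(2) ≈ Θ_12^(2)·δ_1^(3) + Θ_22^(2)·δ_2^(3). Each node's "error" is a weighted sum of the errors of the nodes it feeds into.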
Implementation note: Unrolling parameters
function [jVal, gradient] = costFunction(theta)
The parameters 'theta' and 'gradient' passed to and from advanced optimizers (e.g. fminunc) must be vectors. In a neural network, however, the parameters are matrices (Theta1, Theta2, Theta3), so we must unroll the matrices into a vector (and reshape them back when needed).
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];   % unroll all weight matrices into one vector
DVec = [D1(:); D2(:); D3(:)];                   % same for the gradient matrices
Theta1 = reshape(thetaVec(1:110),   10, 11);    % recover the matrices (example: s1 = 10, s2 = 10, s3 = 1)
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231),  1, 11);
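A sketch of how the unrolled vector is used with an advanced optimizer. The cost below is only a placeholder so the snippet runs; in practice jVal and D1..D3 come from forward propagation and backpropagation:

% costFunction.m -- placeholder body: reshape, compute cost/gradients, unroll again
function [jVal, gradVec] = costFunction(thetaVec)
  Theta1 = reshape(thetaVec(1:110),   10, 11);
  Theta2 = reshape(thetaVec(111:220), 10, 11);
  Theta3 = reshape(thetaVec(221:231),  1, 11);
  jVal = (sum(Theta1(:).^2) + sum(Theta2(:).^2) + sum(Theta3(:).^2)) / 2;  % placeholder cost
  D1 = Theta1;  D2 = Theta2;  D3 = Theta3;     % gradient of the placeholder cost
  gradVec = [D1(:); D2(:); D3(:)];             % unroll the gradient matrices into a vector
end

% pass the unrolled parameters to the optimizer
initialTheta = [Theta1(:); Theta2(:); Theta3(:)];   % the (e.g. randomly initialized) matrices from above
options = optimset('GradObj', 'on', 'MaxIter', 100);
[optTheta, cost] = fminunc(@costFunction, initialTheta, options);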
Gradient checking
Gradient checking is used to make sure that the backpropagation (and forward propagation) implementation computes correct derivatives.
- One-sided difference: (J(θ + ε) − J(θ)) / ε
- Two-sided difference (more accurate): (J(θ + ε) − J(θ − ε)) / (2ε)
Implement: gradApprox = (J(theta+EPSILON)-J(theta-EPSILON))/(2*EPSILON)
Parameter vector θ ∈ R^n (the unrolled version of Theta1, Theta2, Theta3)
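With θ ∈ R^n, the check is done per component. A sketch in Octave, assuming a function J(theta) that returns the cost:

EPSILON = 1e-4;                          % commonly used perturbation size
n = numel(theta);
gradApprox = zeros(n, 1);
for i = 1:n
  thetaPlus  = theta;  thetaPlus(i)  = thetaPlus(i)  + EPSILON;
  thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
end
% gradApprox should be numerically close to DVec from backpropagation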
Implementation Note:
- Implement backprop to compute DVec.
- Implement numerical gradient check to compute gradApprox.
- Make sure they give similar values.
- Turn off gradient checking. Use the backprop code for learning.
- Be sure to disable your gradient checking code before training your classifier.
Random initialization
With 'zero initialization', after each update the parameters corresponding to inputs going into each of the two hidden units stay identical, so both hidden units compute the same function of the inputs.
Random initialization: Symmetry breaking
Initialize each Θ_ij^(l) to a random value in [−ε, ε] (i.e. a small value close to zero).
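A sketch in Octave, reusing the matrix dimensions from the unrolling example (INIT_EPSILON is just a small constant here; its exact value is a design choice):

INIT_EPSILON = 0.01;                                         % small constant
Theta1 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;   % rand gives values in (0,1)
Theta2 = rand(10, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand( 1, 11) * (2 * INIT_EPSILON) - INIT_EPSILON;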
Put it together
Training a neural network:
Pick a network architecture (connectivity pattern between neurons)
- No. of input units: Dimension of features
- No. of output units: Number of classes
- Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better)
- Randomly initialize weights
- Implement forward propagation to get h_Θ(x^(i)) for any x^(i)
- Implement code to compute the cost function J(Θ)
- Implement backprop to compute the partial derivatives ∂J(Θ)/∂Θ_jk^(l)
- Use gradient checking to compare the partial derivatives computed using backpropagation vs. the numerical estimate of the gradient of J(Θ). Then disable the gradient checking code.
- Use gradient descent or an advanced optimization method with backpropagation to try to minimize J(Θ) as a function of the parameters Θ (see the sketch after this list).
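Putting the checklist into code, a minimal sketch. nnCostFunction is a hypothetical helper assumed to run forward propagation and backpropagation on the unrolled parameter vector and return [J, grad]; X and y are assumed training data:

input_layer_size  = 400;     % e.g. 20x20 input images
hidden_layer_size = 25;      % reasonable default: a single hidden layer
num_labels        = 10;      % number of output classes
lambda            = 1;       % regularization parameter

% randomly initialize weights and unroll them
INIT_EPSILON = 0.01;
Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 2 * INIT_EPSILON - INIT_EPSILON;
Theta2 = rand(num_labels, hidden_layer_size + 1)       * 2 * INIT_EPSILON - INIT_EPSILON;
initial_nn_params = [Theta1(:); Theta2(:)];

% minimize J(Theta) with an advanced optimizer, using backprop gradients
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);   % hypothetical helper
options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost] = fminunc(costFunc, initial_nn_params, options);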