Deep Learning | 1 Neural Network

2018-03-05  shawn233

0 Quick Reference (This section is not for beginners)

This part is for readers who have already learned deep learning but have forgotten the formulas. If you are a beginner, just skip this section.

forward propagation

backward propagation

shape check
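
The figures that carried these formulas are not available here. As a hedged reconstruction, assuming the notation of Andrew Ng's course (layer index l, m training examples, a sigmoid output layer trained with cross-entropy loss), the three items above correspond to the following standard expressions:

```latex
% Forward propagation, for l = 1, ..., L, with A^{[0]} = X
\begin{aligned}
Z^{[l]} &= W^{[l]} A^{[l-1]} + b^{[l]} \\
A^{[l]} &= g^{[l]}\left(Z^{[l]}\right)
\end{aligned}

% Backward propagation (assumes sigmoid output + cross-entropy loss)
\begin{aligned}
dZ^{[L]} &= A^{[L]} - Y \\
dW^{[l]} &= \frac{1}{m}\, dZ^{[l]} A^{[l-1]T} \\
db^{[l]} &= \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} \\
dZ^{[l-1]} &= \left(W^{[l]T} dZ^{[l]}\right) \odot g^{[l-1]\prime}\left(Z^{[l-1]}\right)
\end{aligned}

% Shape check
W^{[l]} : (n^{[l]}, n^{[l-1]}), \quad
b^{[l]} : (n^{[l]}, 1), \quad
Z^{[l]}, A^{[l]} : (n^{[l]}, m)
```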


Many thanks to Professor Andrew Ng and his deep learning course materials. This note is based on Andrew Ng's videos.

If you are new to deep learning, I strongly recommend watching these videos; they are easy to understand and completely free.

The rest of this note is a brief summary of the materials; take a look if you like.

1 Introduction

Architecture       Application
Standard NN        General
Convolutional NN   Image
Recurrent NN       One-dimensional sequence data

2 Basics of Neural Network Programming

Notation

Notation   Description
(x, y)     A single training example, where x is an n_x-dimensional feature vector and y is either 1 or 0.
m          The number of training examples, i.e. the training set is {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}. More precisely, it is also denoted m_train.
m_test     The number of test examples.
X          The matrix of training examples, one example per column, i.e. X = [x^(1), x^(2), ..., x^(m)].
Y          The matrix of labels, i.e. Y = [y^(1), y^(2), ..., y^(m)].

In Python (NumPy),

X.shape = (n_x, m)
Y.shape = (1, m)

2.1 Logistic Regression


When z is a large positive number, sigmoid(z) is close to 1, and when z is a large negative number, sigmoid(z) is close to 0.
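
A minimal sketch of this behavior, assuming the usual definition sigmoid(z) = 1 / (1 + e^(-z)); the function name `sigmoid` and the sample inputs are chosen here only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative, zero, and large positive inputs
values = sigmoid(np.array([-10.0, 0.0, 10.0]))
print(values)  # close to 0, exactly 0.5, close to 1
```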

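If the loss function is defined as the squared error, the resulting optimization problem is non-convex, so gradient descent may get stuck in a local optimum; squared error therefore does not work well here.

So when training a logistic regression model, we try to find parameters w and b that minimize the overall cost function J(w, b), which is the average of the per-example loss L(\hat{y}, y) over the m training examples.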
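
As a rough sketch, assuming the cross-entropy loss used in the course, L(\hat{y}, y) = -(y log \hat{y} + (1 - y) log(1 - \hat{y})), the cost can be computed as below. The helper name `cross_entropy_cost` and the sample predictions are made up for illustration:

```python
import numpy as np

def cross_entropy_cost(A, Y):
    """Average cross-entropy loss over the m columns of A (predictions) and Y (labels)."""
    m = Y.shape[1]
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return np.sum(losses) / m

Y = np.array([[1, 0, 1]])
A_good = np.array([[0.9, 0.1, 0.8]])  # predictions close to the labels
A_bad = np.array([[0.2, 0.7, 0.4]])   # predictions far from the labels

cost_good = cross_entropy_cost(A_good, Y)
cost_bad = cross_entropy_cost(A_bad, Y)
print(cost_good, cost_bad)  # the better predictions give the lower cost
```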

Cost Function Is A Convex Function

The cost function is a convex function, so it has a lowest point. We can therefore use the gradient descent algorithm to obtain (learn) the optimal parameters.

Gradient Descent Algorithm

Repeatedly update the parameters w and b using the following expressions (:= means "update <left> to <right>"):

w := w - alpha * dJ(w, b)/dw
b := b - alpha * dJ(w, b)/db

(This process can be understood intuitively by studying only the optimization of the parameter w in J(w).)

alpha is called the learning rate.
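
To make the one-parameter picture concrete, here is a small sketch on a toy cost J(w) = (w - 3)^2. This cost, the starting point, and the value of alpha are all assumptions chosen for illustration and are not taken from the course:

```python
def dJ_dw(w):
    """Derivative of the toy cost J(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

alpha = 0.1  # learning rate
w = 0.0      # arbitrary starting point

for _ in range(100):
    w = w - alpha * dJ_dw(w)  # gradient descent update

print(w)  # approaches the minimizer w = 3
```

With a learning rate that is too large the updates would overshoot the minimum, which is why alpha has to be chosen with some care.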

Gradient Descent Algorithm Implementation

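(1) For one training example, the computational graph could be

If we apply the gradient descent algorithm to this logistic regression example, we can compute the derivatives in the following steps.

(2) We can apply the gradient descent result for a single example to all m examples.

Combining this with the previous result, it is easy to implement with for-loops. In code, however, we do not want nested for-loops, because they make the program slow. Instead, we can use a technique called vectorization to eliminate unnecessary for-loops.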
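
For reference, a loop-based version might look like the sketch below. It assumes w is a 1-D array of length n_x, X has shape (n_x, m), and Y has shape (1, m); the function name `gradients_with_loops` is hypothetical. It is shown only to make the nested loops visible, not as a recommended implementation:

```python
import numpy as np

def gradients_with_loops(w, b, X, Y):
    """Logistic regression gradients computed with explicit (slow) for-loops."""
    n_x, m = X.shape
    dw = np.zeros(n_x)
    db = 0.0
    for i in range(m):                 # outer loop over training examples
        z = b
        for j in range(n_x):           # inner loop over features
            z += w[j] * X[j, i]
        a = 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
        dz = a - Y[0, i]
        for j in range(n_x):           # accumulate the gradient for each feature
            dw[j] += X[j, i] * dz
        db += dz
    return dw / m, db / m
```

Every one of these loops can be replaced by a single array operation, which is what vectorization does.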

Vectorization

For n_x-dimensional vectors w and x, we can compute their product either with a non-vectorized implementation or with vectorization.

For example, suppose we want to implement z = w^T x + b.

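# In Python, using NumPy
# Implement z = w^T x + b

import numpy as np

# Example inputs (assumed here for illustration): 1-D arrays of length n_x
n_x = 3
w = np.array([0.1, 0.2, 0.3])
x = np.array([1.0, 2.0, 3.0])
b = 0.5

# 1. Non-vectorized implementation
z = 0
for i in range(n_x):
    z += w[i] * x[i]
z += b

# 2. Vectorized implementation
z = np.dot(w, x) + b  # uses parallel computation under the hood, so it is much faster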

Whenever possible, avoid explicit for-loops.

Vectorization in the implementation of the forward propagation

Vectorization in the implementation of the backward propagation

Implementation with Python

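import numpy as np

def sigma(z):
    # Sigmoid activation, applied element-wise
    return 1 / (1 + np.exp(-z))

# Given training data (X, Y) with m = X.shape[1], parameters w and b, and learning rate alpha
# Then a single iteration could be:

Z = np.dot(w.T, X) + b  # b is a scalar here, but NumPy broadcasts it across the whole row vector
A = sigma(Z)
dZ = A - Y
dw = np.dot(X, dZ.T) / m
db = np.sum(dZ) / m

w = w - alpha * dw  # update vector w, shaped (n_x, 1)
b = b - alpha * db  # update b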
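
Putting the single iteration into a loop gives a complete, if simplistic, training routine. The sketch below assumes zero initialization, a fixed number of iterations, and a made-up toy dataset; the name `train_logistic_regression` and its default values are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.1, num_iterations=1000):
    """Run vectorized gradient descent; returns w, b and the cost per iteration."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        A = sigmoid(np.dot(w.T, X) + b)                       # forward propagation
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
        costs.append(cost)
        dZ = A - Y                                            # backward propagation
        w = w - alpha * np.dot(X, dZ.T) / m
        b = b - alpha * np.sum(dZ) / m
    return w, b, costs

# Toy data (made up): the label is 1 when the two features sum to a positive number
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)

w, b, costs = train_logistic_regression(X, Y)
predictions = (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)
accuracy = np.mean(predictions == Y)
print(costs[0], costs[-1], accuracy)
```

Note that the only remaining for-loop is the one over iterations; there is no loop over examples or features.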

2.2 Tips About Coding Neural Network In Python And Numpy


3 One Hidden Layer Neural Network

Input Layer -> Hidden Layer -> Output Layer

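"Hidden" means that the values of this layer are not observed in the training set.

The symbol a_i^[n] denotes the value (output) that the i-th neuron of layer n passes to the next layer. The input layer is a^[0], i.e. the input layer is layer 0. The letter a stands for activation.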

Neural Network Representation

Compute the output of the neural network for a single training example:

Forward Propagation For A Single Training Example

Compute the output of the neural network for m training examples (vectorization):

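Forward Propagation For m Training Examples

In the large matrices in the figure above, the horizontal direction runs over the different training examples and the vertical direction runs over the different neurons (neural network units).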
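
A minimal sketch of the vectorized forward pass, assuming a tanh hidden layer and a sigmoid output unit (both are assumptions here, as are the function name and the layer sizes). Each column of the result corresponds to one training example and each row to one unit:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_one_hidden_layer(X, W1, b1, W2, b2):
    """Forward propagation for all m examples at once."""
    Z1 = np.dot(W1, X) + b1    # shape (n_1, m)
    A1 = np.tanh(Z1)           # hidden activations, shape (n_1, m)
    Z2 = np.dot(W2, A1) + b2   # shape (1, m)
    A2 = sigmoid(Z2)           # predictions, shape (1, m)
    return A1, A2

rng = np.random.default_rng(1)
n_x, n_1, m = 3, 4, 5          # illustrative sizes
X = rng.normal(size=(n_x, m))
W1 = rng.normal(size=(n_1, n_x)) * 0.01
b1 = np.zeros((n_1, 1))
W2 = rng.normal(size=(1, n_1)) * 0.01
b2 = np.zeros((1, 1))

A1, A2 = forward_one_hidden_layer(X, W1, b1, W2, b2)
print(A1.shape, A2.shape)
```

Running the same function on a single column of X gives the same values as the corresponding column of the batched result, which is the sense in which the vectorized form stacks examples horizontally.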

3.1 Different Activation Functions


For this part, I strongly suggest watching the video directly rather than reading my summary. The course is so rich that I can only summarize the conclusions here; if you watch the whole video, you will learn much more.

Activation Functions

Summary

Why should we choose non-linear activation functions?
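
One common way to see the answer: if every layer uses a linear (identity) activation, stacking layers gains nothing, because the composition of linear maps is itself a single linear map. The sketch below checks this numerically with arbitrary random weights; all names and sizes are chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 6))
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))

# Two layers with identity activation g(z) = z
two_layer_output = np.dot(W2, np.dot(W1, X) + b1) + b2

# A single linear layer with combined parameters
W_combined = np.dot(W2, W1)
b_combined = np.dot(W2, b1) + b2
one_layer_output = np.dot(W_combined, X) + b_combined

print(np.allclose(two_layer_output, one_layer_output))
```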

Derivatives for activation functions
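
As a hedged sketch, the derivatives of three commonly used activation functions (sigmoid, tanh, ReLU) can be written as follows. The selection of functions and the helper names are assumptions, since the original figure is not available. The ReLU derivative at exactly z = 0 is set to 0 here by convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    a = sigmoid(z)
    return a * (1 - a)            # g'(z) = a(1 - a)

def tanh_derivative(z):
    return 1 - np.tanh(z) ** 2    # g'(z) = 1 - a^2

def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return (z > 0).astype(float)  # 1 where z > 0, else 0
```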

3.2 Formulas for one hidden layer neural network


Cost Function and Gradient Descent

Forward Propagation

Backward Propagation
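
Since the formula figures are missing, here is a sketch of what these three pieces could look like in NumPy. It assumes a tanh hidden layer, a sigmoid output unit, and the cross-entropy cost; the function names, the parameter tuple layout, and the toy data are all hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    W1, b1, W2, b2 = params
    A1 = np.tanh(np.dot(W1, X) + b1)
    A2 = sigmoid(np.dot(W2, A1) + b2)
    return A1, A2

def cost(A2, Y):
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m

def backward(X, Y, params):
    W1, b1, W2, b2 = params
    m = X.shape[1]
    A1, A2 = forward(X, params)
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - a^2
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

# Made-up data: 2 features, 3 hidden units, 5 examples
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
Y = np.array([[1.0, 0.0, 1.0, 1.0, 0.0]])
params = (rng.normal(size=(3, 2)), np.zeros((3, 1)),
          rng.normal(size=(1, 3)), np.zeros((1, 1)))

dW1, db1, dW2, db2 = backward(X, Y, params)
print(dW1.shape, db1.shape, dW2.shape, db2.shape)
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check worth running whenever backward propagation is implemented by hand.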

3.3 Random Parameter Initialization

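import numpy as np

n = [3, 4, 1]  # example layer sizes (assumed for illustration); n[0] is the input size
W, b = {}, {}
for i in range(1, len(n)):
    # Draw Gaussian random values; multiply by 0.01 to get small initial values and avoid the small-slope regions of the activation function
    W[i] = np.random.randn(n[i], n[i-1]) * 0.01
    # Once W is randomly initialized, b does not need random initialization
    b[i] = np.zeros((n[i], 1))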

4 Deep Neural Networks

Why do we need more layers rather than more units?

Notation

Notation / example    Description
L                     The number of layers in the neural network
L = 4                 This neural network has 4 layers
n[l]                  The number of units in the l-th layer
n[1] = 5              The first layer has 5 units
a[l]                  The activations (outputs) of the l-th layer
a[l] = g[l](z[l])     The activations of layer l are the activation function g[l] applied to z[l]

4.1 Forward Propagation


Tip: check the matrix dimensions.
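
One way to follow this tip is to assert the expected shapes inside the forward loop. The sketch below assumes ReLU hidden layers and a sigmoid output layer, with W and b stored as dictionaries keyed by layer index; these choices, the name `forward_deep`, and the layer sizes are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_deep(X, W, b):
    """Forward propagation through L layers with shape checks at every layer."""
    L = len(W)
    m = X.shape[1]
    A = X
    for l in range(1, L + 1):
        assert W[l].shape[1] == A.shape[0]        # W[l] is (n[l], n[l-1])
        assert b[l].shape == (W[l].shape[0], 1)   # b[l] is (n[l], 1)
        Z = np.dot(W[l], A) + b[l]
        A = sigmoid(Z) if l == L else relu(Z)
        assert A.shape == (W[l].shape[0], m)      # A[l] is (n[l], m)
    return A

n = [3, 5, 4, 1]  # illustrative layer sizes; n[0] is the input size
rng = np.random.default_rng(3)
W = {l: rng.normal(size=(n[l], n[l - 1])) * 0.01 for l in range(1, len(n))}
b = {l: np.zeros((n[l], 1)) for l in range(1, len(n))}
X = rng.normal(size=(n[0], 7))

AL = forward_deep(X, W, b)
print(AL.shape)
```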

4.2 Backward Propagation


Even though I have worked on machine learning for a long time, it still sometimes surprises me a bit when my learning algorithms work, because a lot of the complexity of a learning algorithm comes from the data rather than necessarily from the code you write.

4.3 Hyper-parameters


Hyper-parameter   Description
alpha             Learning rate
Iterations        The number of training iterations
L                 The number of hidden layers
n[i]              The number of hidden units in layer i
g[i]              The choice of activation function for layer i
...               ...