Andrew Ng ML(1)——basic knowledge

2018-12-19  tmax

Introduction


Univariate (one-variable) linear regression (supervised learning)

Hypothesis: h_{\Theta}(x)=\Theta_0 +\Theta_1x
Parameters: \Theta_0, \Theta_1
cost function: J(\Theta_0,\Theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_{\Theta}(x^{(i)})-y^{(i)})^2 (this is a squared-error function, also the most commonly used cost for regression problems)
goal: \displaystyle \min_{\Theta_0,\Theta_1} \ J(\Theta_0,\Theta_1)
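A minimal Python/NumPy sketch of this cost function (the toy data and variable names are illustrative, not from the course materials):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)                       # number of training examples
    h = theta0 + theta1 * x          # hypothesis evaluated on every example
    return np.sum((h - y) ** 2) / (2 * m)

# toy data lying exactly on y = 1 + 2x, so the cost at (1, 2) is zero
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(1.0, 2.0, x, y))  # 0.0
print(compute_cost(0.0, 0.0, x, y))  # larger than 0
```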

Simplify the hypothesis to h_{\Theta}(x)=\Theta_1x
\Downarrow

[Figure: each value of \Theta_1 corresponds to a different hypothesis]

Now take the full hypothesis h_{\Theta}(x)=\Theta_0+\Theta_1x
\Downarrow

[Figure: the cost function (function J) when the hypothesis has two parameters; right: a contour plot of the cost function]
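Such a contour plot can be reproduced by evaluating J over a grid of (\Theta_0, \Theta_1) values. A short sketch, with assumed toy data and arbitrary grid ranges:

```python
import numpy as np

# toy training data (assumed, for illustration only)
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
m = len(y)

theta0_vals = np.linspace(-10.0, 10.0, 100)
theta1_vals = np.linspace(-1.0, 4.0, 100)

# J[i, j] holds the cost at (theta0_vals[i], theta1_vals[j])
J = np.zeros((len(theta0_vals), len(theta1_vals)))
for i, t0 in enumerate(theta0_vals):
    for j, t1 in enumerate(theta1_vals):
        h = t0 + t1 * x
        J[i, j] = np.sum((h - y) ** 2) / (2 * m)

# J can now be handed to a plotting library's contour routine to draw the contour plot
```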

"Batch"Gradient descent("Batch"梯度下降) with one variable

Batch: every step of gradient descent uses the entire training set (J(\Theta_0,\Theta_1) contains the sum of the squared errors over all examples).
Have some function J(\Theta_0,\Theta_1,\dots,\Theta_n)
want \min J(\Theta_0,\Theta_1,\dots,\Theta_n)
Outline:
1. start with some \Theta_0,\Theta_1,\dots,\Theta_n (commonly they are all zeros)
2. keep changing \Theta_0,\Theta_1,\dots,\Theta_n to reduce J(\Theta_0,\Theta_1,\dots,\Theta_n) until we hopefully end up at a minimum

Gradient descent algorithm (P.S. := denotes assignment, = denotes comparison; the two parameters must be updated simultaneously)
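The update rule being described is the standard one from the course: repeat until convergence, for j = 0 and j = 1 simultaneously,

\Theta_j := \Theta_j - \alpha \frac{\partial}{\partial \Theta_j} J(\Theta_0,\Theta_1)

where \alpha is the learning rate.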
[Figure: with the simplified hypothesis, the meaning of the derivative term in the gradient descent formula]
[Figure: the effect of the value of \alpha on gradient descent (if \Theta has already reached a local minimum, the derivative term is 0, so the solution stays at that local minimum)]

Return to the full hypothesis h_{\Theta}(x)=\Theta_0+\Theta_1x
\Downarrow

[Figure: the cost function and the gradient descent algorithm when the hypothesis has two parameters; computing the derivative terms and substituting them back into the gradient descent algorithm above]
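Working those derivative terms out for the squared-error cost (the standard derivation, written out here for reference):

\frac{\partial}{\partial\Theta_0}J(\Theta_0,\Theta_1)=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x^{(i)})-y^{(i)}\right)
\frac{\partial}{\partial\Theta_1}J(\Theta_0,\Theta_1)=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x^{(i)})-y^{(i)}\right)x^{(i)}

Substituting them back gives the batch gradient descent updates for linear regression:

\Theta_0 := \Theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x^{(i)})-y^{(i)}\right)
\Theta_1 := \Theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x^{(i)})-y^{(i)}\right)x^{(i)}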
Finally, substituting the parameters \Theta_0, \Theta_1 obtained from gradient descent into h_{\Theta}(x) gives the best-fit linear function.
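As a concrete sketch, here is a minimal Python/NumPy implementation of batch gradient descent for the one-variable case (the learning rate, iteration count, and toy data are arbitrary choices for illustration, not values from the course):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, num_iters=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0            # start from zeros, as in the outline above
    for _ in range(num_iters):
        h = theta0 + theta1 * x          # predictions on the whole batch
        grad0 = np.sum(h - y) / m        # dJ/dtheta0
        grad1 = np.sum((h - y) * x) / m  # dJ/dtheta1
        theta0 -= alpha * grad0          # both gradients are computed first,
        theta1 -= alpha * grad1          # so the update is simultaneous
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])       # roughly y = 1 + 2x
print(gradient_descent(x, y))            # approaches about (1.15, 1.94)
```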


Matrices and vectors (review)

Calculate all of the predicted prices at the same time (a single hypothesis)
\Downarrow
House sizes:
2104
1416
1534
852

hypothesis:
h_\Theta(x)=-40+0.25x

\begin{bmatrix} 1&2104\\ 1&1416\\ 1&1534\\ 1&852 \end{bmatrix}\times \begin{bmatrix} -40\\ 0.25 \end{bmatrix}= \begin{bmatrix} -40\times 1+2104\times 0.25\\ \vdots\\ \vdots\\ -40\times 1+852\times 0.25 \end{bmatrix}
(prediction = DataMatrix * parameters)
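The same computation in NumPy (a short sketch; the house sizes and parameters are the ones from the example above):

```python
import numpy as np

# design matrix: a column of ones (intercept term) next to the house sizes
X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1,  852]], dtype=float)

theta = np.array([-40.0, 0.25])   # parameters of h(x) = -40 + 0.25x

predictions = X @ theta           # prediction = DataMatrix * parameters
print(predictions)                # [486.  314.  343.5 173.]
```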

Multiple hypotheses
\Downarrow

[Figure: predictions for multiple hypotheses computed at once]
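When there are several competing hypotheses, their parameter vectors can be stacked as the columns of a matrix, so a single matrix-matrix product gives every prediction for every hypothesis at once. A sketch with three illustrative hypotheses (the parameter values are assumptions for the example, not taken from the original figure):

```python
import numpy as np

X = np.array([[1, 2104],
              [1, 1416],
              [1, 1534],
              [1,  852]], dtype=float)

# one column of parameters per hypothesis, e.g.
#   h1(x) = -40 + 0.25x,  h2(x) = 200 + 0.1x,  h3(x) = -150 + 0.4x
Theta = np.array([[-40.00, 200.0, -150.0],
                  [  0.25,   0.1,    0.4]])

predictions = X @ Theta   # entry (i, j) = prediction of hypothesis j for house i
print(predictions.shape)  # (4, 3): 4 houses x 3 hypotheses
```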