Andrew Ng ML(2)——linear regressi

2018-12-05 tmax

Linear regression with multiple variables (supervised learning)

$m$: number of training examples
$n$: number of features
$x^{(i)}$: input (features) of the $i^{th}$ training example
$x_j^{(i)}$: value of feature $j$ in the $i^{th}$ training example
e.g.
$x^{(2)}=\begin{bmatrix} 1416\\ 3\\ 2\\ 40 \end{bmatrix}$, $x_3^{(2)}=2$
Hypothesis: $h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3+...+\theta_nx_n$ (for convenience, define $x_0=1$, i.e. $x_0^{(i)}=1$ for every example)
$x= \begin{bmatrix} x_0\\ x_1\\ \vdots\\ x_n \end{bmatrix}$, $\theta=\begin{bmatrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{bmatrix}$, $h_\theta(x)=\theta^Tx$

Gradient descent (multiple variables)

Hypothesis: $h_\theta(x)=\theta^Tx=\theta_0x_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n$
Parameters: $\theta_0,\theta_1,\theta_2,...,\theta_n$ (the $(n+1)$-dimensional vector $\theta$)
Cost function: $J(\theta_0,\theta_1,\theta_2,...,\theta_n)=J(\theta)=\frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)})-y^{(i)})^2$

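(Figure: gradient descent update rules for a single feature and for multiple features. The terms "variable" and "feature" are used interchangeably, and $n$ is the number of features.)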
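In the course's standard form, the multi-feature update is $\theta_j := \theta_j-\alpha\frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$, applied simultaneously for $j=0,...,n$. With $n=1$ it reduces to the single-feature rule. The following is a minimal sketch in plain Python, not a reference implementation. The function names (`predict`, `cost`, `gradient_descent`) and the tiny data set are made up for illustration.

```python
def predict(theta, x):
    """h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared errors."""
    m = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent with a simultaneous update of every theta_j."""
    m, n_plus_1 = len(X), len(X[0])
    theta = [0.0] * n_plus_1
    for _ in range(iterations):
        # Compute all errors with the OLD theta before changing any component.
        errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m
                for j in range(n_plus_1)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Made-up data generated from y = 1 + 2*x1 + 3*x2; each row starts with x_0 = 1.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]

theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
print([round(t, 3) for t in theta])
print(cost(theta, X, y))
```

Computing every error before touching `theta` is what makes the update simultaneous. Updating $\theta_0$ first and then reusing it for $\theta_1$ would be a different algorithm.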

Gradient descent in practice I
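- Feature scaling

When two or more features have very different ranges, the contour plot of the cost function becomes a long, thin ellipse, and gradient descent converges slowly. Rescaling the features so that they all fall within a similar range avoids this.

The range does not have to be exactly $-1$ to $1$, but it should not be much larger or much smaller than that.

Mean normalization

$x_i \leftarrow \frac{x_i-\mu_i}{s_i}$, where $\mu_i$ is the mean of feature $i$ and $s_i$ is either its standard deviation or its range $max-min$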
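As a rough illustration of the formula above, the sketch below scales one feature column using only the standard library. The helper name `mean_normalize`, its `use_range` flag, and the house sizes are assumptions made for this example.

```python
from statistics import mean, pstdev


def mean_normalize(column, use_range=False):
    """Return (scaled values, mu, s) for one feature column.

    s is the population standard deviation by default,
    or max - min when use_range=True.
    """
    mu = mean(column)
    s = (max(column) - min(column)) if use_range else pstdev(column)
    if s == 0:
        raise ValueError("constant feature: cannot scale")
    return [(v - mu) / s for v in column], mu, s


sizes = [2104, 1416, 1534, 852]  # hypothetical house sizes
scaled, mu, s = mean_normalize(sizes, use_range=True)
print(mu, s)
print([round(v, 3) for v in scaled])
```

The returned `mu` and `s` should be kept, because any new input has to be scaled with the same values before it is passed to the hypothesis.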

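Gradient descent in practice II (about $\alpha$)

Ways to check that gradient descent is working correctly: 1. Plot the value of the cost function against the number of iterations and check whether it converges (the usual method). 2. Choose a threshold $\varepsilon$ and run an automatic convergence test: declare convergence when $J(\theta)$ decreases by less than $\varepsilon$ in one iteration.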
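Both checks can be sketched together: record $J(\theta)$ after every iteration (the list that would be plotted) and stop once the decrease falls below $\varepsilon$. This is a minimal sketch under assumptions. The names `step` and `run_until_converged`, the default `epsilon=1e-9`, and the data are all illustrative choices.

```python
def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared errors."""
    m = len(X)
    return sum((sum(t * v for t, v in zip(theta, xi)) - yi) ** 2
               for xi, yi in zip(X, y)) / (2 * m)


def step(theta, X, y, alpha):
    """One simultaneous gradient descent update."""
    m = len(X)
    errors = [sum(t * v for t, v in zip(theta, xi)) - yi for xi, yi in zip(X, y)]
    return [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j, t in enumerate(theta)]


def run_until_converged(X, y, alpha, epsilon=1e-9, max_iterations=100_000):
    """Return (theta, history of J, converged flag)."""
    theta = [0.0] * len(X[0])
    history = [cost(theta, X, y)]
    for _ in range(max_iterations):
        theta = step(theta, X, y, alpha)
        history.append(cost(theta, X, y))
        decrease = history[-2] - history[-1]
        if decrease < 0:
            return theta, history, False  # J went up: alpha is probably too large
        if decrease < epsilon:
            return theta, history, True   # automatic convergence test passed
    return theta, history, False          # ran out of iterations


# Made-up data generated from y = 1 + 2*x; each row starts with x_0 = 1.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [3, 5, 7, 9]

theta, history, converged = run_until_converged(X, y, alpha=0.05)
print(converged, len(history) - 1, [round(t, 3) for t in theta])
```

A good $\varepsilon$ is hard to pick in advance, which is one reason plotting `history` is the more common check.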

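(Figure: checking that gradient descent is working correctly)

(Figure: what can happen when the chosen $\alpha$ (learning rate) is too large)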

Summary

If $\alpha$ is too small: slow convergence.
If $\alpha$ is too large: the cost function $J(\theta)$ may not decrease on every iteration and may not converge (slow convergence is also possible).
To choose $\alpha$, try $...,0.001,0.003,0.01,0.03,0.1,0.3,1,...$
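One way to apply this list in practice is to run a few iterations for each candidate and compare how $J(\theta)$ behaves. The sketch below does that on a made-up data set. The function name `cost_history` and the choice of 50 iterations are assumptions for illustration.

```python
def cost_history(X, y, alpha, iterations):
    """Run batch gradient descent and return J(theta) after each iteration."""
    m, theta, history = len(X), [0.0] * len(X[0]), []
    for _ in range(iterations):
        errors = [sum(t * v for t, v in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        theta = [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
        history.append(sum(
            (sum(t * v for t, v in zip(theta, xi)) - yi) ** 2
            for xi, yi in zip(X, y)
        ) / (2 * m))
    return history


# Made-up data generated from y = 1 + 2*x; each row starts with x_0 = 1.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [3, 5, 7, 9]

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1]:
    h = cost_history(X, y, alpha, iterations=50)
    monotone = all(a >= b for a, b in zip(h, h[1:]))
    print(f"alpha={alpha:<6} final J={h[-1]:.3g} decreasing every step: {monotone}")
```

On this data the small values decrease $J$ slowly, a mid-range value decreases it quickly, and the largest values make it grow. The usual choice is the largest $\alpha$ that still decreases $J$ on every iteration.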

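Feature selection
For example, when predicting a house price from the lot's frontage and depth, you can define a new feature: area (frontage $\times$ depth).
Polynomial regression
Depending on the shape of the data set, fit the data with different polynomial models.
e.g.:

For the figure above:
1. Fit with a cubic model
$h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3$
$x_1=(size)$, $x_2=(size)^2$, $x_3=(size)^3$
P.S. Pay attention to feature scaling! The ranges of $size$, $(size)^2$ and $(size)^3$ differ by orders of magnitude.
2. Fit with a square-root model
$h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2$
$x_1=(size)$, $x_2=\sqrt{size}$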
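Both models are still linear regression, just on derived features. The sketch below builds those feature vectors from a single `size` value and prints the range of each cubic feature to show why scaling matters here. The function names and the sizes are hypothetical.

```python
from math import sqrt


def cubic_features(size):
    """[x_0, x_1, x_2, x_3] = [1, size, size^2, size^3]."""
    return [1.0, size, size ** 2, size ** 3]


def sqrt_features(size):
    """[x_0, x_1, x_2] = [1, size, sqrt(size)]."""
    return [1.0, size, sqrt(size)]


sizes = [10, 100, 1000]  # hypothetical sizes
rows = [cubic_features(s) for s in sizes]

# Range (max - min) of each non-constant feature column.
for j, name in enumerate(["size", "size^2", "size^3"], start=1):
    column = [row[j] for row in rows]
    print(name, max(column) - min(column))

print(sqrt_features(16))
```

Once the feature vectors are built, the same gradient descent or normal equation used for ordinary multi-variable regression applies unchanged.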

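Normal equation: another way to solve for the optimal $\theta$ (no feature scaling is needed, even when the feature ranges differ greatly)

Take the partial derivative of $J(\theta)$ with respect to each $\theta_j$ and set it to zero to obtain the optimal solution, e.g.:

Q: How is $\theta=(X^TX)^{-1}X^Ty$ derived?

Start from $X\theta=y$ (where $X$ is $m\times(n+1)$, $\theta$ is $(n+1)\times 1$, and $y$ is $m\times 1$).
$X$ is generally not square, so it has no inverse; first left-multiply both sides by $X^T$.
This gives $X^TX\theta=X^Ty$ (where $X^TX$ is an $(n+1)\times(n+1)$ square matrix).
If $X^TX$ is invertible, $\theta=(X^TX)^{-1}X^Ty$. Strictly speaking, $X\theta=y$ usually has no exact solution when $m>n+1$. The equation $X^TX\theta=X^Ty$ is exactly what setting the gradient of $J(\theta)$, $\frac{1}{m}X^T(X\theta-y)$, to zero gives, so its solution minimizes the cost.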
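A small dependency-free sketch of the normal equation follows. Instead of forming $(X^TX)^{-1}$ explicitly, it solves the linear system $X^TX\theta=X^Ty$ by Gaussian elimination, which gives the same $\theta$ when $X^TX$ is invertible. That swap is a choice made for this example, not something the notes prescribe, and the helper names and data are made up. In practice a linear algebra library would normally be used.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (or nearly so)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def normal_equation(X, y):
    """Return theta satisfying X^T X theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Unscaled, made-up data generated from y = 1 + 2*x1 + 3*x2 (x_0 = 1 in each row).
X = [[1, 210, 5], [1, 142, 3], [1, 153, 3], [1, 85, 2]]
y = [436, 294, 316, 177]

theta = normal_equation(X, y)
print([round(t, 6) for t in theta])
```

The two features here differ in range by roughly two orders of magnitude and no scaling is applied, in line with the note above. The `ValueError` branch covers the case where $X^TX$ is not invertible, for example when two features are linearly dependent.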