推导线性回归
## 线性模型的推导(参考自西瓜书)
原问题:假设有$m$个样本$D=\left\{(\mathbf{x}_{1},y_{1}),(\mathbf{x}_{2},y_{2}),...,(\mathbf{x}_{m},y_{m}) \right\}$,每个样本$\mathbf{x}_{i}=\left ( x_{1},x_{2},...,x_{d} \right )$有$d$个特征,一个目标值$y_{i}=y$
### 单变量的线性回归
考虑最简单的只有**一个特征**的样本$(x_{i}, y_{i})$,线性回归试图学习:
$$
f(x_{i})=wx_{i}+b,使得f(x_{i})渐进等于y_{i}
$$
为了求得$w$和$b$,则需使用均方误差作为性能度量,并使均方误差最小化:
$$
(w, b) = \arg\limits_{(w, b)}\min\sum_{i=1}^{m}\left(wx_{i}+b-y_{i} \right)^{2}
$$
令$E(w,b)=\sum_{i=1}^{m}\left(wx_{i}+b-y_{i} \right)^{2}$分别对$w$和$b$求偏导:
$$
\begin{aligned}
\frac{\partial E(w,b)}{\partial w}&=2\cdot x_{i}\cdot \sum_{i=1}^{m}(wx_{i}+b-y_{i})
\\&=2\sum_{i=1}^{m}(wx_{i}^{2}+bx_{i}-x_{i}y_{i})
\\&=2(\sum_{i=1}^{m}wx_{i}^{2}+\sum_{i=1}^{m}x_{i}(b-y_{i})) \tag{1}
\end{aligned}
$$
对$b$:
$$
\begin{aligned}
\frac{\partial E(w,b)}{\partial b}&=2\cdot \sum_{i=1}^{m}(wx_{i}+b-y_{i})
\\&=2(\sum_{i=1}^{m}wx_{i}+\sum_{i=1}^{m}b-\sum_{i=1}^{m}y_{i})
\\&=2(\sum_{i=1}^{m}wx_{i}+mb-\sum_{i=1}^{m}y_{i})\tag{2}
\end{aligned}
$$
从以上$w$和$b$的导函数中可以看出相关变量的系数$\sum_{i=1}^{m}x_{i}^{2}$与$m$都为**正数**,故对应的导函数都为**增函数**,故当导函数值取0时对应的极值为。令$(1)$式和$(2)$式为值为0,可推导出$w$和$b$的表达式。$先令x的均值\bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_{i}$,先推导$b$的表达式:
$$
令\frac{\partial E(w,b)}{\partial b}=0 \\
\Rightarrow \sum_{i=1}^{m}wx_{i}+mb-\sum_{i=1}^{m}y_{i}=0 \\
\Rightarrow b=\frac{1}{m}\sum_{i=1}^{m}y_{i}-\frac{1}{m}\sum_{i=1}^{m}wx_{i} \tag{3}
$$
再推导$w$的表达式:
$$
令\frac{\partial E(w,b)}{\partial w}=0 \\
\Rightarrow \sum_{i}^{m}wx_{i}^{2}+\sum_{i=1}^{m}x_{i}b-\sum_{i=1}^{m}x_{i}y_{i}=0
\\ 带入(3)\Rightarrow \sum_{i=1}^{m}wx_{i}^{2}+\sum_{i=1}^{m}x_{i}(\frac{1}{m}\sum_{i=1}^{m}y_{i}-\frac{1}{m}\sum_{i=1}^{m}wx_{i})-\sum_{i=1}^{m}x_{i}y_{i}=0
\\ 化简 \Rightarrow \sum_{i=1}^{m}wx_{i}^{2}+\frac{1}{m}\sum_{i=1}^{m}x_{i}\sum_{i=1}^{m}y_{i}-\frac{w}{m}(\sum_{i=1}^{m}x_{i})^{2}-\sum_{i=1}^{m}x_{i}y_{i}=0
\\ 化简合并 \Rightarrow w(\sum_{i=1}^{m}x_{i}^2-\frac{1}{m}(\sum_{i=1}^{m}x_{i})^{2})+\sum_{i=1}^{m}(\bar{x}-x_{i})y_{i}=0
\\ \Rightarrow w = \frac{\sum_{i=1}^{m}y_{i}(x_{i}-\bar{x})}{\sum_{i=1}^{m}x_{i}^2-\frac{1}{m}(\sum_{i=1}^{m}x_{i})^{2}}
$$
求得的两个表达式为:
$$
w = \frac{\sum_{i=1}^{m}y_{i}(x_{i}-\bar{x})}{\sum_{i=1}^{m}x_{i}^2-\frac{1}{m}(\sum_{i=1}^{m}x_{i})^{2}}, \qquad
b=\frac{1}{m}\sum_{i=1}^{m}y_{i}-\frac{1}{m}\sum_{i=1}^{m}wx_{i}
$$
### 多元线性回归
考虑多个特征的样本$\mathbf{x}_{i}=\left ( x_{1},x_{2},...,x_{d} \right )$,多元线性回归试图学习:
$$
f(x_{i})=w^{T}x_{i}+b,使得f(x_{i})渐进等于y_{i}
$$
为了便于向量的运算,令$\hat{w}=(w;b)=\binom{w}{b}$,把数据集$D$表示为一个$m\times(d+1)大小的矩阵:$
$$
X=\begin{Bmatrix}
&x_{11} &x_{12}&...&x_{1d} &1\\
&x_{21} &x_{22}&...&x_{2d} &1\\
&... &... &... &... &...\\
&x_{m1} &x_{m2} &... &x_{md} &1
\end{Bmatrix}=\begin{pmatrix}
\mathbf{x}_{1}^{T} &1\\
\mathbf{x}_{2}^{T} &1\\
... &...\\
\mathbf{x}_{m}^{T} &1
\end{pmatrix}
$$
再把目标值也写成向量的形式:$\mathbf{y}=\begin{pmatrix}
y_{1}\\
y_{2}\\
...\\
y_{m}
\end{pmatrix}$,则类似有:
$$
\hat{w}=\arg\min(\mathbf{y}-X\hat{w})^{T}(\mathbf{y}-X\hat{w})
$$
令$E_{\hat{w}}=(\mathbf{y}-X\hat{w})^{T}(\mathbf{y}-X\hat{w})$,对$\hat{w}$求导得:
$$
\begin{aligned}
\frac{\partial E_{\hat{w}}}{\partial \hat{w}}&=\triangledown_{\hat{w}}(\mathbf{y}^{T}-\hat{w}^{T}X^{T})(\mathbf{y}-X\hat{w})
\\ &=\triangledown_{\hat{w}}(\mathbf{y}^{T}\mathbf{y}-y^{T}X\hat{w}-\hat{w}^{T}X^{T}\mathbf{y}+\hat{w}^{T}X^{T}X\hat{w})
\\ &=0-(\mathbf{y}^{T}X)^{T}-X^{T}\mathbf{y}+(X^{T}X\hat{w}+(\hat{w}X^{T}X)^{T})
\\ &=-2X^{T}\mathbf{y}+2X^{T}X\hat{w}
\\ &=2X^{T}(X\hat{w}-\mathbf{y})
\end{aligned}
\\ 这里的矩阵求导法则用了如下两个公式: \\
\frac{\partial(A^{T}WB)}{\partial W}=AB^{T},\frac{\partial(A^{T}W^{T}B)}{\partial W}=BA^{T}
$$
令上式为零即可推导出$\hat{w}$的表达式:
$$
2X^{T}(X\hat{w}-\mathbf{y})=0
\Rightarrow X^{T}X\hat{w}=X^{T}\mathbf{y}
\\ \Rightarrow \hat{w}=(X^{T}X)^{-1}X^{T}\mathbf{y}
$$
令每个样本为$\hat{x}_{i}=(\mathbf{x}_{i},1)$,得最终的多元线性回归模型:
$$
\begin{aligned}
f(\hat{x}_{i})&=(\mathbf{x}_{i},1)\begin{pmatrix}
w\\
b
\end{pmatrix}=\hat{x}_{i}\hat{w}\\&=\hat{x}_{i}(X^{T}X)^{-1}X^{T}\mathbf{y}
\end{aligned}
$$
显然,上式能成立的基本条件就是矩阵$X^{T}X$可逆(即满秩矩阵),**当特征的数量大于样本的数量时矩阵就不可逆了**。
### 广义线性模型
$$
g(y)=w^{T}x+b\\ \Rightarrow \\
y = g^{-1}(w^{T}x+b)
$$