
Deriving the Normal Equation for the Model Parameters

2018-11-26  梅_梅

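The cost function for multivariate linear regression is:
J(\theta_0, \theta_1, \theta_2, \ldots, \theta_n) = \frac{1}{2m}\sum_{i=1}^{m}(h(x^{(i)}) - y^{(i)})^{2}
where:
h(x) = \theta^Tx = \theta_0x_0 + \theta_1x_1 + \ldots + \theta_nx_n
The normal equation finds the parameters that minimize the cost function by solving the following equation for every j from 0 to n:
\frac{\partial}{\partial \theta_j}J(\theta) = 0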
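
As a concrete reading of the cost function above, here is a minimal NumPy sketch that evaluates it with an explicit loop over the training instances. The function names `hypothesis` and `cost` and the toy data are assumptions made for illustration; they do not come from the original post.

```python
import numpy as np

def hypothesis(theta, x):
    """h(x) = theta^T x for a single instance x, where x[0] is the constant 1."""
    return float(np.dot(theta, x))

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h(x^(i)) - y^(i))^2, written as an explicit loop."""
    m = X.shape[0]
    total = 0.0
    for i in range(m):
        total += (hypothesis(theta, X[i]) - y[i]) ** 2
    return total / (2 * m)

# Made-up toy data generated by y = 1 + 2 * x1 (first column is x0 = 1).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

print(cost(np.array([1.0, 2.0]), X, y))  # the generating parameters give zero cost
print(cost(np.array([0.0, 0.0]), X, y))  # (1 + 9 + 25) / (2 * 3) = 35/6
```

The loop mirrors the summation term by term; it is meant for reading, not for speed.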

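Suppose there are m training instances, each with n features plus the conventional constant feature x_0 = 1. The training set is then the m × (n+1) matrix:
\begin{equation} X = \left[ \begin{matrix} x_0^{(1)}&\ldots&x_n^{(1)}\\ \vdots&\ddots&\vdots\\ x_0^{(m)}&\ldots&x_n^{(m)} \end{matrix} \right] \end{equation}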

where
x_{j}^{(i)}
denotes the j-th feature of the i-th instance.

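The parameter vector is:
\theta = [ \theta_0, \theta_1, \theta_2, \ldots, \theta_n ]^{T}
The output variable is the vector:
Y = [ y^{(1)}, y^{(2)}, \ldots, y^{(m)} ]^{T}
With these definitions, the cost function can be written in matrix form and expanded:
J(\theta) = \frac{1}{2m}(X\theta - Y)^T(X\theta - Y) = \frac{1}{2m}(Y^TY - Y^TX\theta - \theta^TX^TY + \theta^TX^TX\theta)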
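
To see that the summation form, the matrix form, and the four-term expansion agree, the following sketch evaluates all three on random data. The function names and the random test data are assumptions for illustration only.

```python
import numpy as np

def cost_sum(theta, X, Y):
    """Summation form: 1/(2m) * sum_i (theta^T x^(i) - y^(i))^2."""
    m = X.shape[0]
    return sum((X[i] @ theta - Y[i]) ** 2 for i in range(m)) / (2 * m)

def cost_matrix(theta, X, Y):
    """Matrix form: 1/(2m) * (X theta - Y)^T (X theta - Y)."""
    m = X.shape[0]
    residual = X @ theta - Y
    return (residual @ residual) / (2 * m)

def cost_expanded(theta, X, Y):
    """Expanded form: 1/(2m) * (Y^T Y - Y^T X theta - theta^T X^T Y + theta^T X^T X theta)."""
    m = X.shape[0]
    return (Y @ Y - Y @ X @ theta - theta @ X.T @ Y + theta @ X.T @ X @ theta) / (2 * m)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # x0 = 1 column plus 2 features
Y = rng.normal(size=5)
theta = rng.normal(size=3)

values = (cost_sum(theta, X, Y), cost_matrix(theta, X, Y), cost_expanded(theta, X, Y))
print(values)
print(np.isclose(values[0], values[1]) and np.isclose(values[1], values[2]))
```

The three values should match up to floating-point rounding.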
Differentiating with respect to \theta, term by term, gives:
\frac{1}{2m}(\frac{\partial{Y^TY}}{\partial{\theta}}-\frac{\partial{Y^TX\theta}}{\partial{\theta}} - \frac{\partial{\theta^TX^TY}}{\partial{\theta}} + \frac{\partial{\theta^TX^TX\theta}}{\partial{\theta}})
The first term does not depend on \theta, so:
\frac{\partial{Y^TY}}{\partial{\theta}} = 0
The second term:
Y^TX\theta = [ y^{(1)}, y^{(2)}, \ldots, y^{(m)} ] \left[ \begin{matrix} x_0^{(1)}&\ldots&x_n^{(1)}\\ \vdots&\ddots&\vdots\\ x_0^{(m)}&\ldots&x_n^{(m)} \end{matrix} \right][ \theta_0, \theta_1, \theta_2, \ldots, \theta_n ]^{T} = (x_0^{(1)}y^{(1)} + \ldots + x_0^{(m)}y^{(m)})\theta_0 + (x_1^{(1)}y^{(1)} + \ldots + x_1^{(m)}y^{(m)})\theta_1 + \ldots + (x_n^{(1)}y^{(1)} + \ldots + x_n^{(m)}y^{(m)})\theta_n
This is the derivative of a scalar with respect to a vector in denominator layout, so the result is a column vector with the same shape as \theta:
\frac{\partial{Y^TX\theta}}{\partial{\theta}} = \left[ \begin{matrix} \frac{\partial{Y^TX\theta}}{\partial{\theta_0}}\\ \vdots\\ \frac{\partial{Y^TX\theta}}{\partial{\theta_n}} \end{matrix} \right] = \left[ \begin{matrix} x_0^{(1)}y^{(1)} + \ldots + x_0^{(m)}y^{(m)}\\ \vdots\\ x_n^{(1)}y^{(1)} + \ldots + x_n^{(m)}y^{(m)} \end{matrix} \right] = X^TY
The third term is a scalar, so it equals its own transpose Y^TX\theta:
\theta^TX^TY = [ \theta_0, \theta_1, \ldots, \theta_n ] \left[ \begin{matrix} x_0^{(1)}&\ldots&x_0^{(m)}\\ \vdots&\ddots&\vdots\\ x_n^{(1)}&\ldots&x_n^{(m)} \end{matrix} \right][ y^{(1)}, y^{(2)}, \ldots, y^{(m)} ]^{T} = (x_0^{(1)}\theta_0 + \ldots + x_n^{(1)}\theta_n)y^{(1)} + \ldots + (x_0^{(m)}\theta_0 + \ldots + x_n^{(m)}\theta_n)y^{(m)}
This is again the derivative of a scalar with respect to a vector in denominator layout, so:
\frac{\partial{\theta^TX^TY}}{\partial{\theta}} = \left[ \begin{matrix} \frac{\partial{\theta^TX^TY}}{\partial{\theta_0}}\\ \vdots\\ \frac{\partial{\theta^TX^TY}}{\partial{\theta_n}} \end{matrix} \right] = \left[ \begin{matrix} x_0^{(1)}y^{(1)} + \ldots + x_0^{(m)}y^{(m)}\\ \vdots\\ x_n^{(1)}y^{(1)} + \ldots + x_n^{(m)}y^{(m)} \end{matrix} \right] = X^TY
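
The two linear-term results can be sanity-checked numerically. The sketch below compares a central-difference gradient against X^TY for both terms; the helper `numerical_gradient` and the random data are assumptions introduced for this check.

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=5)
theta = rng.normal(size=3)

# Second term: f(theta) = Y^T X theta; third term: f(theta) = theta^T X^T Y.
grad_second = numerical_gradient(lambda t: Y @ X @ t, theta)
grad_third = numerical_gradient(lambda t: t @ X.T @ Y, theta)

print(np.allclose(grad_second, X.T @ Y, atol=1e-6))
print(np.allclose(grad_third, X.T @ Y, atol=1e-6))
```

Both terms are linear in \theta, so the gradient does not depend on the point at which it is evaluated.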

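The fourth term:

Write A = X^TX, a symmetric (n+1) × (n+1) matrix that does not depend on \theta. The fourth term is then the quadratic form:
\theta^TX^TX\theta = \theta^TA\theta = \sum_{j=0}^{n}\sum_{k=0}^{n}A_{jk}\theta_j\theta_k
It is a scalar, so this is once more a scalar-by-vector derivative in denominator layout. The derivative of a quadratic form \theta^TA\theta is (A + A^T)\theta, which reduces to 2A\theta because A is symmetric:

\frac{\partial{\theta^TX^TX\theta}}{\partial{\theta}} = \left[ \begin{matrix} \frac{\partial{\theta^TX^TX\theta}}{\partial{\theta_0}}\\ \vdots\\ \frac{\partial{\theta^TX^TX\theta}}{\partial{\theta_n}} \end{matrix} \right] = 2(X^TX)\left[ \begin{matrix} \theta_0\\ \vdots\\ \theta_n \end{matrix} \right] = 2X^TX\theta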
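
The quadratic-form derivative can be checked the same way. This sketch compares a central-difference gradient of \theta^TX^TX\theta with 2X^TX\theta, and also shows the general (A + A^T)\theta rule on a non-symmetric matrix. The helper name, the matrix `B`, and the random data are assumptions for illustration.

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
theta = rng.normal(size=3)
A = X.T @ X  # symmetric by construction

# Symmetric case: gradient of theta^T A theta is 2 A theta.
grad_numeric = numerical_gradient(lambda t: t @ A @ t, theta)
print(np.allclose(grad_numeric, 2 * A @ theta, atol=1e-5))

# General case: for a non-symmetric B the gradient is (B + B^T) theta.
B = rng.normal(size=(3, 3))
grad_general = numerical_gradient(lambda t: t @ B @ t, theta)
print(np.allclose(grad_general, (B + B.T) @ theta, atol=1e-5))
```

The factor of 2 comes from the symmetry of X^TX; without symmetry, only the (B + B^T)\theta form holds.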
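Putting the four terms together, the normal equation is:
\frac{1}{2m}(-2X^TY + 2X^TX\theta) = 0
which simplifies to X^TX\theta = X^TY. Provided that X^TX is invertible, the parameters are:
\theta = (X^TX)^{-1}X^TY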
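
As a closing illustration, here is a minimal sketch of solving the normal equation with NumPy. It assumes X^TX is invertible and uses made-up noise-free data; the function name `normal_equation` is hypothetical. The linear system is solved with `np.linalg.solve` rather than by forming the inverse explicitly, which gives the same \theta when the inverse exists.

```python
import numpy as np

def normal_equation(X, Y):
    """Solve X^T X theta = X^T Y for theta (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Made-up data generated by y = 1 + 2 * x1, with the x0 = 1 column prepended.
x1 = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x1), x1])
Y = 1.0 + 2.0 * x1

theta = normal_equation(X, Y)
print(theta)  # close to [1, 2] up to rounding

# The gradient (1/m) * (X^T X theta - X^T Y) should vanish at the solution.
gradient = (X.T @ X @ theta - X.T @ Y) / len(Y)
print(np.allclose(gradient, 0.0))

# Cross-check against NumPy's least-squares solver.
theta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(theta, theta_lstsq))
```

When X^TX is singular, for example with duplicated or linearly dependent features, `np.linalg.solve` raises an error; a least-squares routine such as `np.linalg.lstsq` is one alternative in that case.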
