
Introduction to Data Mining and Machine Learning Algorithms - 1. Optimization

2019-07-29  by sarai_c7eb

1.1 Algorithms

An algorithm is an iterative, step-by-step procedure for computation. It may be a concise description, a formula, or a formula combined with a description. Common examples include finding the roots of a polynomial, testing whether a number is prime, and generating random numbers.

1.1.1 The essence of an algorithm

In essence, an algorithm can be written as an iterative equation or a set of iterative equations.

For example, to find the square root of a, the iteration is:
x_{k+1}=\frac{1}{2}\left(x_k+\frac{a}{x_k}\right) \qquad k=0,1,2,...
where k is the iteration counter, usually starting from 0.
With a=100, iterating from x_0=1, the R code is:

cnt <- 8      # number of iterations
a   <- 100    # find the square root of a
i   <- 0      # iteration counter
x   <- 1      # initial guess x_0
while (i < cnt) {
    x <- (1/2) * (x + a/x)
    i <- i + 1
    cat("the iteration counter is:", i, "and the result is:", x, "\n")
}
By the sixth iteration the value is already close to the true root, and iterations seven and eight essentially return the exact result, 10.

Taylor's formula
If f(x) is n times differentiable at x_0, then f(x) can be approximated by an n-th degree polynomial in (x-x_0):
f(x)=\frac{f(x_0)}{0!}+\frac{f'(x_0)}{1!}(x-x_0)+\frac{f''(x_0)}{2!}(x-x_0)^2+...+\frac{f^{(n)}(x_0)}{n!}(x-x_0)^n+R_n(x)
Newton's method
Taking the first two terms of the Taylor expansion and setting them to zero, f(x)\approx\frac{f(x_0)}{0!}+\frac{f'(x_0)}{1!}(x-x_0)=0,
gives x=x_0-\frac{f(x_0)}{f'(x_0)},
and hence the iteration x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)}
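As a sketch, applying Newton's iteration to f(x)=x^2-100 (whose positive root is 10) recovers exactly the square-root iteration above, since x-\frac{x^2-a}{2x}=\frac{1}{2}(x+\frac{a}{x}):

```r
# Newton's method for f(x) = x^2 - 100; the positive root is 10
f  <- function(x) x^2 - 100
fp <- function(x) 2 * x        # derivative f'(x)

x <- 1                         # starting point x_0
for (k in 1:8) {
    x <- x - f(x) / fp(x)      # x_{k+1} = x_k - f(x_k)/f'(x_k)
}
print(x)                       # converges to 10
```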

1.1.2 Issues with algorithms

1.1.3 Types of algorithms

An algorithm performs one specific computational task; no single algorithm can solve every problem. Common types include:

1.2 Optimization

1.2.1 A simple example

Given a cylindrical barrel of fixed volume, how should it be designed to minimize its surface area?

From the area and volume formulas, the optimum is reached when h=2r. If a spherical shape were allowed instead, an even smaller surface area would result.
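A minimal numerical check (a sketch, fixing the volume at V=1): eliminating h via V=\pi r^2 h gives the surface area S(r)=2\pi r^2+2V/r, which can be minimized over r with R's optimize():

```r
# Fix the volume V and minimize the surface area S(r) = 2*pi*r^2 + 2*V/r,
# obtained by substituting h = V/(pi*r^2) into S = 2*pi*r^2 + 2*pi*r*h
V <- 1
S <- function(r) 2 * pi * r^2 + 2 * V / r
r_opt <- optimize(S, interval = c(0.01, 5))$minimum
h_opt <- V / (pi * r_opt^2)
print(h_opt / r_opt)   # close to 2, i.e. h = 2r at the optimum
```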

1.2.2 General formulation of optimization

maximize/minimize f(x), x=(x_1,x_2,...,x_D)^T\in R^D
subject to:
\phi_j(x)=0 (j=1,2,...,M)
\psi_k(x)\leq 0 (k=1,...,N)

From calculus, local maxima and minima can be identified from the first and second derivatives:

|               | First derivative | Second derivative |
|---------------|------------------|-------------------|
| Local maximum | f'(x)=0          | f''(x)<0          |
| Local minimum | f'(x)=0          | f''(x)>0          |

1.2.3 Feasible solutions

A point x that satisfies all the constraints is called a feasible point and thus is a feasible solution to the problem. The set of all feasible points is called the feasible region.

1.2.4 Optimality criteria

A point x_* is called a strong local maximum of the nonlinearly constrained optimization problem if f(x) is defined in a \delta-neighborhood N(x_*,\delta) and satisfies f(x_*)>f(u) for all u\in N(x_*,\delta), where \delta>0 and u\neq x_*.
For a weak local maximum, f(x_*)\geq f(u); for a strong local minimum, f(x_*)<f(u); and for a weak local minimum, f(x_*)\leq f(u).

1.3 Unconstrained optimization

1.3.1 Univariate functions

The table in Section 1.2.2 applies directly: combine the first- and second-derivative conditions to classify stationary points.
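As a sketch, R's D() can differentiate an expression symbolically; for f(x)=x^3-3x, the stationary points x=\pm1 are classified by the sign of the second derivative:

```r
# Classify the stationary points of f(x) = x^3 - 3x using D() for
# symbolic differentiation: f'(x) = 3x^2 - 3 = 0 gives x = 1 and x = -1
f   <- expression(x^3 - 3 * x)
df  <- D(f, "x")    # first derivative:  3 * x^2 - 3
d2f <- D(df, "x")   # second derivative: 6 * x

x <- 1
print(c(eval(df), eval(d2f)))    # f'(1) = 0,  f''(1) = 6 > 0:   local minimum
x <- -1
print(c(eval(df), eval(d2f)))    # f'(-1) = 0, f''(-1) = -6 < 0: local maximum
```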

1.3.2 Multivariate functions

The first derivative is replaced by the gradient vector G, and the second derivative by the Hessian matrix H:

G=\nabla f=(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},...,\frac{\partial f}{\partial x_n})^T
H= \left[ \begin{matrix} \frac{\partial^2f}{\partial x_1^2} &\frac{\partial^2f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2f}{\partial x_1\partial x_n} \\ \frac{\partial^2f}{\partial x_2\partial x_1} &\frac{\partial^2f}{\partial x_2^2} & \cdots & \frac{\partial^2f}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2f}{\partial x_n\partial x_1} & \frac{\partial^2f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2f}{\partial x_n^2} \\ \end{matrix} \right]
At a stationary point where G=0, a local minimum is obtained when H is positive definite, and a local maximum when H is negative definite.
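A minimal sketch with R's optim(): minimizing f(x)=(x_1-1)^2+5(x_2-2)^2, whose gradient vanishes at (1,2) and whose Hessian diag(2,10) is positive definite:

```r
# Minimize f(x) = (x1 - 1)^2 + 5*(x2 - 2)^2 with BFGS; the gradient
# G = (2*(x1 - 1), 10*(x2 - 2)) vanishes at (1, 2), and the Hessian
# diag(2, 10) is positive definite, so (1, 2) is the local minimum
f <- function(x) (x[1] - 1)^2 + 5 * (x[2] - 2)^2
g <- function(x) c(2 * (x[1] - 1), 10 * (x[2] - 2))
res <- optim(c(0, 0), f, g, method = "BFGS")
print(res$par)   # close to c(1, 2)
```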

1.4 Nonlinearly constrained optimization

Idea: convert the constrained problem into an unconstrained one.

1.4.1 Penalty method

minimize f(x),x=(x_1,...,x_n)^T\in R^n
subject to:
\phi_i(x)=0,(i=1,...,M)
\psi_j(x)\leq0,(j=1,...,N)

Define a penalty function that turns the constrained problem into an unconstrained one:
\Pi(x,u_i,v_j)=f(x)+\sum_{i=1}^{M}u_i\phi^2_i(x)+\sum_{j=1}^{N}v_j\max\left\{0,\psi_j(x)\right\}^2
where u_i\geq1 and v_j\geq0 are penalty coefficients. The optimum can then be found with the methods of Section 1.3.
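A one-dimensional sketch (an illustrative problem, not from the original): minimizing f(x)=x^2 subject to h(x)=x-1=0 via the penalized objective \Pi(x,u)=x^2+u(x-1)^2, whose minimizer u/(1+u) approaches the constrained optimum x=1 as the penalty coefficient u grows:

```r
# Penalty method sketch: minimize f(x) = x^2 subject to x - 1 = 0.
# The penalized objective x^2 + u*(x - 1)^2 has minimizer u/(1 + u),
# which tends to the constrained solution x = 1 as u increases
f <- function(x) x^2
for (u in c(1, 10, 100, 1000)) {
    pen <- function(x) f(x) + u * (x - 1)^2
    x_u <- optimize(pen, interval = c(-10, 10))$minimum
    cat("u =", u, " minimizer =", x_u, "\n")
}
```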

1.4.2 Lagrange multipliers

minimize f(x),x=(x_1,...,x_n)^T\in R^n
subject to:
h(x)=0
Combining f(x) and h(x) gives the Lagrangian
\Pi=f(x)+\lambda h(x)
where \lambda is the Lagrange multiplier, an unknown scalar to be determined.

If there are M equality constraints h_j(x)=0, (j=1,...,M), then M Lagrange multipliers are needed:
\Pi(x,\lambda_j)=f(x)+\sum_{j=1}^{M}\lambda_jh_j(x)
Setting the partial derivatives to zero gives the stationarity conditions:
\frac{\partial\Pi}{\partial x_i}=\frac{\partial{f}}{\partial{x_i}}+\sum_{j=1}^M\lambda_j\frac{\partial{h_j}}{\partial{x_i}}=0 \quad (i=1,...,n)
\frac{\partial\Pi}{\partial \lambda_j}=h_j=0 \quad (j=1,...,M)
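A small worked example (illustrative, not from the original): minimize f(x)=x_1^2+x_2^2 subject to h(x)=x_1+x_2-1=0.

```latex
\Pi = x_1^2 + x_2^2 + \lambda(x_1 + x_2 - 1)
\frac{\partial\Pi}{\partial x_1} = 2x_1 + \lambda = 0, \quad
\frac{\partial\Pi}{\partial x_2} = 2x_2 + \lambda = 0, \quad
\frac{\partial\Pi}{\partial \lambda} = x_1 + x_2 - 1 = 0
```

The first two equations give x_1=x_2=-\lambda/2; substituting into the constraint yields \lambda=-1 and the minimizer x_1=x_2=1/2.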

1.4.3 Karush-Kuhn-Tucker conditions

minimize f(x),x=(x_1,...,x_n)^T\in R^n
subject to:
\phi_i(x)=0,(i=1,...,M)
\psi_j(x)\leq0,(j=1,...,N)

If all the functions are continuously differentiable at a local minimum x_*, then there exist constants \lambda_0,\lambda_1,...,\lambda_N and u_1,...,u_M such that:
\lambda_0\nabla f(x_*)+\sum_{i=1}^{M}u_i\nabla\phi_i(x_*)+\sum_{j=1}^{N}\lambda_j\nabla\psi_j(x_*)=0
\psi_j(x_*)\leq0, \quad \lambda_j\psi_j(x_*)=0 \quad (j=1,2,...,N)
where \lambda_j\geq0 \ (j=0,1,...,N) and \sum_{j=0}^N\lambda_j+\sum_{i=1}^{M}\left|u_i\right|>0.
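An illustrative one-dimensional check (not from the original): minimize f(x)=x^2 subject to \psi(x)=1-x\leq0 (i.e. x\geq1). Taking \lambda_0=1, the conditions become:

```latex
\nabla f + \lambda\nabla\psi = 2x - \lambda = 0, \qquad
\lambda(1 - x) = 0, \qquad 1 - x \le 0, \qquad \lambda \ge 0
```

If \lambda=0 then x=0, which violates 1-x\leq0; so the constraint must be active, giving x_*=1 with \lambda=2\geq0, the constrained minimum.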
