Logistic regression

2018-08-07

1. Logistic regression model

1.1 Classification

want 0\le h_\theta(x)\le1
h_\theta(x)=g(\theta^Tx)
g(z)=\frac{1}{1+e^{-z}}
h_\theta(x)=P(y=1|x;\theta)\quad --\ estimated\ probability\ that\ y=1\ on\ input\ x,\ parameterized\ by\ \theta
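As a quick Octave sketch (the names X and theta are placeholders; X is assumed to be the m x (n+1) design matrix with a leading column of ones):
sigmoid = @(z) 1 ./ (1 + exp(-z));   % elementwise logistic function g(z)
h = sigmoid(X * theta);              % m x 1 vector of estimated P(y = 1 | x; theta)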

1.2 Cost function

Cost(h_\theta(x),y)=\left\{ \begin{aligned} &\ -log(h_\theta(x))\qquad\ \ \ if \ y=1 \\ &\ -log(1-h_\theta(x))\quad if \ y=0 \end{aligned} \right.
J(\theta)=\frac{1}{m} \sum_{i=1}^m Cost(h_\theta(x^{(i)}),y^{(i)})

(figure: the cost curves -log(h_\theta(x)) for y=1 and -log(1-h_\theta(x)) for y=0)

simplified cost function:
Cost(h_\theta(x),y)=-ylog(h_\theta(x))-(1-y)log(1-h_\theta(x))
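A minimal vectorized sketch of this cost in Octave, assuming X (m x (n+1), leading column of ones), y (m x 1), theta, and the sigmoid from the earlier sketch:
m = length(y);
h = sigmoid(X * theta);                               % m x 1 vector of h_theta(x^(i))
J = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % scalar J(theta)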

gradient descent:
repeat {
\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
}
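A vectorized sketch of this loop, assuming X, y, theta, the learning rate alpha, an iteration count num_iters, and the sigmoid defined earlier (all names are placeholders):
m = length(y);
for iter = 1:num_iters
  h     = sigmoid(X * theta);      % current predictions
  grad  = (1/m) * X' * (h - y);    % (n+1) x 1 gradient vector
  theta = theta - alpha * grad;    % simultaneous update of all theta_j
end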

1.3 Optimization algorithms

function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end

Then we can use Octave's "fminunc()" optimization algorithm together with the "optimset()" function, which creates a structure containing the options we want to send to "fminunc()".

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
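For concreteness, a hedged sketch of what costFunction might look like for logistic regression (saved as costFunction.m; the extra X and y arguments are an assumption and are bound with an anonymous function when calling fminunc):
function [jVal, gradient] = costFunction(theta, X, y)
  m        = length(y);
  h        = 1 ./ (1 + exp(-(X * theta)));                      % sigmoid hypothesis
  jVal     = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h));    % J(theta)
  gradient = (1/m) * X' * (h - y);                              % dJ/dtheta, (n+1) x 1
end
% usage: [optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(t, X, y), initialTheta, options);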

1.4 Multiclass classification

Train a logistic regression classifier h_\theta^{(i)}(x) for each class i to predict the probability that y = i (one-vs-all). To make a prediction on a new x, pick the class i that maximizes h_\theta^{(i)}(x).
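A minimal prediction sketch, assuming all_theta is a K x (n+1) matrix whose i-th row holds the parameters of the classifier trained for class i, and the sigmoid defined earlier (the variable names are placeholders):
probs = sigmoid(X * all_theta');    % m x K matrix, column i = h_theta^(i)(x)
[maxp, pred] = max(probs, [], 2);   % pred(k) = class with the largest probability for example k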

1.5 How to solve overfitting

1.5.1 Regularized Linear Regression

Keep small values for the parameters \theta_1,\ldots,\theta_n (by convention \theta_0 is not penalized):
①simpler hypothesis
②less prone to overfitting

J(\theta)=\frac{1}{2m}\big[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\big]
If \lambda is too large, it will result in underfitting, because \theta_1,\ldots,\theta_n will all be close to 0 and the hypothesis degenerates to roughly h_\theta(x)\approx\theta_0.

gradient descent:
repeat{
\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
\theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\ldots,n)
}
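A vectorized sketch of one regularized update for linear regression, assuming X, y, theta, alpha and lambda are given; theta(1) corresponds to \theta_0 and is not penalized:
m    = length(y);
h    = X * theta;                                        % linear hypothesis
grad = (1/m) * X' * (h - y);                             % unregularized gradient
grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);   % add the penalty term, skipping theta_0
theta = theta - alpha * grad;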

Normal Equation:
\theta=\big(X^TX+\lambda L\big)^{-1}X^Ty\qquad L=\begin{bmatrix}0&&&\\&1&&\\&&\ddots&\\&&&1\end{bmatrix}\ \ \big((n+1)\times(n+1)\big)
The top-left 0 in L means \theta_0 is not penalized; for \lambda>0 the matrix X^TX+\lambda L is invertible even when m\le n.
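In Octave the regularized normal equation can be sketched as follows (X, y and lambda assumed given; the backslash operator is used instead of an explicit inverse):
L      = eye(size(X, 2));                    % (n+1) x (n+1) identity
L(1,1) = 0;                                  % do not penalize theta_0
theta  = (X' * X + lambda * L) \ (X' * y);   % closed-form regularized solution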

1.5.2 Regularized Logistic Regression

J(\theta)=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_\theta(x^{(i)}))\big]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
attention: \theta_0 is not regularized (the penalty sum starts at j=1)

gradient descent:
repeat{
\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
\theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\ldots,n)
}
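A vectorized sketch of the regularized cost and gradient for logistic regression (theta(1), i.e. \theta_0, excluded from the penalty; X, y, theta, lambda and the earlier sigmoid assumed given):
m    = length(y);
h    = sigmoid(X * theta);
J    = (1/m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
       + (lambda/(2*m)) * sum(theta(2:end) .^ 2);        % regularized cost
grad = (1/m) * X' * (h - y);                             % unregularized gradient
grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);   % penalty, skipping theta_0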


Appendix

Derivation of the cost function.
first:
h_\theta(x)=P(y=1|x;\theta)=1-P(y=0|x;\theta)\quad --\ estimated\ probability\ that\ y=1\ on\ input\ x,\ parameterized\ by\ \theta
h_\theta(x)=g(z)=\frac{1}{1+e^{-z}}\qquad z=\theta^Tx
so we can get the general formula:
P(y|x;\theta)=g(z)^y(1-g(z))^{(1-y)}\qquad (y\in\{0,1\})----(1)
then use Maximum Likelihood Estimation (MLE):
note that L(\theta)=\prod_{i=1}^mP(y^{(i)}|x^{(i)};\theta)----(2)
substitute equation (1) into equation (2):
L(\theta)=\prod_{i=1}^mg(z^{(i)})^{y^{(i)}}(1-g(z^{(i)}))^{1-y^{(i)}}----(3)

taking the natural logarithm of both sides of equation (3):
ln(L(\theta))=\sum_{i=1}^m\big[y^{(i)}ln(g(z^{(i)}))+(1-y^{(i)})ln(1-g(z^{(i)}))\big]----(4)

MLE seeks the \theta that maximizes equation (4); since maximizing ln(L(\theta)) is equivalent to minimizing its negative, we define
J(\theta)=-\frac{1}{m}ln(L(\theta))=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}ln(g(z^{(i)}))+(1-y^{(i)})ln(1-g(z^{(i)}))\big]----(5)

next, we compute the partial derivative \frac{\partial J}{\partial \theta_j}, using g'(z)=g(z)(1-g(z)):
\begin{aligned} \frac{\partial J}{\partial \theta_j}&=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\frac{1}{g(z^{(i)})}+(1-y^{(i)})\frac{-1}{1-g(z^{(i)})}\big]\frac{\partial g(z^{(i)})}{\partial \theta_j} \\&=-\frac{1}{m}\sum_{i=1}^m\big[\frac{y^{(i)}}{g(z^{(i)})}-\frac{1-y^{(i)}}{1-g(z^{(i)})}\big]g(z^{(i)})(1-g(z^{(i)}))x_j^{(i)} \\&=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}(1-g(z^{(i)}))-(1-y^{(i)})g(z^{(i)})\big]x_j^{(i)} \\&=\frac{1}{m}\sum_{i=1}^m\big(g(z^{(i)})-y^{(i)}\big)x_j^{(i)}=\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)} \end{aligned}
so the gradient descent update becomes:
\theta_j:=\theta_j-\alpha\frac{\partial J}{\partial \theta_j}=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
which is exactly the update used in section 1.2.
