Logistic Regression in R

2022-01-08 Cache_wood


Ordinary least squares (OLS) regression

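Ordinary OLS regression places no restriction on the values of the independent variables, the regression coefficients, or the residual term. The dependent variable, being a function of the independent variables, must therefore be free to take any value in $(-\infty,+\infty)$.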

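If the dependent variable takes only categorical values, or only two values (0 and 1), the assumption that the dependent variable is continuous is seriously violated.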

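Let the dependent variable $y_i$ be a dummy variable that takes only the values 0 and 1, i.e. a two-point (Bernoulli) variable. Conditional on $x_i$, write the probabilities as:
$P(y_i=1|x_i) = p_i$
$P(y_i=0|x_i) = 1-p_i = q_i$
$E(y_i|x_i) = 1\times p_i + 0\times (1-p_i) = p_i$
Linear regression would model this expectation as:
$E(y_i|x_i) = \beta_0 + \beta_1 \times x_i$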
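
Because $E(y_i|x_i) = p_i$, the linear form sets $p_i = \beta_0 + \beta_1 x_i$, and nothing keeps that quantity inside $[0,1]$. The following is a minimal sketch on simulated data; the generating values 0.5 and 1.5, the sample size, and all variable names are assumptions made for illustration and do not come from the post's data set.

```r
# Simulated binary outcome; every number here is an illustrative assumption.
set.seed(1)
x <- seq(-4, 4, length.out = 200)
p <- plogis(0.5 + 1.5 * x)                  # true P(y = 1 | x)
y <- rbinom(length(x), size = 1, prob = p)

# Linear probability model: OLS fitted to a 0/1 outcome.
lpm <- lm(y ~ x)
fitted_lpm <- fitted(lpm)

# Fitted "probabilities" are not confined to [0, 1].
range(fitted_lpm)
n_outside <- sum(fitted_lpm < 0 | fitted_lpm > 1)
n_outside
```

A nonzero `n_outside` means the straight line predicts values below 0 or above 1 at the extremes of `x`, which is the motivation for modelling a transformation of $p_i$ instead.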

Logistic regression model

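Define $Logit(p_i) = \ln\frac{p_i}{1-p_i}$
Let: $Logit(p_i) = \beta_0 + \beta_1 \times x_i$
No additive error term $\varepsilon_i$ appears on the logit scale; the randomness enters through the two-point distribution of $y_i$ itself.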

Maximum likelihood estimation:
$b_0 \rightarrow \beta_0, b_1 \rightarrow \beta_1$ (the estimates $b_0, b_1$ stand in for $\beta_0, \beta_1$)
$\hat{p}_i = \frac{\exp(b_0+b_1x_i)}{1+\exp(b_0+b_1x_i)} \in (0,1)$
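
The logit and its inverse can be written out directly to mirror the formulas above. This is only a sketch: the coefficient values `b0` and `b1` and the inputs `x` are hypothetical, chosen to show that the predicted probabilities stay strictly between 0 and 1.

```r
# Logit and its inverse, written to mirror the formulas in the text.
logit     <- function(p)   log(p / (1 - p))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Hypothetical coefficient values, for illustration only.
b0 <- -1
b1 <- 2
x  <- c(-3, 0, 0.5, 3)

eta   <- b0 + b1 * x       # linear predictor on the logit scale
p_hat <- inv_logit(eta)    # predicted probabilities
p_hat

# Base R ships the same pair as qlogis() (logit) and plogis() (inverse logit).
all.equal(p_hat, plogis(eta))
all.equal(logit(p_hat), qlogis(p_hat))
```

In practice `plogis()` is the safer choice for large linear predictors, since `exp(eta)` overflows long before the probability itself becomes unrepresentable.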

-2 log-likelihood ($-2\ln L$)
The smaller this reported value, the larger the likelihood, and hence the better the model fits.
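
For 0/1 responses the likelihood is $L = \prod_i \hat{p}_i^{y_i}(1-\hat{p}_i)^{1-y_i}$, so $-2\ln L$ can be computed by hand and compared with what R reports. The sketch below uses simulated data (the coefficients -0.5 and 1.2 and the sample size are assumptions); it relies on the fact that, for ungrouped 0/1 data, the residual deviance of a binomial `glm` coincides with $-2\ln L$.

```r
# Simulated data; the generating values are illustrative assumptions.
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

fit   <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

# -2 ln L computed directly from the Bernoulli likelihood.
neg2ll_manual <- -2 * sum(y * log(p_hat) + (1 - y) * log(1 - p_hat))

# The same quantity via logLik(); for ungrouped 0/1 data it also
# equals the residual deviance shown by summary().
neg2ll_loglik <- -2 * as.numeric(logLik(fit))
c(manual = neg2ll_manual, logLik = neg2ll_loglik, deviance = deviance(fit))
```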

Goodness of fit

Pseudo $R^2$ (Pseudo R Square)
Analogous to $R^2$ in linear regression, but its value stays below 1
Adjusted coefficient
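
The post does not say which pseudo $R^2$ it has in mind. A measure that stays below 1 together with an "adjusted coefficient" matches the Cox and Snell / Nagelkerke pair, so the sketch below assumes those two and adds McFadden's version for comparison. The deviances and sample size are the ones reported in this post's example output; treating deviance as $-2\ln L$ is valid for ungrouped 0/1 data.

```r
# Values taken from the summary() output in this post.
dev_null  <- 328.32   # null deviance (intercept-only model)
dev_model <- 170.98   # residual deviance (full model)
n         <- 300      # 299 null degrees of freedom + 1

# Cox & Snell: its upper bound is below 1.
r2_cs     <- 1 - exp((dev_model - dev_null) / n)
r2_cs_max <- 1 - exp(-dev_null / n)

# Nagelkerke: Cox & Snell rescaled by its maximum, so it can reach 1.
r2_nagelkerke <- r2_cs / r2_cs_max

# McFadden: 1 - ln L(model) / ln L(null).
r2_mcfadden <- 1 - dev_model / dev_null

round(c(cox_snell = r2_cs, nagelkerke = r2_nagelkerke, mcfadden = r2_mcfadden), 3)
```

None of these is a proportion of explained variance in the OLS sense; they are best used to compare models fitted to the same data.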

Significance test for regression coefficients: the Wald statistic
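
The Wald statistic for a coefficient is its estimate divided by its standard error, $z = b_j / SE(b_j)$, compared against a standard normal distribution. The sketch below checks this against `summary()` on a simulated fit; the data and the generating values 0.3 and 0.8 are assumptions for illustration.

```r
# Simulated data; the generating values are illustrative assumptions.
set.seed(7)
n <- 300
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))        # standard errors from the covariance matrix

z_wald <- est / se                  # Wald z statistic
p_wald <- 2 * pnorm(-abs(z_wald))   # two-sided p-value

# These agree with the "z value" and "Pr(>|z|)" columns of summary(fit).
cbind(z_wald, p_wald)
coef(summary(fit))
```

Some packages report the squared statistic $z^2$ instead, which is referred to a chi-squared distribution with 1 degree of freedom and yields the same p-value.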

Example code

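# Read the data set; file.choose() opens an interactive file picker.
# stringsAsFactors = TRUE keeps text columns as factors (no longer the default since R 4.0).
data <- read.csv(file = file.choose(), header = TRUE, stringsAsFactors = TRUE)

## Maximal model: all five predictors
model01 <- glm(Dative ~ ReciAnim + ReciAcc + ThemeAcc + ReciPron + ThemePron,
               data = data, family = binomial)
summary(model01)

# Backward stepwise selection by AIC
step(model01)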

> summary(model01)

Call:
glm(formula = Dative ~ ReciAnim + ReciAcc + ThemeAcc + ReciPron + 
    ThemePron, family = binomial, data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1900  -0.2509  -0.1634  -0.1634   2.5217  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -1.0512     0.7692  -1.367 0.171703    
ReciAniminani   1.1726     0.4411   2.659 0.007848 ** 
ReciAccunacc    2.1813     0.4529   4.817 1.46e-06 ***
ThemeAccunacc  -0.8667     0.6585  -1.316 0.188122    
ReciPronpron   -2.3916     0.6861  -3.486 0.000491 ***
ThemePronpron   3.3643     0.9441   3.564 0.000366 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 328.32  on 299  degrees of freedom
Residual deviance: 170.98  on 294  degrees of freedom
AIC: 182.98

Number of Fisher Scoring iterations: 6

The variable ThemeAccunacc does not pass the significance test (p = 0.188), so the stepwise algorithm step() is used to eliminate it.

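AIC: the Akaike information criterion, a standard for measuring the goodness of fit of a statistical model while penalizing the number of parameters. When comparing models fitted to the same data, the one with the smaller AIC is preferred.

In general, AIC can be written as $AIC = 2k-2\ln(L)$, where $k$ is the number of parameters in the fitted model and $L$ is the maximized value of the likelihood function (so $\ln(L)$ is the log-likelihood). The assumption of independent, normally distributed errors and the number of observations $n$ enter only the least-squares form of AIC, which is not the form used here.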
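
As a worked check of this formula, the AIC values in this post's output can be recovered from the reported deviances. The sketch assumes that $-2\ln(L)$ equals the residual deviance, which holds for ungrouped 0/1 responses; the helper name `aic_from_deviance` is hypothetical.

```r
# AIC = 2k - 2 ln(L), with -2 ln(L) taken to be the residual deviance
# (valid for ungrouped 0/1 responses).
aic_from_deviance <- function(k, deviance) 2 * k + deviance

# Full model: intercept + 5 coefficients, residual deviance 170.98.
aic_full <- aic_from_deviance(k = 6, deviance = 170.98)

# Model without ThemeAcc: intercept + 4 coefficients, residual deviance 172.82.
aic_reduced <- aic_from_deviance(k = 5, deviance = 172.82)

c(full = aic_full, reduced = aic_reduced)
```

The results reproduce the reported 182.98 and 182.82: dropping ThemeAcc raises the deviance by 1.84 but saves 2 from the parameter penalty, so the AIC falls slightly.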

> step(model01)
Start:  AIC=182.98
Dative ~ ReciAnim + ReciAcc + ThemeAcc + ReciPron + ThemePron

            Df Deviance    AIC
- ThemeAcc   1   172.82 182.82
<none>           170.98 182.98
- ReciAnim   1   178.36 188.36
- ThemePron  1   183.77 193.77
- ReciPron   1   186.52 196.52
- ReciAcc    1   198.01 208.01

Step:  AIC=182.82
Dative ~ ReciAnim + ReciAcc + ReciPron + ThemePron

            Df Deviance    AIC
<none>           172.82 182.82
- ReciAnim   1   180.51 188.51
- ReciPron   1   187.79 195.79
- ThemePron  1   198.25 206.25
- ReciAcc    1   203.52 211.52

Call:  glm(formula = Dative ~ ReciAnim + ReciAcc + ReciPron + ThemePron, 
    family = binomial, data = data)

Coefficients:
  (Intercept)  ReciAniminani   ReciAccunacc  
       -1.911          1.187          2.288  
 ReciPronpron  ThemePronpron  
       -2.337          3.949  

Degrees of Freedom: 299 Total (i.e. Null);  295 Residual
Null Deviance:      328.3 
Residual Deviance: 172.8    AIC: 182.8
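
The step() trace drops ThemeAcc because doing so lowers the AIC. As a complementary check, not something the post itself runs, the change in deviance between the two nested models can be referred to a chi-squared distribution with 1 degree of freedom (a likelihood-ratio test). The numbers are the deviances reported in the trace above.

```r
# Deviances reported in the step() trace.
dev_full    <- 170.98   # model including ThemeAcc
dev_reduced <- 172.82   # model without ThemeAcc

lr_stat <- dev_reduced - dev_full                        # likelihood-ratio statistic
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)   # 1 parameter removed

c(lr_stat = lr_stat, p_value = p_value)
```

A p-value well above 0.05 agrees with the Wald test in the summary table: the data give little evidence that ThemeAcc improves the fit. With a fitted model object, `anova(model_reduced, model_full, test = "Chisq")` performs the same comparison (the object names here are placeholders).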
