Study Notes from the WeChat Official Account 科研私家菜 (1)

2021-08-04  明眸意海

Generalized Linear Models

Logistic Regression Analysis

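  1. Conditional logistic regression: for data from matched designs (paired or multiply matched)
  2. Unconditional logistic regression: for data from group (unmatched) designs
  3. The dependent variable can be binary, unordered multi-category (nominal), or ordered multi-category (ordinal)
  1. Data preparation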
data(Affairs, package="AER")
summary(Affairs)
table(Affairs$affairs)

# create binary outcome variable
Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0
Affairs$ynaffair <- factor(Affairs$ynaffair, 
                           levels=c(0,1),
                           labels=c("No","Yes"))
table(Affairs$ynaffair)
  1. Variable selection: start by fitting the full model with all candidate predictors
fit.full <- glm(ynaffair ~ gender + age + yearsmarried + children + 
                  religiousness + education + occupation +rating,
                data=Affairs,family=binomial())
  1. Inspect the results
summary(fit.full)
# Call:
# glm(formula = ynaffair ~ gender + age + yearsmarried + children + 
#     religiousness + education + occupation + rating, family = binomial(), 
#     data = Affairs)
# 
# Deviance Residuals: 
#     Min       1Q   Median       3Q      Max  
# -1.5713  -0.7499  -0.5690  -0.2539   2.5191  
# 
# Coefficients:
#               Estimate Std. Error z value Pr(>|z|)    
# (Intercept)    1.37726    0.88776   1.551 0.120807    
# gendermale     0.28029    0.23909   1.172 0.241083    
# age           -0.04426    0.01825  -2.425 0.015301 *  
# yearsmarried   0.09477    0.03221   2.942 0.003262 ** 
# childrenyes    0.39767    0.29151   1.364 0.172508    
# religiousness -0.32472    0.08975  -3.618 0.000297 ***
# education      0.02105    0.05051   0.417 0.676851    
# occupation     0.03092    0.07178   0.431 0.666630    
# rating        -0.46845    0.09091  -5.153 2.56e-07 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 675.38  on 600  degrees of freedom
# Residual deviance: 609.51  on 592  degrees of freedom
# AIC: 627.51
# 
# Number of Fisher Scoring iterations: 4
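The coefficients in this output are on the log-odds scale, so exponentiating them gives odds ratios, which are usually easier to read. The sketch below is an illustration, not part of the original notes: it copies the four significant estimates from the full-model output by hand into a vector named `coefs` (a name chosen here only for the example). With the fitted object available, `exp(coef(fit.full))` gives the same quantities for every term.

```r
# Estimates copied by hand from the summary(fit.full) output;
# the vector name `coefs` is only for illustration
coefs <- c(age = -0.04426, yearsmarried = 0.09477,
           religiousness = -0.32472, rating = -0.46845)

# Exponentiate log-odds coefficients to obtain odds ratios
odds_ratios <- exp(coefs)
round(odds_ratios, 3)
```

An odds ratio below 1 (age, religiousness, rating) means the odds of an affair fall as that predictor increases by one unit with the others held fixed; a value above 1 (yearsmarried) means the odds rise.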
  1. Four variables are statistically significant (p < 0.05): age, yearsmarried, religiousness, and rating. Refit the logistic regression using only these four:
fit.reduced <- glm(ynaffair ~ age + yearsmarried + religiousness + 
                     rating, data=Affairs, family=binomial())
summary(fit.reduced)
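  1. Compare the two models: the chi-square test on the change in deviance is not significant (p = 0.2108), so the two nested models do not differ significantly in fit. The reduced four-predictor model fits about as well as the full model.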
# compare models
anova(fit.reduced, fit.full, test="Chisq")
# Analysis of Deviance Table
# 
# Model 1: ynaffair ~ age + yearsmarried + religiousness + rating
# Model 2: ynaffair ~ gender + age + yearsmarried + children + religiousness + 
#     education + occupation + rating
#   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
# 1       596     615.36                     
# 2       592     609.51  4   5.8474   0.2108
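The p-value in this table can be reproduced by hand, which makes clear what `anova(..., test="Chisq")` is testing: the drop in residual deviance between the nested models is referred to a chi-square distribution whose degrees of freedom equal the number of terms dropped. A minimal sketch, using the rounded deviances printed in the table (variable names are illustrative):

```r
# Residual deviances and residual df copied from the anova() table
dev_reduced <- 615.36; df_reduced <- 596
dev_full    <- 609.51; df_full    <- 592

# Likelihood-ratio statistic: drop in deviance from adding the four extra terms
lr_stat <- dev_reduced - dev_full
lr_df   <- df_reduced - df_full

# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p = p_value)
```

Because the inputs are rounded to two decimals, the result agrees with the table's 5.8474 and 0.2108 only approximately.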
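  1. Subgroup-style analysis: look at the predicted probability of an affair at each marital rating, which is comparable to a subgroup analysis. The other predictors are held at their means and only rating varies.
# calculate probability of extramarital affair by marital ratings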
testdata <- data.frame(rating = c(1, 2, 3, 4, 5),
                       age = mean(Affairs$age),
                       yearsmarried = mean(Affairs$yearsmarried),
                       religiousness = mean(Affairs$religiousness))
testdata$prob <- predict(fit.reduced, newdata=testdata, type="response")
testdata
#   rating      age yearsmarried religiousness      prob
# 1      1 32.48752     8.177696      3.116473 0.5302296
# 2      2 32.48752     8.177696      3.116473 0.4157377
# 3      3 32.48752     8.177696      3.116473 0.3096712
# 4      4 32.48752     8.177696      3.116473 0.2204547
# 5      5 32.48752     8.177696      3.116473 0.1513079
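One way to read these predicted probabilities is to convert them to odds. Since the model is linear on the log-odds scale, each one-point increase in rating should multiply the odds by the same factor, namely `exp()` of the rating coefficient in `fit.reduced`. The sketch below checks this using only the probabilities printed above; the variable names are illustrative.

```r
# Predicted probabilities copied from the testdata output, ratings 1 to 5
prob <- c(0.5302296, 0.4157377, 0.3096712, 0.2204547, 0.1513079)

# Convert probabilities to odds
odds <- prob / (1 - prob)

# Ratio of odds between consecutive ratings (2 vs 1, 3 vs 2, ...)
step_ratio <- odds[-1] / odds[-length(odds)]
round(step_ratio, 3)
```

All four ratios come out at about 0.63, so each additional rating point lowers the odds of an affair by roughly 37% with the other predictors held at their means.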
  1. Overdispersion check
# evaluate overdispersion
fit <- glm(ynaffair ~ age + yearsmarried + religiousness +
             rating, family = binomial(), data = Affairs)
fit.od <- glm(ynaffair ~ age + yearsmarried + religiousness +
                rating, family = quasibinomial(), data = Affairs)
pchisq(summary(fit.od)$dispersion * fit$df.residual,  
       fit$df.residual, lower.tail = FALSE)
# 0.340122

The test based on the binomial and quasibinomial fits gives p = 0.34 (> 0.05), so there is no evidence of overdispersion.
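A quicker informal check that is sometimes used alongside this test is the ratio of residual deviance to residual degrees of freedom, which should be close to 1 when there is no overdispersion. The sketch below is an addition to the notes: it uses the reduced-model figures from the anova() table, the variable names are illustrative, and for ungrouped 0/1 outcomes the ratio should be treated only as a rough screen.

```r
# Residual deviance and residual df of the reduced model,
# copied from the anova() table
resid_deviance <- 615.36
resid_df <- 596

# Ratio near 1 suggests no overdispersion; values well above 1 suggest it
phi_hat <- resid_deviance / resid_df
round(phi_hat, 3)
```

With the fitted object at hand, the same ratio is `deviance(fit.reduced) / df.residual(fit.reduced)`.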
