简书付费文章

R语言高级方法进行缺失数据多重插补案例演示

2020-03-19  本文已影响0人  医科研

当我们在数据集中缺少值时,重要的是考虑为什么它们会丢失以及它们对分析的影响。有时忽略丢失的数据会降低功耗,但更重要的是,有时它会使答案有偏差,并有可能误导错误的结论。因此,重要的是要考虑丢失的数据机制是什么,以便对其进行处理。 Rubin(1976)区分了三种类型的误报机制:

缺失数据的常见处理方法

实际数据操作

# required libraries
library(mice)
## Warning: package 'mice' was built under R version 3.6.3
## 
## Attaching package: 'mice'
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
library(VIM)
## Warning: package 'VIM' was built under R version 3.6.3
## Loading required package: colorspace
## Loading required package: grid
## Loading required package: data.table
## VIM is ready to use. 
##  Since version 4.0.0 the GUI is in its own package VIMGUI.
## 
##           Please use the package to use the new (and old) GUI.
## Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues
## 
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
## 
##     sleep
library(lattice)

载入数据

# load data
data(nhanes2)
dim(nhanes2)
## [1] 25  4
head(nhanes2)
##     age  bmi  hyp chl
## 1 20-39   NA <NA>  NA
## 2 40-59 22.7   no 187
## 3 20-39   NA   no 187
## 4 60-99   NA <NA>  NA
## 5 20-39 20.4   no 113
## 6 60-99   NA <NA> 184

md.pattern可视化缺失模式

md.pattern(nhanes2) 
image.png
##    age hyp bmi chl   
## 13   1   1   1   1  0
## 3    1   1   1   0  1
## 1    1   1   0   1  1
## 1    1   0   0   1  2
## 7    1   0   0   0  3
##      0   8   9  10 27

VIM包对缺失数据可视化

library(VIM)
nhanes2_aggr = aggr(nhanes2,
                    col=mdc(1:2), # 颜色设置
                    numbers=TRUE, 
                    sortVars=TRUE, 
                    labels=names(nhanes2), 
                    cex.axis=.7, gap=3, 
                    ylab=c("Proportion of missingness","Missingness Pattern"))
image.png
上一篇 下一篇

猜你喜欢

热点阅读