Numeric to factor: how to convert columns in bulk
2021-08-11
芋圆学徒
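While processing my data I ran into a problem: columns 8 to 18 are categorical variables, so I need to convert them to factors. I tried two approaches, neither of which did the job, and I am still working out why.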
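Method 1
Using apply(): the result a is a matrix, and when I convert it to a data frame the columns come out as character (chr) rather than factor. apply() coerces its result with as.vector before setting the dimensions, so the factors it builds are turned back into character strings.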
> library(data.table)
> library(magrittr)  # provides %>%
> clinical <- fread("clinical.txt")
> a <- apply(clinical[,8:18],2, factor) %>% as.data.frame()
> str(a)
'data.frame': 104 obs. of 11 variables:
$ flutter: chr "0" "0" "0" "0" ...
$ cad : chr "0" "0" "0" "1" ...
$ dvt : chr "0" "0" "0" "0" ...
$ chf : chr "0" "0" "0" "0" ...
$ ild : chr "0" "0" "0" "0" ...
$ osa : chr "0" "1" "0" "0" ...
$ dm : chr "0" "1" "0" "0" ...
$ ce : chr "1" "0" "0" "0" ...
$ ost : chr "0" "0" "0" "0" ...
$ ais : chr "0" "0" "0" "0" ...
$ ca : chr "0" "0" "0" "0" ...
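As a hedged sketch of an alternative to apply(): lapply() works column by column and returns a list, so the factor class survives when the result is assigned back. The toy data frame df and its column names are assumptions made up for illustration; they are not the real clinical.txt.

```r
# Toy stand-in for the clinical data; names and values are made up.
df <- data.frame(
  id  = 1:4,
  cad = c(0L, 0L, 1L, 0L),
  osa = c(0L, 1L, 0L, 0L)
)

# apply() returns a character matrix, so the factor class is lost.
m <- apply(df[, 2:3], 2, factor)
is.matrix(m)
is.character(m)

# lapply() keeps each column as its own object, so factors stay factors.
cols <- c("cad", "osa")
df[cols] <- lapply(df[cols], factor, levels = c(0, 1))
str(df)
```

On a plain data.frame read from the real file, cols could instead be the names of columns 8 to 18.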
Method 2
Using a loop to convert each column to a factor; the result is not what I wanted either.
> clinical <- fread("clinical.txt")
> for (i in colnames(clinical)[8:18]) {
+ clinical$i <- factor(clinical$i,levels = c(0,1))
+ print(table(clinical$i))
+ }
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
0 1
101 3
> str(clinical[,8:18])
Classes ‘data.table’ and 'data.frame': 104 obs. of 11 variables:
$ flutter: int 0 0 0 0 0 0 0 0 0 0 ...
$ cad : int 0 0 0 1 0 0 0 0 1 0 ...
$ dvt : int 0 0 0 0 0 0 0 0 0 0 ...
$ chf : int 0 0 0 0 0 0 0 0 0 0 ...
$ ild : int 0 0 0 0 0 0 0 0 0 0 ...
$ osa : int 0 1 0 0 0 0 0 0 0 0 ...
$ dm : int 0 1 0 0 0 0 0 0 0 0 ...
$ ce : int 1 0 0 0 0 0 0 0 0 0 ...
$ ost : int 0 0 0 0 0 0 0 0 0 0 ...
$ ais : int 0 0 0 0 0 0 0 0 0 0 ...
$ ca : int 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, ".internal.selfref")=<externalptr>
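One likely reason the loop above leaves columns 8 to 18 untouched: clinical$i does not use the value of the loop variable i. The $ operator takes the name literally, so every iteration reads and writes a column called "i", which would also explain why the same table is printed each time. ($ additionally allows partial name matching, which may be why that first table is not empty, but that is a guess.) A minimal sketch with a made-up data frame, using [[ ]] so that the loop variable is evaluated:

```r
# Toy data frame; column names and values are made up for illustration.
df <- data.frame(cad = c(0L, 1L, 0L), osa = c(0L, 0L, 1L))

for (nm in c("cad", "osa")) {
  # df$nm would look for a column literally called "nm" (there is none),
  # whereas [[ ]] evaluates nm and uses its value as the column name.
  df[[nm]] <- factor(df[[nm]], levels = c(0, 1))
}

is.null(df$nm)         # TRUE: no column is named "nm"
sapply(df, is.factor)  # both columns are now factors
```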
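Any help is appreciated: is there a good way to solve this?
Found the problem
When reading with fread, set data.table = FALSE so that it returns a plain data.frame. I am not entirely sure of the underlying reason; in short, data.table and data.frame are different classes with different indexing rules. On a data.frame, dd[, i] with a numeric i returns column i as a vector, which is what the loop below relies on. Note that the loop also indexes columns with dd[, i] instead of $i. I found some explanations of the differences on Zhihu; one article worth reading is 【数据处理】data.table包 - 知乎 (zhihu.com)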
> library(data.table)
> dd <- fread("clinical.txt", data.table = FALSE)
> for(i in 8:18){
+ dd[,i] <- factor(dd[,i],levels = c(0,1))
+ }
> str(dd[,8:18])
'data.frame': 104 obs. of 11 variables:
$ flutter: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ cad : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 2 1 ...
$ dvt : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ chf : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ild : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ osa : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
$ dm : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
$ ce : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 1 1 1 1 ...
$ ost : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ais : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ ca : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
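For completeness, the conversion can also be done without leaving data.table. The sketch below assumes the data.table package is installed and uses a small made-up table in place of clinical.txt; cols is a hypothetical vector of the column names to convert.

```r
library(data.table)

# Toy stand-in for clinical.txt; column names and values are made up.
clinical <- data.table(
  id  = 1:4,
  cad = c(0L, 0L, 1L, 0L),
  osa = c(0L, 1L, 0L, 0L)
)

cols <- c("cad", "osa")  # on the real data: names(clinical)[8:18]

# := updates the listed columns by reference; .SD contains only .SDcols.
clinical[, (cols) := lapply(.SD, factor, levels = c(0, 1)), .SDcols = cols]

str(clinical)
```

The parentheses around cols matter: they tell data.table to use the names stored in cols rather than create a column called "cols".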