Data Processing in R

Numeric to factor: how to convert columns in batch

2021-08-11  芋圆学徒

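While processing data I ran into the following problem. Columns 8 to 18 of my data are categorical variables, so I need to convert them to factors before analysis. I tried two approaches, neither of which worked, and I am still trying to work out why.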

Method 1

Using apply(). The result is a matrix, and once I convert it to a data frame (a), every column ends up as character (chr) rather than factor.

> library(data.table)
> library(magrittr)  # provides the %>% pipe used below
> clinical <- fread("clinical.txt")
> a <- apply(clinical[,8:18],2, factor) %>% as.data.frame()
> str(a)
'data.frame':   104 obs. of  11 variables:
 $ flutter: chr  "0" "0" "0" "0" ...
 $ cad    : chr  "0" "0" "0" "1" ...
 $ dvt    : chr  "0" "0" "0" "0" ...
 $ chf    : chr  "0" "0" "0" "0" ...
 $ ild    : chr  "0" "0" "0" "0" ...
 $ osa    : chr  "0" "1" "0" "0" ...
 $ dm     : chr  "0" "1" "0" "0" ...
 $ ce     : chr  "1" "0" "0" "0" ...
 $ ost    : chr  "0" "0" "0" "0" ...
 $ ais    : chr  "0" "0" "0" "0" ...
 $ ca     : chr  "0" "0" "0" "0" ...
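
A likely explanation for the chr columns: apply() first coerces its input to a matrix, and a matrix cannot hold factors, so the factor returned for each column is flattened back to character when the results are reassembled. A minimal sketch of this, and of a column-wise alternative with lapply(), is shown below. The toy data frame df and its values are made up for illustration and only stand in for a few of the 0/1 columns in clinical.txt.

```r
# Toy stand-in for three of the 0/1 columns (made-up values)
df <- data.frame(flutter = c(0L, 0L, 1L),
                 cad     = c(0L, 1L, 0L),
                 dvt     = c(0L, 0L, 0L))

# apply() works on a matrix; the factors it gets back are flattened
m <- apply(df, 2, factor)
print(mode(m))

# lapply() handles one column at a time, so each factor survives;
# df[] <- keeps the data frame structure and column names
df[] <- lapply(df, factor, levels = c(0, 1))
str(df)
```

Passing levels = c(0, 1) explicitly keeps both levels even for a column such as dvt that happens to contain only zeros.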

Method 2

Using a loop to convert each column to a factor one at a time. The result was not what I wanted either.

> clinical <- fread("clinical.txt")
> for (i in colnames(clinical)[8:18]) {
+   clinical$i <- factor(clinical$i,levels = c(0,1))
+   print(table(clinical$i))
+ }

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 

  0   1 
101   3 
> str(clinical[,8:18])
Classes ‘data.table’ and 'data.frame':  104 obs. of  11 variables:
 $ flutter: int  0 0 0 0 0 0 0 0 0 0 ...
 $ cad    : int  0 0 0 1 0 0 0 0 1 0 ...
 $ dvt    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ chf    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ild    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ osa    : int  0 1 0 0 0 0 0 0 0 0 ...
 $ dm     : int  0 1 0 0 0 0 0 0 0 0 ...
 $ ce     : int  1 0 0 0 0 0 0 0 0 0 ...
 $ ost    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ais    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ ca     : int  0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, ".internal.selfref")=<externalptr> 
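
The identical tables printed on every iteration suggest that the loop touches the same column each time. The `$` operator does not evaluate its argument: clinical$i refers to a column literally named "i", not to the column whose name is stored in the loop variable i. Indexing with [[i]] uses the value of i instead. The sketch below shows this on a made-up toy data frame (the column names and values are assumptions, not the real clinical data).

```r
# Toy stand-in for the clinical data (made-up values)
clinical <- data.frame(id  = 1:4,
                       cad = c(0L, 1L, 0L, 0L),
                       dm  = c(0L, 0L, 1L, 0L))

for (i in c("cad", "dm")) {
  # [[i]] looks up the column named by the string stored in i,
  # whereas clinical$i would look for a column called "i"
  clinical[[i]] <- factor(clinical[[i]], levels = c(0, 1))
}
str(clinical)
```

With this form no stray column named i is created, and the columns named in the loop are the ones that change.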

Any help is appreciated. Is there a good way to solve this?



Found the problem

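When reading the file with fread, set data.table = F so that the result is a plain data.frame. I am not entirely sure of the exact reason, but in short data.table and data.frame are different data types: with a plain data.frame, dd[, i] selects column i by position, which is what the loop below relies on, while data.table evaluates the second argument of [ by its own rules. I found some explanations of the differences on Zhihu, and one article is a useful reference: 【数据处理】data.table包 - 知乎 (zhihu.com)
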
> library(data.table)
> dd <- fread("clinical.txt",data.table = F)
> for(i in 8:18){
+   dd[,i] <- factor(dd[,i],levels = c(0,1))
+ }
> str(dd[,8:18])
'data.frame':   104 obs. of  11 variables:
 $ flutter: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ cad    : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 2 1 ...
 $ dvt    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ chf    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ild    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ osa    : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
 $ dm     : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
 $ ce     : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 1 1 1 1 ...
 $ ost    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ais    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ca     : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
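
As far as I can tell, switching to a data.frame is not strictly required. A sketch of the same conversion done directly on a data.table is shown below, assuming the data.table package is installed. The toy table dt, its values, and the cols vector are made up for illustration; in the post, cols would correspond to names(clinical)[8:18].

```r
library(data.table)

# Toy stand-in for clinical.txt (made-up values)
dt <- data.table(id  = 1:4,
                 cad = c(0L, 1L, 0L, 0L),
                 dm  = c(0L, 0L, 1L, 0L))

cols <- c("cad", "dm")  # names of the columns to convert

# := updates the listed columns by reference;
# .SD contains only the columns named in .SDcols
dt[, (cols) := lapply(.SD, factor, levels = c(0, 1)), .SDcols = cols]
str(dt)
```

The parentheses around cols tell data.table to use the names stored in the variable rather than create a column called cols.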