Replacing Data Values in R

2019-05-14  YX_Andrew

Straight to the reference code:

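# One-time setup: install the required packages
install.packages("bitops")
install.packages("RCurl")

library("bitops")
library("RCurl")

# Input data: URL of the raw mushroom CSV file
url = "https://raw.githubusercontent.com/chrisestevez/MSDA-Bridge/master/mushroom.csv"

# Download the file contents as a single character string
Rdata = getURL(url)

# Parse the string as CSV; there is no header row, so columns are named V1, V2, ...
MyData = read.csv(text = Rdata, header = FALSE, sep = ",")
# read.csv already returns a data frame, so this call just copies it
MyFinalData = data.frame(MyData)
# Keep the first ten rows for display
samp = head(MyFinalData, n = 10)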

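With the code above, the full dataset has been loaded into memory. To keep the display manageable, I take the first ten rows as an example.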
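As a small offline illustration of the parsing step: `read.csv(text = ...)` parses a character string instead of a file, and with `header = FALSE` R generates the column names V1, V2, and so on. The sketch below assumes the first three rows of the sample printed in this post, typed in by hand as the hypothetical string `csv_text`, so it runs without a network connection.

```r
# Hypothetical stand-in for the downloaded text: the first three rows of the
# sample printed in this post, typed in by hand as comma-separated values.
csv_text = paste(
  "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u",
  "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g",
  "e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m",
  sep = "\n"
)

# text = parses the string directly; header = FALSE makes R generate V1..V23
demo = read.csv(text = csv_text, header = FALSE, sep = ",")

dim(demo)         # number of rows and columns
names(demo)[1:3]  # auto-generated column names
```

The same call shape is what the post uses on the string returned by `getURL`.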

> samp
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1   p  x  s  n  t  p  f  c  n   k   e   e   s   s
2   e  x  s  y  t  a  f  c  b   k   e   c   s   s
3   e  b  s  w  t  l  f  c  b   n   e   c   s   s
4   p  x  y  w  t  p  f  c  n   n   e   e   s   s
5   e  x  s  g  f  n  f  w  b   k   t   e   s   s
6   e  x  y  y  t  a  f  c  b   n   e   c   s   s
7   e  b  s  w  t  a  f  c  b   g   e   c   s   s
8   e  b  y  w  t  l  f  c  b   n   e   c   s   s
9   p  x  y  w  t  p  f  c  n   p   e   e   s   s
10  e  b  s  y  t  a  f  c  b   g   e   c   s   s
   V15 V16 V17 V18 V19 V20 V21 V22 V23
1    w   w   p   w   o   p   k   s   u
2    w   w   p   w   o   p   n   n   g
3    w   w   p   w   o   p   n   n   m
4    w   w   p   w   o   p   k   s   u
5    w   w   p   w   o   e   n   a   g
6    w   w   p   w   o   p   k   n   g
7    w   w   p   w   o   p   k   n   m
8    w   w   p   w   o   p   n   s   m
9    w   w   p   w   o   p   k   v   g
10   w   w   p   w   o   p   k   s   m
We take V1, V3, V5, and V9 as a subset and replace each column's label:

samp = subset(samp, select = c(V1,V3,V5,V9))
colnames(samp) = c("MushroomType","CapSurface","Bruises","GillSize")

The output is as follows:

> samp
   MushroomType CapSurface Bruises GillSize
1             p          s       t        n
2             e          s       t        b
3             e          s       t        b
4             p          y       t        n
5             e          s       f        b
6             e          y       t        b
7             e          s       t        b
8             e          y       t        b
9             p          y       t        n
10            e          s       t        b
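One possible variation on the subset-and-rename step, sketched under assumptions: keeping the old-to-new column mapping in a single named vector means the selection and the labels cannot drift apart. The data frame `demo` and the vector `new_names` below are hypothetical names introduced only for this illustration.

```r
# Hypothetical miniature of the sample: three rows, one extra column (V2)
demo = data.frame(
  V1 = c("p", "e", "e"),
  V2 = c("x", "x", "b"),
  V3 = c("s", "s", "s"),
  V5 = c("t", "t", "t"),
  V9 = c("n", "b", "b"),
  stringsAsFactors = FALSE
)

# Names are the original columns, values are the descriptive labels
new_names = c(V1 = "MushroomType", V3 = "CapSurface",
              V5 = "Bruises", V9 = "GillSize")

# Select exactly the mapped columns, then rename them via the same mapping
sub = demo[, names(new_names)]
colnames(sub) = unname(new_names[colnames(sub)])

sub
```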
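Next we replace the letter in each cell with the meaning it stands for. If the data is only going into a plot, replacing everything is not necessary, but sometimes the table itself has to be presented.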
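The replacement step relies on indexing a named character vector by the codes themselves. A minimal sketch of that mechanism, with the hypothetical vectors `lookup` and `codes`: each code selects the value stored under the matching name, and a code with no entry comes back as `NA`. The `as.character()` call used in the post matters when a column is a factor (the `read.csv` default before R 4.0.0), because indexing by a factor uses its integer codes rather than its labels.

```r
# Lookup table: names are the single-letter codes, values are the labels
lookup = c(p = "poisonous", e = "edible")

# Hypothetical codes; "x" is deliberately absent from the lookup table
codes = c("p", "e", "e", "x")

# Indexing by a character vector returns the value stored under each name;
# unname() drops the codes that would otherwise be carried along as names
labels = unname(lookup[codes])

labels
# Codes without an entry come back as NA, so unmapped values are easy to spot
is.na(labels)
```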

# Replace the data: index a named lookup vector by each column's codes
samp$MushroomType = c('p'="poisonous",'e'="edible")[ as.character(samp$MushroomType)]
samp$CapSurface = c('f'="fibrous",'g'="grooves",'y'="scaly",'s'="smooth")[ as.character(samp$CapSurface)]
samp$Bruises = c('t'="bruises",'f'="no")[ as.character(samp$Bruises)]
samp$GillSize = c('b'="broad",'n'="narrow")[ as.character(samp$GillSize)]

The final output is as follows:

> samp
   MushroomType CapSurface Bruises GillSize
1     poisonous     smooth bruises   narrow
2        edible     smooth bruises    broad
3        edible     smooth bruises    broad
4     poisonous      scaly bruises   narrow
5        edible     smooth      no    broad
6        edible      scaly bruises    broad
7        edible     smooth bruises    broad
8        edible      scaly bruises    broad
9     poisonous      scaly bruises   narrow
10       edible     smooth bruises    broad
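When more columns need recoding, repeating one assignment per column gets tedious. One possible generalization, not the approach used above: keep the lookup tables in a list keyed by column name and apply them in a loop. The names `demo` and `lookups` are hypothetical, and the data is a three-row miniature of the sample.

```r
# Hypothetical miniature of the subset, still holding the letter codes
demo = data.frame(
  MushroomType = c("p", "e", "e"),
  GillSize     = c("n", "b", "b"),
  stringsAsFactors = FALSE
)

# One lookup table per column, keyed by column name
lookups = list(
  MushroomType = c(p = "poisonous", e = "edible"),
  GillSize     = c(b = "broad", n = "narrow")
)

# Apply each table to its own column; as.character() guards against factors
for (col in names(lookups)) {
  demo[[col]] = unname(lookups[[col]][as.character(demo[[col]])])
}

demo
```

Columns that have no entry in `lookups` are left untouched, so the list can grow one column at a time.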