Study Group Day 6 Notes--pangguanchao
2018-10-13
I. Mind Map
[Figure: mind map]
II. Preparation
1. Getting R package cheatsheets
(1) Search Baidu or Google
(2) The RStudio cheatsheet website
(3) Reply with the package name to the 生信星球 WeChat official account
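2. What tidyr does
It turns the data you plan to use into a standard, uniform data frame (Tidy Data).
(1) Reshaping data frames
(2) Handling missing values in a data frame
(3) Deriving other tables from an existing table
(4) Splitting and combining rows or columns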
3. Installing and loading the R package
install.packages("tidyr")
library(tidyr)
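4. Data frames
rep
Repeats a value. The arguments are the value to repeat and the number of repetitions (times).
paste
Joins strings. Pass the strings to join and set the separator with sep; for no separator, use sep="".
1:3
The integers from 1 to 3. To fill a column with three numbers that follow no pattern, use a vector such as c(1,3,4). Strings in a vector must be wrapped in double quotes, for example c("doudou","huahua","xiaoyu").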
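As a minimal sketch of how these three pieces fit together, the snippet below builds the same small data frame that the later sections use. The intermediate variable names (genes, samples, expr) are only illustrative and are not from the original notes.

```r
# rep() repeats "gene5" three times.
genes <- rep("gene5", times = 3)

# paste() joins "Sample" with each of 1, 2, 3; sep = "" means no separator.
samples <- paste("Sample", 1:3, sep = "")

# c() builds a vector from values that follow no pattern.
expr <- c(14, 19, 18)

# Each vector becomes one column of the data frame.
a <- data.frame(GeneId = genes, SampleName = samples, Expression = expr)
print(a)
```

Because paste() is vectorized, the single string "Sample" is recycled against 1:3, which is why one call yields three sample names.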
5. Tidy Data
A way of organizing tabular data that gives packages a consistent, shared data format.
Each variable occupies its own column, and each case (observation) occupies its own row.
III. tidyr
1. Reshape Data
# gather() needs the wide columns X1999 and X2000; this table is an assumption modeled on tidyr's table4a
a <- data.frame(country = c("Afghanistan","Brazil","China"), X1999 = c(745,37737,212258), X2000 = c(2666,80488,213766))
gather(a, X1999, X2000, key = "year", value = "cases") # stacks X1999 and X2000 into a year column and a cases column
# Same call with key and value given by position; the table is an assumption modeled on tidyr's table4a
a <- data.frame(country = c("Afghanistan","Brazil","China"), X1999 = c(745,37737,212258), X2000 = c(2666,80488,213766))
gather(a, "year", "cases", X1999, X2000)
# The table is an assumption modeled on tidyr's table4a
a <- data.frame(country = c("Afghanistan","Brazil","China"), X1999 = c(745,37737,212258), X2000 = c(2666,80488,213766))
gather(a, year, cases, -country) # -country means: gather every column except country
# The table is an assumption modeled on tidyr's table4a
a <- data.frame(country = c("Afghanistan","Brazil","China"), X1999 = c(745,37737,212258), X2000 = c(2666,80488,213766))
b <- gather(a, X1999, X2000, key = "year", value = "cases")
spread(b, "year", "cases") # the inverse of gather(): turns the long table back into the wide one
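In current tidyr releases, gather() and spread() are marked as superseded in favor of pivot_longer() and pivot_wider(). The sketch below shows the same round trip with the newer functions; the table is a hypothetical one modeled on tidyr's table4a, and the variable names wide, long, and back are only illustrative.

```r
library(tidyr)

# Hypothetical wide table: one column per year (illustrative values).
wide <- data.frame(
  country = c("Afghanistan", "Brazil", "China"),
  X1999 = c(745, 37737, 212258),
  X2000 = c(2666, 80488, 213766)
)

# Wide to long: the column names go into "year", the cell values into "cases".
long <- pivot_longer(wide, cols = c(X1999, X2000),
                     names_to = "year", values_to = "cases")

# Long to wide: the reverse operation, comparable to spread().
back <- pivot_wider(long, names_from = year, values_from = cases)
print(long)
print(back)
```

The older functions still work, so the calls in these notes remain valid; the newer pair mainly uses more explicit argument names.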
2. Handle Missing Values
drop_na() # drops every row that contains a missing value
fill() # fills a missing value with the value from the row above
replace_na() # replaces missing values with a specified value; pass the data frame, then list(column name = replacement value)
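The three functions can be compared side by side on one small table. This is a minimal sketch on a hypothetical data frame (the names df, X1, and X2 and the values are made up for illustration).

```r
library(tidyr)

# Hypothetical data frame with missing values in column X2.
df <- data.frame(X1 = c("A", "B", "C", "D", "E"),
                 X2 = c(1, NA, NA, 3, NA))

# drop_na(): remove every row that has an NA.
dropped <- drop_na(df)

# fill(): carry the last non-missing value of X2 downward.
filled <- fill(df, X2)

# replace_na(): put a fixed value (here 2) wherever X2 is NA.
replaced <- replace_na(df, list(X2 = 2))

print(dropped)
print(filled)
print(replaced)
```

Note that fill() fills downward by default, so a missing value in the very first row would stay NA unless a different direction is requested.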
3. Expand Tables
complete(X, nesting(X1), fill = list(X2 = 5)) # fills the missing values in X2 with 5
complete(X, nesting(X1), fill = list(X2 = "add relate")) # fills the missing values in X2 with the string "add relate"
pin2 <- data.frame(GeneId = rep("gene5", times = 3), SampleName = paste("Sample", 1:3, sep = ""), Expression = c(14, 19, 18))
expand(pin2, GeneId, SampleName, Expression) # lists every possible combination of the values in these columns
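To make the difference between the two functions concrete, here is a small sketch. The pin2 table is the one from the notes; the second table x (columns gene, sample, expr) is hypothetical and exists only to show complete() adding a row for a combination that is absent from the data.

```r
library(tidyr)

# Table from the notes: 1 gene x 3 samples x 3 expression values.
pin2 <- data.frame(GeneId = rep("gene5", times = 3),
                   SampleName = paste("Sample", 1:3, sep = ""),
                   Expression = c(14, 19, 18))

# expand(): every combination of the unique values, whether observed or not.
combos <- expand(pin2, GeneId, SampleName, Expression)

# Hypothetical table in which the combination g2 / S2 is absent.
x <- data.frame(gene = c("g1", "g1", "g2"),
                sample = c("S1", "S2", "S1"),
                expr = c(14, 19, 18))

# complete(): add the absent combinations as rows and fill expr with 0.
completed <- complete(x, gene, sample, fill = list(expr = 0))

print(combos)
print(completed)
```

expand() returns only the combination columns, while complete() keeps the other columns of the original table and adds the rows that were absent.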
4. Split Cells
separate
Splits by column: turns one column into two columns.
table3 <- read.csv('table3.txt')
separate(table3, rate, sep = "/", into = c("case", "pop"))
separate_rows
Splits by row: each cell of the column is split into two rows.
table3 <- read.csv('table3.txt')
separate_rows(table3, rate, sep = "/")
unite
The reverse of separate: merges split columns back into one column.
table5 <- read.csv("table5.txt")
unite(table5, century, year, col = "year", sep = "")
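The files table3.txt and table5.txt are not included with these notes. As a self-contained sketch, the same three calls can be run on the example datasets table3 and table5 that ship with tidyr, assuming they have the same layout as the author's files (a rate column such as "745/19987071", and separate century and year columns).

```r
library(tidyr)

# separate(): one column ("rate") becomes two columns ("case", "pop").
sep_cols <- separate(table3, rate, into = c("case", "pop"), sep = "/")

# separate_rows(): each "a/b" cell becomes two rows, one for a and one for b.
sep_rows <- separate_rows(table3, rate, sep = "/")

# unite(): paste century and year together into a single "year" column.
united <- unite(table5, century, year, col = "year", sep = "")

print(sep_cols)
print(sep_rows)
print(united)
```

By default separate() leaves the new columns as character vectors; passing convert = TRUE asks it to convert them to numbers where possible.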