Day6--R:::dplyr

2020-04-01, by 东方不赞

1. Learning dplyr

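rm(list = ls())   # clear the workspace
library(dplyr)
head(iris)
# take two rows of each species as a small practice set
learn <- iris[c(1, 2, 51, 52, 101, 102), ]
learn
class(learn)      # "data.frame"
attach(learn)     # columns become accessible by bare name; call detach(learn) when done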

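Strictly speaking, attach() and detach() aren't needed, because dplyr verbs already accept bare column names. But I'm lazy. After attach(), the base-R examples can refer to columns such as Species directly, and R's autocompletion finds the column names for me, which saves typing.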

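mutate()

Computes new values row by row and stores them in a new column.

Here we multiply Sepal.Length by Sepal.Width in each row and store the product in Sepal.area.

If the new column is not given a name, its name defaults to the expression itself.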

mutate(learn,Sepal.area=Sepal.Length * Sepal.Width)
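This sketch assumes dplyr 1.0 or later. It contrasts a named new column with an unnamed one. It also shows that a column created earlier in the same mutate() call can be used by the expressions after it. The names Petal.area and area.ratio are only illustrative.

```r
library(dplyr)

learn <- iris[c(1, 2, 51, 52, 101, 102), ]

# Named argument: the new column is called Sepal.area
named <- mutate(learn, Sepal.area = Sepal.Length * Sepal.Width)

# Unnamed argument: dplyr uses the expression text as the column name
unnamed <- mutate(learn, Sepal.Length * Sepal.Width)
print(names(unnamed))

# Later expressions may use columns created earlier in the same call
areas <- mutate(learn,
                Sepal.area = Sepal.Length * Sepal.Width,
                Petal.area = Petal.Length * Petal.Width,
                area.ratio = Sepal.area / Petal.area)
head(areas)
```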

select()

Selects columns.

Extract a single column

select(learn,1)

Extract multiple columns; the following forms are equivalent

select(learn,c(1,3))
select(learn,1,3)
select(learn,Sepal.Length,Petal.Length)
select(learn,c(Sepal.Length,Petal.Length))
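select(learn, one_of(c("Sepal.Length", "Petal.Length"))) # one_of() takes the names as a character vector; superseded by all_of()/any_of()
select(learn, all_of(c("Sepal.Length", "Petal.Length")))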

Base data.frame indexing begs to differ:

learn[,c(1,3)]
learn[,c("Sepal.Length","Petal.Length")]
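On the one_of() question: it is useful when the column names arrive as a character vector, for example when they are stored in a variable. This sketch assumes dplyr 1.0 or later, which re-exports the tidyselect helpers all_of(), any_of() and starts_with(). The variable cols is hypothetical.

```r
library(dplyr)

learn <- iris[c(1, 2, 51, 52, 101, 102), ]
cols <- c("Sepal.Length", "Petal.Length")   # column names stored in a variable

# all_of(): every name must exist, otherwise select() raises an error
strict <- select(learn, all_of(cols))

# any_of(): names that do not exist (here "NotAColumn") are silently skipped
lenient <- select(learn, any_of(c(cols, "NotAColumn")))

# Selection helpers match on column names
sepal <- select(learn, starts_with("Sepal"))

# A minus sign drops columns instead of keeping them
no_species <- select(learn, -Species)
```

Use all_of() when a missing column should be treated as a bug. Use any_of() when missing columns are acceptable.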

filter()

Selects rows.

Filtering by conditions

filter(learn,Species=="setosa")
filter(learn,Species=="setosa" & Sepal.Length>5)
filter(learn,Species %in% c("setosa","versicolor"))

Base data.frame indexing works just as well with a logical vector:

learn[Species == "setosa", ]
learn[Species == "setosa" & Sepal.Length > 5, ]
learn[Species %in% c("setosa", "versicolor"), ]

Wrapping the condition in which() also works (which() additionally discards NA results):

learn[which(Species=="setosa"),]
learn[which(Species=="setosa" & Sepal.Length>5),]
learn[which(Species %in% c("setosa","versicolor")),]
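The difference between plain logical indexing and which() only appears when the condition contains NA. This minimal sketch uses a hypothetical one-column data frame with a missing value.

```r
library(dplyr)

# Hypothetical toy data with a missing value
df <- data.frame(x = c(1, NA, 3))

cond <- df$x > 1                                # FALSE, NA, TRUE

by_logical <- df[cond, , drop = FALSE]          # the NA in cond yields an all-NA row
by_which   <- df[which(cond), , drop = FALSE]   # which() keeps only TRUE positions
by_filter  <- filter(df, x > 1)                 # filter() also drops rows where the condition is NA

nrow(by_logical)  # 2
nrow(by_which)    # 1
nrow(by_filter)   # 1
```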

arrange()

Sorts rows.

Ascending

arrange(learn,Sepal.Length)

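Descending; the two forms are equivalent here (the minus sign only works for numeric columns, while desc() works for any type)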

arrange(learn,-Sepal.Length)
arrange(learn,desc(Sepal.Length))

And here comes base data.frame again:

# ascending
learn[order(Sepal.Length),]
# descending
learn[order(-Sepal.Length),]
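Both approaches accept several sort keys. This sketch sorts by Species in descending order, then by Sepal.Length in ascending order within each species. It uses -as.integer() on the factor as one base-R way to reverse its level order.

```r
library(dplyr)

learn <- iris[c(1, 2, 51, 52, 101, 102), ]

# dplyr: desc() also works on factors, where a minus sign does not
tidy <- arrange(learn, desc(Species), Sepal.Length)

# Base R: order() takes several keys; -as.integer() reverses the factor's level order
base_sorted <- learn[order(-as.integer(learn$Species), learn$Sepal.Length), ]

tidy$Sepal.Length   # 5.8 6.3 6.4 7.0 4.9 5.1
```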

summarise()

Collapses rows into summary values.

Summary over the whole table

summarise(learn,mean(Sepal.Length),sd(Sepal.Length))

Grouped summary (one row per group)

summarise(group_by(learn,Species),mean(Sepal.Length),sd(Sepal.Length))

This one is left for base data.frame :)
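For base R, one possible answer uses tapply() and aggregate(), both part of base R. This sketch reproduces the overall and per-species summaries of Sepal.Length without dplyr.

```r
learn <- iris[c(1, 2, 51, 52, 101, 102), ]

# Overall summary, like summarise(learn, mean(...), sd(...))
overall <- c(mean = mean(learn$Sepal.Length), sd = sd(learn$Sepal.Length))

# Grouped mean with tapply(): one value per Species level
grp_mean <- tapply(learn$Sepal.Length, learn$Species, mean)

# Grouped mean with aggregate(): returns a data.frame
grp_df <- aggregate(Sepal.Length ~ Species, data = learn, FUN = mean)

grp_mean   # means: 5.00 (setosa), 6.70 (versicolor), 6.05 (virginica)
```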

The pipe operator %>%

RStudio shortcut: Ctrl+Shift+M (Cmd+Shift+M on macOS)

learn %>% group_by(Species) %>% summarise(mean(Sepal.Length),sd(Sepal.Length))
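The pipe only changes how the call is written: x %>% f(y) means f(x, y). This sketch checks that the nested form and the piped form give the same result. The native pipe |> it also shows is an assumption that requires R 4.1 or later.

```r
library(dplyr)

learn <- iris[c(1, 2, 51, 52, 101, 102), ]

# Nested calls read inside-out ...
nested <- summarise(group_by(learn, Species), m = mean(Sepal.Length))

# ... while the pipe reads left to right
piped <- learn %>% group_by(Species) %>% summarise(m = mean(Sepal.Length))

# Native pipe (R >= 4.1) behaves the same way for this pipeline
native <- learn |> group_by(Species) |> summarise(m = mean(Sepal.Length))
```

Naming the summary columns (m = ...) avoids column names like `mean(Sepal.Length)`.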

Counting values (the frequency of each distinct value in a column)

count(learn,Sepal.Length)
table(learn$Sepal.Length)

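## all columns: cross-tabulates every column at once, so the result grows quickly
table(learn)
## proportions: apply prop.table() to a frequency table, not to the raw values
prop.table(table(learn$Sepal.Length))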
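Frequencies and proportions can be computed in both styles. This sketch counts Species, whose six rows split two per species. It assumes dplyr's count(), which names the frequency column n.

```r
library(dplyr)

learn <- iris[c(1, 2, 51, 52, 101, 102), ]

# count() returns one row per distinct value, with its frequency in column n
species_n <- count(learn, Species)

# Proportions: divide each count by the total
species_prop <- mutate(species_n, prop = n / sum(n))

# Base R equivalent: proportions of a frequency table
base_prop <- prop.table(table(learn$Species))
```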

2. Relational data with dplyr

Joining tables is a staple of SQL, and dplyr provides corresponding functions that do the same job in R.

inner_join: inner join

Keeps only the rows whose key appears in both tables (the intersection).

inner_join(table1, table2, by = "attribute.name")

left_join: left join

Keeps every row of table1; columns from table2 are NA where there is no match.

left_join(table1, table2, by = "attribute.name")

full_join: full join

Keeps every row of both tables; values with no match are filled with NA.

full_join(table1, table2, by = "attribute.name")

semi_join: semi join

Returns all rows of x that have a match in y, keeping only x's columns.

semi_join(x = table1, y = table2, by = "attribute.name")

anti_join: anti join, returns all rows of x that have no match in y.

anti_join(x = table1, y = table2, by = "attribute.name")
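To see the five joins side by side, here is a sketch on two hypothetical toy tables that share a key column id. The comments list which ids survive each join. The last call assumes the character form of by, which maps differently named keys.

```r
library(dplyr)

# Hypothetical toy tables sharing the key column "id"
table1 <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
table2 <- data.frame(id = c(2, 3, 4), y = c("B", "C", "D"))

inner <- inner_join(table1, table2, by = "id")  # ids 2, 3
left  <- left_join(table1, table2, by = "id")   # ids 1, 2, 3; y is NA for id 1
full  <- full_join(table1, table2, by = "id")   # ids 1, 2, 3, 4
semi  <- semi_join(table1, table2, by = "id")   # ids 2, 3; only table1's columns
anti  <- anti_join(table1, table2, by = "id")   # id 1

# Keys with different names: by = c("name_in_x" = "name_in_y")
table3 <- data.frame(key = c(1, 4), z = c(TRUE, FALSE))
renamed <- left_join(table1, table3, by = c("id" = "key"))
```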

Binding tables

# bind columns (side by side; the tables must have the same number of rows)
bind_cols(table1,table2)
##or 
cbind(table1,table2)

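# bind rows (stack vertically; bind_rows() matches columns by name and fills missing ones with NA, whereas rbind() requires identical column names)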
bind_rows(table1,table2)
##or
rbind(table1,table2)
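The two row-binding functions differ when the tables do not have exactly the same columns. This sketch uses hypothetical tables a and b. It also uses bind_rows()'s .id argument to record which input each row came from.

```r
library(dplyr)

# Hypothetical tables with partly different columns
a <- data.frame(id = 1:2, x = c("p", "q"))
b <- data.frame(id = 3L, y = "r")

# bind_rows() matches by name and fills gaps with NA; .id labels the source of each row
stacked <- bind_rows(first = a, second = b, .id = "source")

# rbind() stops with an error here because the column names differ
res <- tryCatch(rbind(a, b), error = function(e) "error")

# bind_cols() places tables side by side; row counts must match
side <- bind_cols(a, data.frame(z = c(TRUE, FALSE)))
```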

3. String functions

Reference: https://www.jianshu.com/p/2ddaa6d06008

String length

nchar("asd") # 3

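Replacing characters

# sub() replaces only the first match in each string
sub(pattern = "b", replacement = "B", x = c("abcb", "baby")) # c("aBcb", "Baby")
# gsub() replaces every match; chartr() translates characters one by one
gsub(pattern = "b", replacement = "B", x = c("abcb", "baby")) # c("aBcB", "BaBy")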
chartr(old="a", new="A", x="data") #data-->dAtA
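The pattern in sub() and gsub() is a regular expression by default. This matters as soon as it contains metacharacters such as ".". The sketch below uses only base R. It shows regex replacement, literal matching with fixed = TRUE, and chartr()'s position-by-position mapping.

```r
x <- c("abcb", "baby")

first_only <- sub("b", "B", x)                # "aBcb" "Baby"
every      <- gsub("b", "B", x)               # "aBcB" "BaBy"

# Regular expression: any run of digits
digits     <- gsub("[0-9]+", "#", "a1b22c333")          # "a#b#c#"

# fixed = TRUE treats "." literally; without it "." matches any character
literal    <- gsub(".", "-", "a.b.c", fixed = TRUE)     # "a-b-c"
any_char   <- gsub(".", "-", "a.b.c")                   # "-----"

# chartr(): each character in old is replaced by the one at the same position in new
mapped     <- chartr("abc", "xyz", "aabbcc")            # "xxyyzz"
```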

Extracting substrings

substr(x="asdf",start=1,stop=3)#"asd"
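substr() is vectorised, and it has an assignment form. This sketch also shows substring(), which recycles start and stop and so makes it easy to split a string into single characters. All three are base R.

```r
s <- c("asdf", "qwerty")

# substr() works element-wise on a character vector
heads <- substr(s, start = 1, stop = 3)      # "asd" "qwe"

# substring() recycles start/stop: take every single character
chars <- substring("asdf", 1:4, 1:4)         # "a" "s" "d" "f"

# substr<- overwrites characters in place; the string length does not change
word <- "asdf"
substr(word, 1, 2) <- "XY"
word                                         # "XYdf"
```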

Splitting strings (returns a list)

# split a single string
strsplit("abc", split = "")# list(c("a","b","c"))
# split a character vector
strsplit(c("abc","123"), split = "")# list(c("a","b","c"),c("1","2","3"))
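strsplit() returns a list, so you usually have to unpack the result afterwards. This sketch shows three common ways to do that in base R: index the list, flatten it with unlist(), or pick one piece from each string with sapply(). It also shows that split is a regular expression.

```r
parts <- strsplit(c("a,b,c", "1,2"), split = ",")
second <- parts[[2]]                                   # "1" "2"

# unlist() flattens the list into one character vector
flat <- unlist(parts)                                  # "a" "b" "c" "1" "2"

# split is a regular expression: "[,;]" splits on either delimiter
multi <- strsplit("a,b;c", split = "[,;]")[[1]]        # "a" "b" "c"

# Take the first piece of each string
prefixes <- sapply(strsplit(c("x_1", "y_2"), "_"), `[`, 1)   # "x" "y"
```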

Concatenating strings

paste("aa","bb","cc")#"aa bb cc"
paste("aa","bb","cc",sep="$$")#"aa$$bb$$cc"
paste("a",1:3,sep="")#c("a1","a2","a3")
paste0("a", 1:3, collapse = "-")#"a1-a2-a3" (paste0() has no sep argument; it always uses "")
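The key distinction is between sep and collapse. sep joins the arguments element by element. collapse then joins the resulting vector into a single string. A base-R sketch:

```r
# sep works element-wise; collapse merges the results into one string
elementwise <- paste("a", 1:3, sep = "_")                  # "a_1" "a_2" "a_3"
collapsed   <- paste("a", 1:3, sep = "_", collapse = "+")  # "a_1+a_2+a_3"

# paste0() is paste() with sep = ""
same <- identical(paste0("a", 1:3), paste("a", 1:3, sep = ""))   # TRUE

# Shorter vectors are recycled
recycled <- paste0(c("x", "y"), 1:4)                       # "x1" "y2" "x3" "y4"
```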

Pattern matching

# return the indices of matching elements
grep(pattern = "boy", x = c("abcb", "boy", "baby")) #2
# return a logical vector
grepl(pattern = "boy", x = c("abcb", "boy", "baby")) #FALSE  TRUE FALSE
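A few more grep() options are often handy: value = TRUE returns the matching elements themselves, and ignore.case = TRUE makes matching case-insensitive. regmatches() with regexpr() extracts the matched text. This sketch uses base R only and a hypothetical input vector.

```r
x <- c("abcb", "boy", "baby", "Boy")

idx      <- grep("boy", x)                      # 2
vals     <- grep("boy", x, value = TRUE)        # "boy"
nocase   <- grep("boy", x, ignore.case = TRUE)  # 2 4
starts_b <- grepl("^b", x)                      # FALSE TRUE TRUE FALSE (regex: starts with "b")

# Extract the first run of digits; elements without a match are dropped
y <- c("id123", "no digits", "x9")
nums <- regmatches(y, regexpr("[0-9]+", y))     # "123" "9"
```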
