生信星球 (Bioinformatics Planet) Training, Session 30

Day-6 骆栢维

2019-11-27  Norville

R packages

Installing and loading R packages

> options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/")) 
> 
> options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/") 
> 
> install.packages("dplyr")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
trying URL 'https://mirrors.tuna.tsinghua.edu.cn/CRAN/bin/windows/contrib/3.6/dplyr_0.8.3.zip'
Content type 'application/zip' length 3266767 bytes (3.1 MB)
downloaded 3.1 MB

package ‘dplyr’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\luobo\AppData\Local\Temp\RtmpMJbEb9\downloaded_packages
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Warning message:
package ‘dplyr’ was built under R version 3.6.1
> 
> library(dplyr) # load the package so that its functions can be used below

Five basic dplyr functions
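The examples below all operate on a data frame called test, which these notes never define. Judging from the row names in the select() output (1, 2, 51, 52, 101, 102), it is most likely a six-row subset of the built-in iris data. A minimal sketch under that assumption:

```r
# Assumption: `test` is not defined anywhere in the notes; this subset of the
# built-in iris data reproduces the values and row names shown in the outputs.
test <- iris[c(1:2, 51:52, 101:102), ]
test
```

With this definition, the first two rows are setosa, the next two versicolor, and the last two virginica.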

Adding a column: mutate()

> mutate(test, new = Sepal.Length * Sepal.Width)
  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          5.1         3.5          1.4         0.2     setosa
2          4.9         3.0          1.4         0.2     setosa
3          7.0         3.2          4.7         1.4 versicolor
4          6.4         3.2          4.5         1.5 versicolor
5          6.3         3.3          6.0         2.5  virginica
6          5.8         2.7          5.1         1.9  virginica
    new
1 17.85
2 14.70
3 22.40
4 20.48
5 20.79
6 15.66

Selecting columns: select()

By column number

> select(test,c(2,4))
    Sepal.Width Petal.Width
1           3.5         0.2
2           3.0         0.2
51          3.2         1.4
52          3.2         1.5
101         3.3         2.5
102         2.7         1.9

By column name

> select(test,Petal.Length)
    Petal.Length
1            1.4
2            1.4
51           4.7
52           4.5
101          6.0
102          5.1

Filtering rows: filter()

> filter(test, Species == "setosa"&Sepal.Length < 5 )
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          4.9           3          1.4         0.2  setosa

Sorting: arrange()

> arrange(test, Sepal.Width) # ascending by default
  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          5.8         2.7          5.1         1.9  virginica
2          4.9         3.0          1.4         0.2     setosa
3          7.0         3.2          4.7         1.4 versicolor
4          6.4         3.2          4.5         1.5 versicolor
5          6.3         3.3          6.0         2.5  virginica
6          5.1         3.5          1.4         0.2     setosa
> arrange(test, desc(Sepal.Width)) # descending
  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          5.1         3.5          1.4         0.2     setosa
2          6.3         3.3          6.0         2.5  virginica
3          7.0         3.2          4.7         1.4 versicolor
4          6.4         3.2          4.5         1.5 versicolor
5          4.9         3.0          1.4         0.2     setosa
6          5.8         2.7          5.1         1.9  virginica

Summarising: summarise()

> group_by(test,Species)
# A tibble: 6 x 5
# Groups:   Species [3]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
*        <dbl>       <dbl>        <dbl>       <dbl> <fct>     
1          5.1         3.5          1.4         0.2 setosa    
2          4.9         3            1.4         0.2 setosa    
3          7           3.2          4.7         1.4 versicolor
4          6.4         3.2          4.5         1.5 versicolor
5          6.3         3.3          6           2.5 virginica 
6          5.8         2.7          5.1         1.9 virginica 
> summarise(group_by(test, Species), mean(Sepal.Length), sd(Sepal.Length)) # mean is the average, sd is the standard deviation
# A tibble: 3 x 3
  Species    `mean(Sepal.Length)` `sd(Sepal.Length)`
  <fct>                     <dbl>              <dbl>
1 setosa                     5                 0.141
2 versicolor                 6.7               0.424
3 virginica                  6.05              0.354

Handy dplyr techniques

The pipe operator %>%
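The notes give no example here. As a rough sketch, the pipe passes the result on its left as the first argument of the function on its right, so the nested summarise(group_by(...)) call from the previous section can be written as a chain. The definition of test is an assumption (a six-row subset of iris), not something the notes state.

```r
library(dplyr)

# Assumption: `test` is a six-row subset of the built-in iris data
test <- iris[c(1:2, 51:52, 101:102), ]

# Nested form, as in the summarise() example
nested <- summarise(group_by(test, Species), mean(Sepal.Length), sd(Sepal.Length))

# The same computation written as a pipeline, read top to bottom
piped <- test %>%
  group_by(Species) %>%
  summarise(mean(Sepal.Length), sd(Sepal.Length))

piped
```

Both forms should give the same table; the pipeline simply lists the steps in the order they are applied.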

Counting values: count()
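No example is recorded for this step. A minimal sketch, again assuming test is a six-row subset of iris: count() tallies how many rows share each value of the given column and returns the tallies in a column named n.

```r
library(dplyr)

# Assumption: `test` is a six-row subset of the built-in iris data
test <- iris[c(1:2, 51:52, 101:102), ]

# Number of rows per species
counts <- count(test, Species)
counts
```

Under that assumption each of the three species appears twice.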

Handling relational data with dplyr

Inner join (intersection): inner_join
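The notes list this join without code. The sketch below uses two small hypothetical data frames, test1 and test2, that share a key column x; these names and values are made up for illustration. inner_join() keeps only the rows whose key appears in both tables.

```r
library(dplyr)

# Hypothetical example tables (not from the original notes)
test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# Keep only keys present in both tables: b, e, f
inner <- inner_join(test1, test2, by = "x")
inner
```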

Left join: left_join
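As an illustration (the tables test1 and test2 are hypothetical, not taken from the notes), left_join() keeps every row of the left table and fills the columns from the right table with NA where no match exists.

```r
library(dplyr)

# Hypothetical example tables (not from the original notes)
test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# All four rows of test1 are kept; key "x" has no match, so its y is NA
left <- left_join(test1, test2, by = "x")
left
```

Swapping the arguments, left_join(test2, test1, by = "x"), would instead keep all six rows of test2.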

Full join: full_join
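A short sketch with two hypothetical tables (test1 and test2 are invented for illustration): full_join() keeps the rows of both tables and puts NA wherever one side has no match.

```r
library(dplyr)

# Hypothetical example tables (not from the original notes)
test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# Union of the keys: four rows from test1 plus the unmatched a, c, d from test2
full <- full_join(test1, test2, by = "x")
full
```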

Semi join: semi_join
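To illustrate with two hypothetical tables (test1 and test2 are not from the notes): semi_join() returns the rows of the first table that have a match in the second, without adding any columns from the second.

```r
library(dplyr)

# Hypothetical example tables (not from the original notes)
test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# Rows of test1 whose key also occurs in test2; only columns x and z remain
semi <- semi_join(test1, test2, by = "x")
semi
```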

Anti join: anti_join
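The mirror image of a semi join, sketched with two hypothetical tables (test1 and test2 are invented for illustration): anti_join() returns the rows of the first table that have no match in the second.

```r
library(dplyr)

# Hypothetical example tables (not from the original notes)
test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

# Rows of test2 whose key does not occur in test1: a, c, d
anti <- anti_join(test2, test1, by = "x")
anti
```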

Combining tables: bind_rows() and bind_cols()
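A minimal sketch with hypothetical tables (test1, test2 and test3 are made up here): bind_rows() stacks tables vertically and works best when they share column names, while bind_cols() places tables side by side and requires the same number of rows.

```r
library(dplyr)

# Hypothetical example tables (not from the original notes)
test1 <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
test2 <- data.frame(x = c(5, 6), y = c(50, 60))
test3 <- data.frame(z = c(100, 200, 300, 400))

# Stack test2 under test1: same columns, 4 + 2 rows
stacked <- bind_rows(test1, test2)

# Put test3 next to test1: same number of rows, 2 + 1 columns
side_by_side <- bind_cols(test1, test3)

stacked
side_by_side
```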