Data cleaning with the dplyr package: combining datasets vertically

2019-11-16 新云旧雨

About these notes

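dplyr is an efficient R package for data cleaning and one of the core packages of the tidyverse.
An earlier note,
"Data cleaning with the dplyr package: merging datasets horizontally",
covered the *_join() functions for merging two datasets side by side.
This note covers the set operations used to combine two datasets vertically.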

Introduction to set operations

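When set operations are used to combine datasets vertically, the datasets being combined should have the same variables (the same column names, with compatible types).
The commonly used set operations are: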

intersect(x, y): return only observations in both x and y
union(x, y): return unique observations in x and y
union_all(x, y): return all observations in x and y
setdiff(x, y): return observations in x, but not in y
setequal(x, y): test whether two data sets contain the exact same rows (in any order)

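Note: intersect(), union(), and setdiff() remove duplicate observations (rows) from their result, whereas union_all() keeps them.
Each set operation is demonstrated below.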

Setup

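Load the dplyr package and construct two datasets, x and y. Note that x contains the row (2, 'b') twice, which makes the handling of duplicates visible in the results.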

library(dplyr)
x <- tribble(
  ~id, ~value,
  1, 'a',
  2, 'b',
  2, 'b',  # deliberate duplicate of the previous row
  3, 'c'
)
y <- tribble(
  ~id, ~value,
  2, 'b',
  3, 'c',
  4, 'd'
)

intersect()

x %>% intersect(y)

## # A tibble: 2 x 2
##      id value
##   <dbl> <chr>
## 1     2 b    
## 2     3 c

union()

x %>% union(y)

## # A tibble: 4 x 2
##      id value
##   <dbl> <chr>
## 1     1 a    
## 2     2 b    
## 3     3 c    
## 4     4 d
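
One way to read union() is as "stack everything, then drop duplicate rows". The sketch below is only an illustration of that reading, not a statement about how dplyr implements union() internally; the names via_union and via_distinct are made up for this example, and x and y are rebuilt compactly with tibble() so the block runs on its own.

```r
library(dplyr)

# Same data as in the setup above, rebuilt so this block is self-contained
x <- tibble(id = c(1, 2, 2, 3), value = c("a", "b", "b", "c"))
y <- tibble(id = c(2, 3, 4), value = c("b", "c", "d"))

# union() returns each distinct row once
via_union <- union(x, y)

# Stacking all rows and then de-duplicating should give the same set of rows
via_distinct <- union_all(x, y) %>% distinct()

nrow(via_union)     # 4
nrow(via_distinct)  # 4

# Compare the two results as sets of rows
setequal(via_union, via_distinct)  # TRUE
```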

union_all()

x %>% union_all(y)

## # A tibble: 7 x 2
##      id value
##   <dbl> <chr>
## 1     1 a    
## 2     2 b    
## 3     2 b    
## 4     3 c    
## 5     2 b    
## 6     3 c    
## 7     4 d
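
Because union_all() keeps every row, it can be useful to check afterwards how often each row occurs. A minimal sketch, assuming the same x and y as above (rebuilt here so the block is self-contained); all_rows and dup_counts are hypothetical names chosen for this example.

```r
library(dplyr)

# Same data as in the setup above
x <- tibble(id = c(1, 2, 2, 3), value = c("a", "b", "b", "c"))
y <- tibble(id = c(2, 3, 4), value = c("b", "c", "d"))

# Stack all rows of x and y, duplicates included
all_rows <- union_all(x, y)

# Count how many times each (id, value) combination appears
dup_counts <- all_rows %>% count(id, value)
dup_counts

# Keep only the rows that occur more than once: (2, "b") and (3, "c")
dup_counts %>% filter(n > 1)
```

Here (2, 'b') appears three times (twice in x, once in y) and (3, 'c') twice, which matches the seven rows printed above.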

setdiff()

x %>% setdiff(y)

## # A tibble: 1 x 2
##      id value
##   <dbl> <chr>
## 1     1 a
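
setdiff() is not symmetric: the order of the arguments decides which dataset's rows are kept. As a small sketch with the same data (rebuilt here so the block runs on its own; only_in_x and only_in_y are names invented for this example):

```r
library(dplyr)

# Same data as in the setup above
x <- tibble(id = c(1, 2, 2, 3), value = c("a", "b", "b", "c"))
y <- tibble(id = c(2, 3, 4), value = c("b", "c", "d"))

# Rows of x that do not appear in y: (1, "a")
only_in_x <- setdiff(x, y)

# Rows of y that do not appear in x: (4, "d")
only_in_y <- setdiff(y, x)

only_in_x
only_in_y
```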

setequal()

x %>% setequal(y)

## [1] FALSE
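
For contrast, here is a case where setequal() returns TRUE. Since row order is ignored, a reordered copy of x should compare equal to x itself. This is a sketch with the same data as above; x_shuffled is a name made up for the example.

```r
library(dplyr)

# Same data as in the setup above
x <- tibble(id = c(1, 2, 2, 3), value = c("a", "b", "b", "c"))
y <- tibble(id = c(2, 3, 4), value = c("b", "c", "d"))

# Same rows as x, sorted in the opposite order
x_shuffled <- x %>% arrange(desc(id))

setequal(x, x_shuffled)  # TRUE: same rows, different order
setequal(x, y)           # FALSE: (1, "a") and (4, "d") are not shared
```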