[15] R for Data Science: Arranging rows with arrange()

2020-11-02  灰常不错

arrange() works much like filter(), except that instead of selecting rows it changes their order. It takes a data frame and a set of column names to sort by.

Summary

  1. Sorting by several columns in sequence
  2. Sorting in descending order with desc()
  3. How missing values are sorted

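Sorting by several columns in sequence

If you supply more than one column name, each additional column is used to break ties in the values of the preceding columns:

library(nycflights13)
library(dplyr)

arrange(flights, year, month, day)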
# A tibble: 336,776 x 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
 1  2013     1     1      517            515         2      830            819        11
 2  2013     1     1      533            529         4      850            830        20
 3  2013     1     1      542            540         2      923            850        33
 4  2013     1     1      544            545        -1     1004           1022       -18
 5  2013     1     1      554            600        -6      812            837       -25
 6  2013     1     1      554            558        -4      740            728        12
 7  2013     1     1      555            600        -5      913            854        19
 8  2013     1     1      557            600        -3      709            723       -14
 9  2013     1     1      557            600        -3      838            846        -8
10  2013     1     1      558            600        -2      753            745         8
# ... with 336,766 more rows, and 10 more variables: carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#   minute <dbl>, time_hour <dttm>
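
To make the tie-breaking rule easier to see, here is a minimal sketch on a small made-up tibble. The data frame `toy` and its values are illustrative assumptions and are not part of flights.

```r
library(dplyr)

# Hypothetical toy data: two rows tie on `month`, so `day` decides their order.
toy <- tibble(
  month = c(2, 1, 1),
  day   = c(5, 20, 3)
)

# Sort by month first; within equal months, sort by day.
sorted <- arrange(toy, month, day)
sorted
```

The two rows with month 1 come first, ordered by day (3, then 20), followed by the month 2 row.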

Sorting in descending order with desc()

Wrap a column in desc() to sort by it in descending order:

arrange(flights, desc(arr_delay))
# A tibble: 336,776 x 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
 1  2013     1     9      641            900      1301     1242           1530      1272
 2  2013     6    15     1432           1935      1137     1607           2120      1127
 3  2013     1    10     1121           1635      1126     1239           1810      1109
 4  2013     9    20     1139           1845      1014     1457           2210      1007
 5  2013     7    22      845           1600      1005     1044           1815       989
 6  2013     4    10     1100           1900       960     1342           2211       931
 7  2013     3    17     2321            810       911      135           1020       915
 8  2013     7    22     2257            759       898      121           1026       895
 9  2013    12     5      756           1700       896     1058           2020       878
10  2013     5     3     1133           2055       878     1250           2215       875
# ... with 336,766 more rows, and 10 more variables: carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#   minute <dbl>, time_hour <dttm>
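
desc() applies to a single sort key, so ascending and descending keys can be mixed in one call. The sketch below uses a made-up tibble (`toy`, with hypothetical `carrier` and `delay` columns) purely for illustration.

```r
library(dplyr)

# Hypothetical toy data, not taken from flights.
toy <- tibble(
  carrier = c("AA", "UA", "AA", "UA"),
  delay   = c(10, 45, 30, 5)
)

# Carriers in ascending order; within each carrier, largest delay first.
mixed <- arrange(toy, carrier, desc(delay))
mixed
```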

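How missing values are sorted

Missing values are always sorted to the end, whether the order is ascending or descending:

df <- tibble(x = c(5, 2, NA))
arrange(df, x)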
# A tibble: 3 x 1
      x
  <dbl>
1     2
2     5
3    NA

arrange(df, desc(x))
# A tibble: 3 x 1
      x
  <dbl>
1     5
2     2
3    NA

Exercises

(1) How can you use arrange() to sort missing values to the front?

arrange(flights, desc(is.na(dep_time)))
# A tibble: 336,776 x 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
 1  2013     1     1       NA           1630        NA       NA           1815        NA
 2  2013     1     1       NA           1935        NA       NA           2240        NA
 3  2013     1     1       NA           1500        NA       NA           1825        NA
 4  2013     1     1       NA            600        NA       NA            901        NA
 5  2013     1     2       NA           1540        NA       NA           1747        NA
 6  2013     1     2       NA           1620        NA       NA           1746        NA
 7  2013     1     2       NA           1355        NA       NA           1459        NA
 8  2013     1     2       NA           1420        NA       NA           1644        NA
 9  2013     1     2       NA           1321        NA       NA           1536        NA
10  2013     1     2       NA           1545        NA       NA           1910        NA
# ... with 336,766 more rows, and 10 more variables: carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#   minute <dbl>, time_hour <dttm>
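
Why this works can be seen on the three-row tibble from the missing-values section. The sketch below is only an illustration; the second sort key `x` is an optional addition, not part of the answer above.

```r
library(dplyr)

df <- tibble(x = c(5, 2, NA))

# is.na(x) is FALSE, FALSE, TRUE. Sorting it in descending order
# puts TRUE (the row where x is NA) first.
# Adding x as a second key keeps the remaining rows in ascending order.
na_first <- arrange(df, desc(is.na(x)), x)
na_first
```

The NA row comes first, followed by 2 and 5.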

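(2) Sort flights to find the most delayed flight. Find the flight that left earliest. The code below reads "left earliest" as departing furthest ahead of schedule, that is, the smallest dep_delay.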

head(arrange(flights, desc(dep_delay)), 1)
# A tibble: 1 x 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
1  2013     1     9      641            900      1301     1242           1530      1272

head(arrange(flights, dep_delay), 1)
# A tibble: 1 x 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
1  2013    12     7     2040           2123       -43       40           2352        48
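
As an alternative to head(arrange(...), 1), dplyr also provides slice_max() and slice_min(). The sketch below assumes dplyr 1.0.0 or later and uses a made-up tibble (`toy`, with hypothetical flight numbers) rather than flights.

```r
library(dplyr)

# Hypothetical toy data, not taken from flights.
toy <- tibble(
  flight    = c(101, 202, 303),
  dep_delay = c(12, 95, -7)
)

# Row with the largest dep_delay (most delayed).
most_delayed <- slice_max(toy, dep_delay, n = 1)

# Row with the smallest dep_delay (left furthest ahead of schedule).
earliest <- slice_min(toy, dep_delay, n = 1)
```

Unlike head(), these functions keep tied rows by default, so they can return more than one row when several rows share the extreme value.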

(3) Sort flights to find the fastest flight.

head(arrange(flights, desc(distance / air_time)), 1)
# A tibble: 1 x 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
1  2013     5    25     1709           1700         9     1923           1937       -14
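
The sort key distance / air_time is a speed in miles per minute, because flights records distance in miles and air_time in minutes. A minimal sketch of making that speed visible as a column, using a made-up tibble (`toy` and the column name `speed_mph` are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data, not taken from flights.
toy <- tibble(
  flight   = c(1, 2, 3),
  distance = c(500, 300, 1000),  # miles
  air_time = c(60, 45, 150)      # minutes
)

# distance / air_time is miles per minute; multiply by 60 for miles per hour.
with_speed <- mutate(toy, speed_mph = distance / air_time * 60)

# Fastest row first, then keep only that row.
fastest <- head(arrange(with_speed, desc(speed_mph)), 1)
fastest
```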

(4) Which flight had the longest air time? Which had the shortest?

head(arrange(flights, desc(air_time)), 1)
# A tibble: 1 x 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
1  2013     3    17     1337           1335         2     1937           1836        61

head(arrange(flights, air_time), 1)
# A tibble: 1 x 19
   year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
1  2013     1    16     1355           1315        40     1442           1411        31