
[Reading notes: r4ds] 16. Dates and times

2020-03-24  茶思饭

Read online:
R for Data Science
GitHub repository: https://github.com/hadley/r4ds

16. Dates and Times

library(tidyverse)
library(lubridate)
library(nycflights13)  ## provides the flights data used below

16.2 Creating date/times

Three types of date/time data: a date, a time within a day, and a date-time (a date plus a time).

You should always use the simplest possible data type that works for your needs.
today() ## date of today
now() ## date-time of now
Three ways to create a date/time: from a string, from individual components, or from an existing date/time object.

16.2.1 From strings

Pick the helper whose name lists y (year), m (month) and d (day) in the same order as they appear in the string:

ymd("2017-01-31") ## year, month, day
#> [1] "2017-01-31"
mdy("January 31st, 2017") ## month, day, year
#> [1] "2017-01-31"
dmy("31-Jan-2017") ## day, month, year
#> [1] "2017-01-31"
ymd(20170131) ## unquoted numbers are accepted too

Date-times are created the same way, by appending h, m and s to the helper name:

ymd_hms("2017-01-31 20:11:59")
#> [1] "2017-01-31 20:11:59 UTC"
mdy_hm("01/31/2017 08:01")
#> [1] "2017-01-31 08:01:00 UTC"
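
As a small sketch of how these helpers behave at the edges (the input vector `x` is made up for illustration): a string that cannot be parsed becomes NA with a warning, and supplying `tz` to a date helper returns a date-time rather than a date.

```r
library(lubridate)

# Hypothetical input: one valid date string and one that cannot be parsed
x <- c("2017-01-31", "bananas")
parsed <- suppressWarnings(ymd(x))  # the element that fails to parse becomes NA
parsed

# Supplying a time zone makes ymd() return a date-time (POSIXct) instead of a Date
dt <- ymd(20170131, tz = "UTC")
class(dt)
```

Checking for NA after parsing is a cheap way to catch strings in an unexpected format.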

16.2.2 From individual components

When the components are spread across several columns, use
make_date() for dates, or make_datetime() for date-times:

flights %>% 
  select(year, month, day, hour, minute) %>% 
  mutate(departure = make_datetime(year, month, day, hour, minute))
#> # A tibble: 336,776 x 6
#>    year month   day  hour minute departure          
#>   <int> <int> <int> <dbl>  <dbl> <dttm>             
#> 1  2013     1     1     5     15 2013-01-01 05:15:00
#> 2  2013     1     1     5     29 2013-01-01 05:29:00
#> 3  2013     1     1     5     40 2013-01-01 05:40:00
#> 4  2013     1     1     5     45 2013-01-01 05:45:00
#> 5  2013     1     1     6      0 2013-01-01 06:00:00
#> 6  2013     1     1     5     58 2013-01-01 05:58:00
#> # … with 3.368e+05 more rows

Note: when date-times are used in a numeric context (like in a histogram), 1 means 1 second, so a binwidth of 86400 means one day. For dates, 1 means 1 day.
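
The unit difference can be checked directly. This is a minimal sketch with made-up values (`dt`, `d` and `seconds_per_day` are names chosen here, not from the book):

```r
library(lubridate)

dt <- ymd_hms("2017-01-31 00:00:00")  # a date-time: numeric unit is seconds
d  <- ymd("2017-01-31")               # a date: numeric unit is days

seconds_per_day <- 60 * 60 * 24       # 86400

# Moving one day forward takes 86400 units for a date-time but 1 unit for a date
dt_next <- dt + seconds_per_day
d_next  <- d + 1

as.numeric(dt_next) - as.numeric(dt)  # step measured in seconds
as.numeric(d_next) - as.numeric(d)    # step measured in days
```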

16.2.3 From other types

as_datetime(today())
#> [1] "2019-01-08 UTC"
as_date(now())
#> [1] "2019-01-08"
as_datetime(60 * 60 * 10)
#> [1] "1970-01-01 10:00:00 UTC"
as_date(365 * 10 + 2)
#> [1] "1980-01-01"
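
The numeric inputs above are offsets from 1970-01-01: seconds for as_datetime(), days for as_date(). A short sketch of where the "+ 2" comes from (`days_in_decade` is a name chosen here for illustration):

```r
library(lubridate)

# Offset 0 is the epoch itself
epoch_dt <- as_datetime(0)
epoch_d  <- as_date(0)

# 1970 through 1979 contain two leap years (1972 and 1976),
# so the decade has 365 * 10 + 2 days
days_in_decade <- as.numeric(ymd("1980-01-01") - ymd("1970-01-01"))
days_in_decade
```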

16.3 Date-time components

16.3.2 Rounding

16.3.3 Setting components
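
The notes leave this section empty, but the exercise code relies on update(), so here is a minimal sketch of what it does. The variable names are chosen for illustration; the values follow the examples I recall from the book.

```r
library(lubridate)

dt <- ymd_hms("2016-07-08 12:34:56")

# update() returns a modified copy with several components set at once
moved <- update(dt, year = 2020, month = 2, mday = 2, hour = 2)
moved

# Values that are too large roll over into the next unit:
# "day 30" of February 2015 lands in early March
rolled <- update(ymd("2015-02-01"), mday = 30)
rolled
```

Setting the date components to a constant is what lets the exercises compare times of day across the whole year.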

16.4 Time spans

16.4.1 Durations

16.4.2 Periods

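The functions below create periods; periods can be added to and subtracted from each other:
days() ## 1 day (the argument defaults to 1)
seconds(15) ## 15 seconds
minutes(10) ## 10 minutes
hours(12) ## 12 hours
months(1) ## 1 month; unlike the others, months() needs an explicit number
weeks() ## 1 week
years() ## 1 year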
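
A brief sketch of period arithmetic, and of how a period differs from a duration across a daylight-saving change. The `one_pm` example follows the one I recall from the book; it assumes the America/New_York time zone is available on the system.

```r
library(lubridate)

# Periods can be combined and scaled
mixed <- days(50) + hours(25) + minutes(2)
mixed
10 * (months(6) + days(1))

# 2016-03-13 is the day clocks spring forward in New York
one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

next_day_period   <- one_pm + days(1)   # period: same clock time on the next day
next_day_duration <- one_pm + ddays(1)  # duration: exactly 86400 seconds later

hour(next_day_period)
hour(next_day_duration)
```

Periods work in "human" units, so they are usually the safer choice when the result should land on the same clock time.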

16.4.3 Intervals

16.5 Time zones

16.3.4 Exercises
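
The exercise code uses flights_dt, which these notes never define. As far as I recall, the book builds it in section 16.2.2 roughly as follows; treat this as a sketch of that definition rather than a verbatim copy.

```r
library(dplyr)
library(lubridate)
library(nycflights13)

# Times in flights are stored as integers such as 517 (meaning 05:17),
# so split them into hours and minutes before building a date-time
make_datetime_100 <- function(year, month, day, time) {
  make_datetime(year, month, day, time %/% 100, time %% 100)
}

flights_dt <- flights %>%
  filter(!is.na(dep_time), !is.na(arr_time)) %>%
  mutate(
    dep_time = make_datetime_100(year, month, day, dep_time),
    arr_time = make_datetime_100(year, month, day, arr_time),
    sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
    sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)
  ) %>%
  select(origin, dest, ends_with("delay"), ends_with("time"))
```

Note that the final select() keeps only origin, dest and the delay and time columns.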

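  1. How does the distribution of flight times within a day change over the course of the year?
flights_dt %>% 
  mutate(
    month = factor(month(dep_time)),          # keep the month before the date is overwritten
    time_of_day = update(dep_time, yday = 1)  # put every flight on the same day
  ) %>% 
  ggplot(aes(time_of_day, colour = month)) + geom_freqpoly(binwidth = 3600)  # 3600 s = 1 hour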
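  2. Compare dep_time, sched_dep_time and dep_delay. Are they consistent? Explain your findings.
flights_dt %>% 
  # difference in minutes, the same unit as dep_delay
  mutate(delay = as.numeric(difftime(dep_time, sched_dep_time, units = "mins"))) %>% 
  select(dep_time, sched_dep_time, dep_delay, delay) %>% 
  filter(delay != dep_delay)  # rows where the two disagree
# The mismatches are most likely flights delayed past midnight: dep_time was built from the scheduled date.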
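  3. Compare air_time with the duration between the departure and arrival. Explain your findings. (Hint: consider the location of the airport.)
flights_dt %>% 
  # minutes between the local departure and arrival clock times
  mutate(elapsed = as.numeric(difftime(arr_time, dep_time, units = "mins"))) %>% 
  ggplot(aes(air_time, elapsed)) + geom_point(alpha = 0.1)
# elapsed also contains taxiing time and the time-zone offset between airports, so it need not equal air_time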
  4. How does the average delay time change over the course of a day? Should you use dep_time or sched_dep_time? Why?
## By actual departure time (biased: dep_time already includes the delay)
flights_dt %>% mutate(dep_time=update(dep_time,year=2013,month=1,mday=1))%>% 
  group_by(dep_time) %>% 
  summarise(mean_delay=mean(dep_delay, na.rm = TRUE)) %>% 
  ggplot(aes(dep_time,mean_delay))+geom_line()
## By scheduled departure time (preferable: it is what a traveller knows in advance)
flights_dt %>% mutate(sched_dep_time=update(sched_dep_time,year=2013,month=1,mday=1))%>% 
  group_by(sched_dep_time) %>% 
  summarise(mean_delay=mean(dep_delay, na.rm = TRUE)) %>% 
  ggplot(aes(sched_dep_time,mean_delay))+geom_point()+geom_smooth()
  5. On what day of the week should you leave if you want to minimise the chance of a delay?
flights_dt %>% mutate(weekday=wday(sched_dep_time, label = TRUE)) %>%  # label = TRUE gives Sun, Mon, ...
  group_by(weekday) %>% 
  summarise(mean_delay=mean(dep_delay, na.rm = TRUE)) %>% 
  ggplot(aes(weekday,mean_delay))+geom_col()
  6. What makes the distribution of diamonds$carat and flights$sched_dep_time similar?
    Both are shaped by human judgement: people prefer round numbers, so values pile up at "nice" carat weights and at "nice" scheduled minutes.
ggplot(diamonds,aes(carat))+geom_freqpoly(binwidth = 0.01)
sched_dep <- flights_dt %>% 
    mutate(minute = minute(sched_dep_time)) %>% 
    group_by(minute) %>% 
    summarise(
        avg_delay = mean(arr_delay, na.rm = TRUE),
        n = n())
ggplot(sched_dep, aes(minute, n)) +
    geom_line()
  7. Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early. Hint: create a binary variable that tells you whether or not a flight was delayed.
flights_dt %>% mutate(minute=minute(dep_time),early=dep_delay<0) %>%  # minute of the actual departure
  group_by(minute) %>% 
  summarise(prop_early=mean(early, na.rm = TRUE)) %>%  # share of flights that left early
  ggplot(aes(minute,prop_early))+geom_line()