Series Lesson 9: Time-Related Series

2020-09-24 butters001


This lesson covers the time-related methods of pandas Series.

Time-related methods

Series.asfreq()
Series.asof()
Series.shift()
Series.first_valid_index()
Series.last_valid_index()
Series.resample()
Series.tz_convert()
Series.tz_localize()
Series.at_time()
Series.between_time()
Series.slice_shift()

Details

First, import the required packages:

In [1]: import numpy as np                                                               
In [2]: import pandas as pd

1. `Series.asfreq()`

Series.asfreq(freq, method=None, how=None, normalize=False, fill_value=None)

Convert a time series to the specified frequency.

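Common parameters:

freq: DateOffset or str [the target frequency]
method: {'backfill'/'bfill', 'pad'/'ffill'}, default None [method used to fill the holes created by the frequency conversion; note ⚠️ this does not fill NaNs that already existed]
normalize: bool, default False [whether to reset the output index to midnight]
fill_value: scalar, optional [value used for the missing values created by the frequency conversion; note ⚠️ this does not fill NaNs that already existed]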

# Create a length-4 Series at one-minute intervals (wrapped in a DataFrame)
In [3]: index = pd.date_range('1/1/2000', periods=4, freq='T') 
   ...: series = pd.Series([0.0, None, 2.0, 3.0], index=index) 
   ...: df = pd.DataFrame({'s':series})                                                         

In [4]: df                                                                                      
Out[4]: 
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:01:00  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:03:00  3.0

# Upsample to 30-second intervals; the newly added index entries get NaN values
In [5]: df.asfreq(freq='30S')                                                                   
Out[5]: 
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  NaN
2000-01-01 00:03:00  3.0

# Fill the newly added slots with 9.0; the pre-existing NaN is left unchanged
In [6]: df.asfreq(freq='30S', fill_value=9.0)                                                   
Out[6]: 
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  9.0
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  9.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  9.0
2000-01-01 00:03:00  3.0

# Fill with method='bfill': each new slot takes the next original value, so 00:00:30 inherits the NaN at 00:01:00
In [7]: df.asfreq(freq='30S', method='bfill')                                                   
Out[7]: 
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  2.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  3.0
2000-01-01 00:03:00  3.0
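The examples above all upsample. A minimal sketch on the same data as a plain Series (using the `'min'` and `'30s'` aliases, since recent pandas versions deprecate `'T'`/`'S'`) shows two more behaviors: `method='ffill'` copies from the previous original row even when that row is NaN, and downsampling with `asfreq` only selects the values at the new timestamps, without aggregating.

```python
import numpy as np
import pandas as pd

# Same data as In [3], but as a Series and with the 'min' alias
index = pd.date_range("2000-01-01", periods=4, freq="min")
s = pd.Series([0.0, np.nan, 2.0, 3.0], index=index)

# Upsample to 30 s; ffill copies from the previous *original* row,
# so the slot after the existing NaN at 00:01 stays NaN as well
up = s.asfreq("30s", method="ffill")

# Downsample to 2 min: asfreq just picks the values at 00:00 and 00:02
down = s.asfreq("2min")

print(up)
print(down)
```

If you need an aggregate per window (for example the mean of each 2-minute span) rather than a point sample, `resample` is the better fit.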

2. `Series.asof()`

Series.asof(where, subset=None)

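Return the last row(s) without any NaNs at or before `where`.

In other words: for a given point, if the value there is NaN (or the point is not in the index at all), asof returns the most recent non-NaN value at or before that point. The index must be sorted.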

Common parameters:

where: date or array-like of dates [a single date or a list of dates]
subset: str or array-like of str, default None [for a DataFrame: if not None, only these columns are checked for NaNs]

In [8]: s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40]) 
   ...: s                                                                                       
Out[8]: 
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64

In [9]: s.asof(20)                                                                              
Out[9]: 2.0

# 5 lies before the first index label, so there is no earlier non-NaN value and the result for 5 is NaN
In [11]: s.asof([5, 20])                                                                        
Out[11]: 
5     NaN
20    2.0
dtype: float64

# 25 is not in the index; the last non-NaN value at or before 25 is 2.0 (at label 20)
In [10]: s.asof(25)                                                                             
Out[10]: 2.0

# For a DataFrame
In [12]: df = pd.DataFrame({'a': [10, 20, 30, 40, 50], 
    ...:                    'b': [None, None, None, None, 500]}, 
    ...:                   index=pd.DatetimeIndex(['2018-02-27 09:01:00', 
    ...:                                           '2018-02-27 09:02:00', 
    ...:                                           '2018-02-27 09:03:00', 
    ...:                                           '2018-02-27 09:04:00', 
    ...:                                           '2018-02-27 09:05:00']))                     

In [13]: df                                                                                     
Out[13]: 
                      a      b
2018-02-27 09:01:00  10    NaN
2018-02-27 09:02:00  20    NaN
2018-02-27 09:03:00  30    NaN
2018-02-27 09:04:00  40    NaN
2018-02-27 09:05:00  50  500.0

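# Check all columns: column b is NaN in every row before 09:05, so no row qualifies and the result is all NaN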
In [14]: df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30', 
    ...:                           '2018-02-27 09:04:30']))                                     
Out[14]: 
                      a   b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN

# Check only column a for NaNs
In [16]: df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30', '2018-02-27 09:04:30']), subset=['a'])                                                                                     
Out[16]: 
                        a   b
2018-02-27 09:03:30  30.0 NaN
2018-02-27 09:04:30  40.0 NaN
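To make the lookup rule concrete, here is a hypothetical helper, `manual_asof`, that reproduces `Series.asof` for a single scalar `where` under the same assumption pandas makes (a sorted index): drop the NaNs, then take the last remaining label at or before `where`. It is an illustration of the semantics, not how pandas implements `asof` internally.

```python
import numpy as np
import pandas as pd

def manual_asof(s: pd.Series, where):
    """Hypothetical helper: mimic Series.asof for one scalar `where` (sorted index assumed)."""
    valid = s.dropna()
    # Position of the last non-NaN label that is <= where (-1 if there is none)
    pos = valid.index.searchsorted(where, side="right") - 1
    return valid.iloc[pos] if pos >= 0 else np.nan

s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
for w in [5, 20, 25, 30, 45]:
    print(w, manual_asof(s, w), s.asof(w))
```

Note that `where=30` lands on the NaN itself, so both versions fall back to the value at 20.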

3. `Series.shift()`

Series.shift(periods=1, freq=None, axis=0, fill_value=None)

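Shift the data by the desired number of periods, with an optional time frequency. Without freq, the values move while the index stays fixed; with freq, the index is shifted and each value keeps its row.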

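Common parameters:

periods: int [number of periods to shift; may be positive or negative]
freq: DateOffset, tseries.offsets, timedelta, or str, optional [if freq is passed, the index must be date or datetime, otherwise NotImplementedError is raised. With freq, the index values are shifted but the data is not realigned: use freq when you want the index to extend as it shifts while keeping all the original data. Only the index changes; the values keep their positions.]
axis: {0 or 'index', 1 or 'columns'}, default 0 [direction of the shift; 'columns' applies to a DataFrame only]
fill_value: object, optional [scalar used for the newly introduced missing values instead of NaN]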

In [17]: df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45], 
    ...:                    "Col2": [13, 23, 18, 33, 48], 
    ...:                    "Col3": [17, 27, 22, 37, 52]}, 
    ...:                   index=pd.date_range("2020-01-01", "2020-01-05")) 
    ...: df                                                                                     
Out[17]: 
            Col1  Col2  Col3
2020-01-01    10    13    17
2020-01-02    20    23    27
2020-01-03    15    18    22
2020-01-04    30    33    37
2020-01-05    45    48    52

# Shift down by three periods
In [18]: df.shift(periods=3)                                                                    
Out[18]: 
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0

# Shift right by one period (along the columns)
In [19]: df.shift(periods=1, axis="columns")                                                    
Out[19]: 
            Col1  Col2  Col3
2020-01-01   NaN  10.0  13.0
2020-01-02   NaN  20.0  23.0
2020-01-03   NaN  15.0  18.0
2020-01-04   NaN  30.0  33.0
2020-01-05   NaN  45.0  48.0

# Fill the vacated positions with fill_value instead of NaN
In [20]: df.shift(periods=3, fill_value=0)                                                      
Out[20]: 
            Col1  Col2  Col3
2020-01-01     0     0     0
2020-01-02     0     0     0
2020-01-03     0     0     0
2020-01-04    10    13    17
2020-01-05    20    23    27

# Shift by a frequency of 3 days: the index moves forward and each value stays with its row
In [21]: df.shift(periods=3, freq="D")                                                          
Out[21]: 
            Col1  Col2  Col3
2020-01-04    10    13    17
2020-01-05    20    23    27
2020-01-06    15    18    22
2020-01-07    30    33    37
2020-01-08    45    48    52
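A common use of `shift()` without `freq` is comparing each row with its neighbor. A minimal sketch on the `Col1` values: subtracting the previous row reproduces `Series.diff()`, and a negative `periods` looks one row ahead instead.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 15, 30, 45],
              index=pd.date_range("2020-01-01", "2020-01-05"))

# Day-over-day change: current value minus the previous row's value
change = s - s.shift(1)
print(change.equals(s.diff()))  # diff(1) performs the same computation

# Negative periods shift values up, so each row sees the next day's value
next_day = s.shift(-1)
print(next_day)
```

Both results are float64, because the shift introduces a NaN at one end.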

4. `Series.first_valid_index()`

Series.first_valid_index()

Return the index label of the first non-NA value.

In [25]: s                                                                                      
Out[25]: 
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64

In [26]: s.first_valid_index()                                                                  
Out[26]: 10

5. `Series.last_valid_index()`

Series.last_valid_index()

Return the index label of the last non-NA value.

In [28]: s.last_valid_index()                                                                   
Out[28]: 40
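One practical use of this pair is trimming leading and trailing missing values while keeping interior NaNs. A minimal sketch with made-up data; both methods return None when there is no valid value, so the sketch checks for that before slicing.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan], index=[0, 10, 20, 30, 40])

first, last = s.first_valid_index(), s.last_valid_index()
# .loc label slicing includes both endpoints, so the interior NaN at 20 is kept
trimmed = s.loc[first:last] if first is not None else s.iloc[0:0]
print(trimmed)

# With no valid values at all, both methods return None
all_nan = pd.Series([np.nan, np.nan])
print(all_nan.first_valid_index(), all_nan.last_valid_index())
```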

6. `Series.resample()`

Series.resample(rule, axis=0, closed=None, label=None, convention='start', kind=None, loffset=None, base=None, on=None, level=None, origin='start_day', offset=None)

Resample time-series data: a convenience method for frequency conversion and resampling of time series.

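Common parameters:

rule: DateOffset, Timedelta or str [the offset string or object representing the target conversion]
closed: {'right', 'left'}, default None [which side of each bin interval is closed; None means 'left' for all frequency offsets except 'M', 'A', 'Q', 'BM', 'BA', 'BQ' and 'W', which default to 'right']
label: {'right', 'left'}, default None [which bin edge labels each bin; with the default (same rule as closed) most frequencies use the left edge, and 'right' uses the right edge]
convention: {'start', 'end', 's', 'e'}, default 'start' [PeriodIndex only: whether to use the start or the end of rule]
on: str, optional [for a DataFrame, a column to resample on instead of the index; the column must be datetime-like]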

# Create a time-indexed Series
In [29]: index = pd.date_range('1/1/2000', periods=9, freq='T') 
    ...: series = pd.Series(range(9), index=index) 
    ...: series                                                                                 
Out[29]: 
2000-01-01 00:00:00    0
2000-01-01 00:01:00    1
2000-01-01 00:02:00    2
2000-01-01 00:03:00    3
2000-01-01 00:04:00    4
2000-01-01 00:05:00    5
2000-01-01 00:06:00    6
2000-01-01 00:07:00    7
2000-01-01 00:08:00    8
Freq: T, dtype: int64

# Downsample into 3-minute bins and sum them; by default each bin is labeled with its start (left edge)
In [30]: series.resample('3T').sum()                                                            
Out[30]: 
2000-01-01 00:00:00     3
2000-01-01 00:03:00    12
2000-01-01 00:06:00    21
Freq: 3T, dtype: int64

# Same downsampling, but label='right' labels each bin with its end (right edge)
In [31]: series.resample('3T', label='right').sum()                                             
Out[31]: 
2000-01-01 00:03:00     3
2000-01-01 00:06:00    12
2000-01-01 00:09:00    21
Freq: 3T, dtype: int64

Note: resample() supports many more uses than shown here. This article only covers basic usage; a separate article with detailed resample examples may follow.
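One way to see what the default bins mean: with the default `closed`/`label` (both left for minute frequencies), `resample('3min').sum()` should match grouping each timestamp by its 3-minute floor, which holds here because the data starts on a 3-minute boundary. Going the other way, to a finer frequency, is true upsampling and needs a fill rule such as `.ffill()`. A sketch under those assumptions, using the `'min'`/`'30s'` aliases:

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# Left-closed, left-labeled bins: each timestamp belongs to the bin at its 3-minute floor
by_resample = series.resample("3min").sum()
by_groupby = series.groupby(series.index.floor("3min")).sum()

# True upsampling (1 min -> 30 s) creates empty slots, filled here from the previous value
upsampled = series.resample("30s").ffill()

print(by_resample)
print(by_groupby)
print(upsampled.head())
```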

7. `Series.tz_convert()`

Series.tz_convert(tz, axis=0, level=None, copy=True)

Convert a tz-aware time series to another time zone.

Common parameters:

tz: str or tzinfo object [the time zone to convert to]

In [36]: series.tz_localize('Asia/Shanghai').tz_convert('UTC')                                  
Out[36]: 
1999-12-31 16:00:00+00:00    0
1999-12-31 16:01:00+00:00    1
1999-12-31 16:02:00+00:00    2
1999-12-31 16:03:00+00:00    3
1999-12-31 16:04:00+00:00    4
1999-12-31 16:05:00+00:00    5
1999-12-31 16:06:00+00:00    6
1999-12-31 16:07:00+00:00    7
1999-12-31 16:08:00+00:00    8
Freq: T, dtype: int64
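`tz_convert` only changes how the same instants are displayed. A sketch (reusing the first three minutes of the series above) checks that a round trip restores the original labels, and that calling `tz_convert` on tz-naive data raises a `TypeError`, which is why the example localizes first.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=3, freq="min")
series = pd.Series(range(3), index=index)

local = series.tz_localize("Asia/Shanghai")   # attach a zone (UTC+8)
utc = local.tz_convert("UTC")                 # same instants, new wall-clock labels

# Converting back restores the original labels exactly
roundtrip = utc.tz_convert("Asia/Shanghai")
print(roundtrip.index.equals(local.index))

# tz_convert refuses tz-naive data; tz_localize must come first
naive_raises = False
try:
    series.tz_convert("UTC")
except TypeError as exc:
    naive_raises = True
    print("TypeError:", exc)
```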

8. `Series.tz_localize()`

Series.tz_localize(tz, axis=0, level=None, copy=True, ambiguous='raise', nonexistent='raise')

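Localize a tz-naive time series, i.e. attach a time zone to timestamps that have none.

Once a time series has been localized to a specific time zone, it can be converted to other time zones with tz_convert.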

Common parameters:

tz: str or tzinfo object [the time zone to localize to]

In [41]: s = pd.Series([1], index=pd.DatetimeIndex(['2018-09-15 01:30:00']))                    
In [42]: s                                                                                      
Out[42]: 
2018-09-15 01:30:00    1
dtype: int64

In [43]: s.tz_localize('CET')                                                                   
Out[43]: 
2018-09-15 01:30:00+02:00    1
dtype: int64
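The signature above also lists `ambiguous` and `nonexistent`, which matter around daylight-saving transitions. A sketch of `nonexistent`: on 2015-03-29, clocks in `Europe/Warsaw` jumped from 02:00 to 03:00, so 02:30 never occurred locally. The default `'raise'` rejects such a time, `'shift_forward'` moves it to the next valid instant, and `'NaT'` marks it missing. The exact exception class raised by the default has varied across pandas versions, so the sketch catches `Exception` broadly.

```python
import pandas as pd

# 02:30 did not exist in Europe/Warsaw on 2015-03-29 (02:00 jumped to 03:00)
s = pd.Series([0, 1], index=pd.DatetimeIndex(["2015-03-29 02:30:00",
                                               "2015-03-29 03:30:00"]))

default_raises = False
try:
    s.tz_localize("Europe/Warsaw")       # nonexistent='raise' by default
except Exception as exc:                 # exception class differs by version
    default_raises = True
    print(type(exc).__name__, exc)

shifted = s.tz_localize("Europe/Warsaw", nonexistent="shift_forward")
as_nat = s.tz_localize("Europe/Warsaw", nonexistent="NaT")
print(shifted)
print(as_nat)
```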

9. `Series.at_time()`

Series.at_time(time, asof=False, axis=None)

Select values at a particular time of day (e.g. 9:30 AM).

Common parameters:

time: datetime.time or str [the time of day to select]

In [44]: i = pd.date_range('2018-04-09', periods=4, freq='12H') 
    ...: ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i) 
    ...: ts                                                                                     
Out[44]: 
                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4

# Select the rows at 12:00
In [45]: ts.at_time('12:00')                                                                    
Out[45]: 
                     A
2018-04-09 12:00:00  2
2018-04-10 12:00:00  4
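`at_time` is essentially a filter on the time-of-day part of the index. A minimal sketch of an equivalent boolean mask (an illustration, not the pandas internals), which also shows that a `datetime.time` can be passed instead of a string. It uses the `'12h'` alias in place of `'12H'`.

```python
import datetime
import pandas as pd

i = pd.date_range("2018-04-09", periods=4, freq="12h")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

noon = datetime.time(12, 0)
# index.time yields one datetime.time per row; keep rows whose time equals noon
manual = ts[ts.index.time == noon]
builtin = ts.at_time(noon)

print(manual.equals(builtin))
```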

10. `Series.between_time()`

Series.between_time(start_time, end_time, include_start=True, include_end=True, axis=None)

Select values between particular times of day (e.g. 9:00-9:30 AM). If start_time is later than end_time, you get the times that are not between the two.

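Common parameters:

start_time: datetime.time or str [start time]
end_time: datetime.time or str [end time]
include_start: bool, default True [whether to include the start time]
include_end: bool, default True [whether to include the end time]

(Later pandas versions replaced include_start and include_end with a single inclusive parameter, one of 'both', 'neither', 'left', 'right'; the two old flags were removed in pandas 2.0.)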

In [46]: i = pd.date_range('2018-04-09', periods=4, freq='1D20min') 
    ...: ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i) 
    ...: ts                                                                                     
Out[46]: 
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4

# Rows whose time of day is between 0:15 and 0:45
In [47]: ts.between_time('0:15', '0:45')                                                        
Out[47]: 
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3

# Rows before 0:15 or after 0:45 (start_time is later than end_time)
In [48]: ts.between_time('0:45', '0:15')                                                        
Out[48]: 
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
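The wrap-around behavior is easiest to see as a mask on the time of day: when `start_time` is later than `end_time`, the kept times are those at or after `start_time` or at or before `end_time`. A minimal sketch with a hypothetical helper, `time_mask`, using inclusive bounds like the defaults above (an illustration, not the pandas implementation):

```python
import datetime
import pandas as pd

i = pd.date_range("2018-04-09", periods=4, freq="1D20min")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

def time_mask(index, start, end):
    """Hypothetical helper: inclusive time-of-day mask that wraps past midnight if start > end."""
    t = index.time
    if start <= end:
        return (t >= start) & (t <= end)
    return (t >= start) | (t <= end)

start, end = datetime.time(0, 45), datetime.time(0, 15)
manual = ts[time_mask(ts.index, start, end)]
print(manual.equals(ts.between_time("0:45", "0:15")))
```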

11. `Series.slice_shift()`

Series.slice_shift(periods=1, axis=0)

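Similar to shift(), except that shift() keeps the full index and fills the vacated positions with NaN, whereas slice_shift() drops them: the result is shorter and keeps the original dtype. (slice_shift() has since been deprecated and was removed in pandas 2.0.)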

In [49]: df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45], 
    ...:                    "Col2": [13, 23, 18, 33, 48], 
    ...:                    "Col3": [17, 27, 22, 37, 52]}, 
    ...:                   index=pd.date_range("2020-01-01", "2020-01-05")) 
    ...: df                                                                                     
Out[49]: 
            Col1  Col2  Col3
2020-01-01    10    13    17
2020-01-02    20    23    27
2020-01-03    15    18    22
2020-01-04    30    33    37
2020-01-05    45    48    52

In [50]: df.shift(1)                                                                            
Out[50]: 
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02  10.0  13.0  17.0
2020-01-03  20.0  23.0  27.0
2020-01-04  15.0  18.0  22.0
2020-01-05  30.0  33.0  37.0

# slice_shift drops the vacated first row instead of filling it with NaN, so the integer dtype is kept
In [51]: df.slice_shift(1)                                                                      
Out[51]: 
            Col1  Col2  Col3
2020-01-02    10    13    17
2020-01-03    20    23    27
2020-01-04    15    18    22
2020-01-05    30    33    37
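Since `slice_shift()` is not available in pandas 2.0 and later, a hypothetical helper, `slice_shift_equivalent`, can emulate it along the rows: take the values of all but the last `periods` rows and relabel them with the index shifted by `periods`. It mirrors the output shown above, including the integer dtype, but it is a sketch rather than an official replacement.

```python
import pandas as pd

def slice_shift_equivalent(df: pd.DataFrame, periods: int = 1) -> pd.DataFrame:
    """Hypothetical helper emulating DataFrame.slice_shift along the rows."""
    if periods == 0:
        return df.copy()
    if periods > 0:
        # Values of the first n - periods rows, labeled with the last n - periods index entries
        return df.iloc[:-periods].set_axis(df.index[periods:], axis=0)
    # Negative periods: values move up, so keep the first n + periods labels
    return df.iloc[-periods:].set_axis(df.index[:periods], axis=0)

df = pd.DataFrame({"Col1": [10, 20, 15, 30, 45],
                   "Col2": [13, 23, 18, 33, 48],
                   "Col3": [17, 27, 22, 37, 52]},
                  index=pd.date_range("2020-01-01", "2020-01-05"))

print(slice_shift_equivalent(df, 1))
print(slice_shift_equivalent(df, -1))
```

`df.shift(1).iloc[1:]` gives the same values, but as floats, because `shift()` introduces NaN before the first row is dropped.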
