pandans_query函数

2023-01-30  本文已影响0人  敬子v

数据源:链接: https://pan.baidu.com/s/1EFqJFXf70t2Rubkh6D19aw 提取码: syqg
数据源示例:

探索风速数据

步骤1 导入必要的库

import pandas as pd
import numpy as np
import datetime

步骤2 从以下地址导入数据

path1='pandas_exercise\exercise_data\wind.data'

步骤3 将数据作存储并且设置前三列为合适的索引

data=pd.read_csv(path1,sep='\s+',parse_dates=[[0,1,2]])
print(data.head())

步骤4 2061年?我们真的有这一年的数据?创建一个函数并用它去修复这个bug

def fix_century(x):
year=x.year-100 if x.year>1989 else x.year
return datetime.date(year,x.month,x.day)

步骤5 将日期设为索引,注意数据类型,应该是datetime64[ns]

data['Yr_Mo_Dy'] = data['Yr_Mo_Dy'].apply(fix_century)
print(data.head())
data=data.set_index('Yr_Mo_Dy')
print(data.head())

步骤6 对应每一个location,一共有多少数据值缺失

print(data.isnull().sum())

步骤7 对应每一个location,一共有多少完整的数据值

print(data.shape[0]-data.isnull().sum())

步骤8 对于全体数据,计算风速的平均值

print(data.mean().mean())

步骤9 创建一个名为loc_stats的数据框去计算并存储每个location的风速最小值,最大值,平均值和标准差

loc_stats=pd.DataFrame()
loc_stats['min']=data.min()
loc_stats['max']=data.max()
loc_stats['mean']=data.mean()
loc_stats['std']=data.std()
print(loc_stats)

步骤10 创建一个名为day_stats的数据框去计算并存储所有location的风速最小值,最大值,平均值和标准差

day_stats=pd.DataFrame()
day_stats['min']=data.min(axis=1)
day_stats['max']=data.max(axis=1)
day_stats['mean']=data.mean(axis=1)
day_stats['std']=data.std(axis=1)
print(day_stats)

步骤11 对于每一个location,计算一月份的平均风速 注意,1961年的1月和1962年的1月应该区别对待

data['date']=data.index
data['month']=data['date'].apply(lambda x:x.month)
data['year']=data['date'].apply(lambda x:x.year)
data['day']=data['date'].apply(lambda x:x.day)
jan_wind=data.query('month==1')
print(jan_wind.mean())

步骤12 对于数据记录按照年为频率取样

print(jan_wind.query('month==1 and day==1'))

步骤13 对于数据记录按照月为频率取样

print(jan_wind.query('day==1'))

输出

# 步骤3
              RPT    VAL    ROS    KIL  ...    MUL    CLO    BEL    MAL
Yr_Mo_Dy                                ...                            
1961-01-01  15.04  14.96  13.17   9.29  ...  10.83  12.58  18.50  15.04
1961-01-02  14.71    NaN  10.83   6.50  ...   9.79   9.67  17.54  13.83
1961-01-03  18.50  16.88  12.33  10.13  ...   8.50   7.67  12.75  12.71
1961-01-04  10.58   6.63  11.75   4.58  ...   5.83   5.88   5.46  10.88
1961-01-05  13.33  13.25  11.42   6.17  ...  10.92  10.34  12.92  11.83
[5 rows x 12 columns]
# 步骤5
 Yr_Mo_Dy    RPT    VAL    ROS    KIL  ...    CLA    MUL    CLO    BEL    MAL
0  1961-01-01  15.04  14.96  13.17   9.29  ...  10.25  10.83  12.58  18.50  15.04
1  1961-01-02  14.71    NaN  10.83   6.50  ...  10.04   9.79   9.67  17.54  13.83
2  1961-01-03  18.50  16.88  12.33  10.13  ...    NaN   8.50   7.67  12.75  12.71
3  1961-01-04  10.58   6.63  11.75   4.58  ...   1.79   5.83   5.88   5.46  10.88
4  1961-01-05  13.33  13.25  11.42   6.17  ...   6.54  10.92  10.34  12.92  11.83
[5 rows x 13 columns]
              RPT    VAL    ROS    KIL  ...    MUL    CLO    BEL    MAL
Yr_Mo_Dy                                ...                            
1961-01-01  15.04  14.96  13.17   9.29  ...  10.83  12.58  18.50  15.04
1961-01-02  14.71    NaN  10.83   6.50  ...   9.79   9.67  17.54  13.83
1961-01-03  18.50  16.88  12.33  10.13  ...   8.50   7.67  12.75  12.71
1961-01-04  10.58   6.63  11.75   4.58  ...   5.83   5.88   5.46  10.88
1961-01-05  13.33  13.25  11.42   6.17  ...  10.92  10.34  12.92  11.83
[5 rows x 12 columns]
# 步骤6
RPT    6
VAL    3
ROS    2
KIL    5
SHA    2
BIR    0
DUB    3
CLA    2
MUL    3
CLO    1
BEL    0
MAL    4
dtype: int64
# 步骤7
RPT    6568
VAL    6571
ROS    6572
KIL    6569
SHA    6572
BIR    6574
DUB    6571
CLA    6572
MUL    6571
CLO    6573
BEL    6574
MAL    6570
dtype: int64
# 步骤8
10.227982360836938
# 步骤9
      min    max       mean       std
RPT  0.67  35.80  12.362987  5.618413
VAL  0.21  33.37  10.644314  5.267356
ROS  1.50  33.84  11.660526  5.008450
KIL  0.00  28.46   6.306468  3.605811
SHA  0.13  37.54  10.455834  4.936125
BIR  0.00  26.16   7.092254  3.968683
DUB  0.00  30.37   9.797343  4.977555
CLA  0.00  31.08   8.495053  4.499449
MUL  0.00  25.88   8.493590  4.166872
CLO  0.04  28.21   8.707332  4.503954
BEL  0.13  42.38  13.121007  5.835037
MAL  0.67  42.54  15.599079  6.699794
# 步骤10
             min    max       mean       std
Yr_Mo_Dy                                    
1961-01-01  9.29  18.50  13.018182  2.808875
1961-01-02  6.50  17.54  11.336364  3.188994
1961-01-03  6.17  18.50  11.641818  3.681912
1961-01-04  1.79  11.75   6.619167  3.198126
1961-01-05  6.17  13.33  10.630000  2.445356
          ...    ...        ...       ...
1978-12-27  8.08  40.08  16.708333  7.868076
1978-12-28  5.00  41.46  15.150000  9.687857
1978-12-29  8.71  29.58  14.890000  5.756836
1978-12-30  9.13  28.79  15.367500  5.540437
1978-12-31  9.59  27.29  15.402500  5.702483
[6574 rows x 4 columns]
# 步骤11
  print(jan_wind.mean())
RPT        14.847325
VAL        12.914560
ROS        13.299624
KIL         7.199498
SHA        11.667734
BIR         8.054839
DUB        11.819355
CLA         9.512047
MUL         9.543208
CLO        10.053566
BEL        14.550520
MAL        18.028763
month       1.000000
year     1969.500000
day        16.000000
dtype: float64
# 步骤12
              RPT    VAL    ROS    KIL  ...        date  month  year  day
Yr_Mo_Dy                                ...                              
1961-01-01  15.04  14.96  13.17   9.29  ...  1961-01-01      1  1961    1
1962-01-01   9.29   3.42  11.54   3.50  ...  1962-01-01      1  1962    1
1963-01-01  15.59  13.62  19.79   8.38  ...  1963-01-01      1  1963    1
1964-01-01  25.80  22.13  18.21  13.25  ...  1964-01-01      1  1964    1
1965-01-01   9.54  11.92   9.00   4.38  ...  1965-01-01      1  1965    1
1966-01-01  22.04  21.50  17.08  12.75  ...  1966-01-01      1  1966    1
1967-01-01   6.46   4.46   6.50   3.21  ...  1967-01-01      1  1967    1
1968-01-01  30.04  17.88  16.25  16.25  ...  1968-01-01      1  1968    1
1969-01-01   6.13   1.63   5.41   1.08  ...  1969-01-01      1  1969    1
1970-01-01   9.59   2.96  11.79   3.42  ...  1970-01-01      1  1970    1
1971-01-01   3.71   0.79   4.71   0.17  ...  1971-01-01      1  1971    1
1972-01-01   9.29   3.63  14.54   4.25  ...  1972-01-01      1  1972    1
1973-01-01  16.50  15.92  14.62   7.41  ...  1973-01-01      1  1973    1
1974-01-01  23.21  16.54  16.08   9.75  ...  1974-01-01      1  1974    1
1975-01-01  14.04  13.54  11.29   5.46  ...  1975-01-01      1  1975    1
1976-01-01  18.34  17.67  14.83   8.00  ...  1976-01-01      1  1976    1
1977-01-01  20.04  11.92  20.25   9.13  ...  1977-01-01      1  1977    1
1978-01-01   8.33   7.12   7.71   3.54  ...  1978-01-01      1  1978    1
[18 rows x 16 columns]
# 步骤13
              RPT    VAL    ROS    KIL  ...        date  month  year  day
Yr_Mo_Dy                                ...                              
1961-01-01  15.04  14.96  13.17   9.29  ...  1961-01-01      1  1961    1
1962-01-01   9.29   3.42  11.54   3.50  ...  1962-01-01      1  1962    1
1963-01-01  15.59  13.62  19.79   8.38  ...  1963-01-01      1  1963    1
1964-01-01  25.80  22.13  18.21  13.25  ...  1964-01-01      1  1964    1
1965-01-01   9.54  11.92   9.00   4.38  ...  1965-01-01      1  1965    1
1966-01-01  22.04  21.50  17.08  12.75  ...  1966-01-01      1  1966    1
1967-01-01   6.46   4.46   6.50   3.21  ...  1967-01-01      1  1967    1
1968-01-01  30.04  17.88  16.25  16.25  ...  1968-01-01      1  1968    1
1969-01-01   6.13   1.63   5.41   1.08  ...  1969-01-01      1  1969    1
1970-01-01   9.59   2.96  11.79   3.42  ...  1970-01-01      1  1970    1
1971-01-01   3.71   0.79   4.71   0.17  ...  1971-01-01      1  1971    1
1972-01-01   9.29   3.63  14.54   4.25  ...  1972-01-01      1  1972    1
1973-01-01  16.50  15.92  14.62   7.41  ...  1973-01-01      1  1973    1
1974-01-01  23.21  16.54  16.08   9.75  ...  1974-01-01      1  1974    1
1975-01-01  14.04  13.54  11.29   5.46  ...  1975-01-01      1  1975    1
1976-01-01  18.34  17.67  14.83   8.00  ...  1976-01-01      1  1976    1
1977-01-01  20.04  11.92  20.25   9.13  ...  1977-01-01      1  1977    1
1978-01-01   8.33   7.12   7.71   3.54  ...  1978-01-01      1  1978    1
[18 rows x 16 columns]
上一篇 下一篇

猜你喜欢

热点阅读