Pandas Quick Start (Part 2)
2019-10-08
乔治大叔
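Continuing from Pandas Quick Start (Part 1):
Boolean indexing
print(df[df.A>0]) # select the rows where column A is greater than 0
print(df[df>0]) # keep values greater than 0; every other cell becomes NaN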
A B C D
2019-09-01 0.586356 1.969502 1.125890 -0.831724
2019-09-04 0.886695 1.543536 -0.170274 0.867814
2019-09-06 0.297143 -0.317093 1.125189 1.023567
A B C D
2019-09-01 0.586356 1.969502 1.125890 NaN
2019-09-02 NaN NaN NaN NaN
2019-09-03 NaN 1.162615 0.699749 1.224788
2019-09-04 0.886695 1.543536 NaN 0.867814
2019-09-05 NaN 0.445182 NaN NaN
2019-09-06 0.297143 NaN 1.125189 1.023567
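The two forms above behave differently: a boolean Series filters rows, while a boolean DataFrame masks individual cells. Here is a minimal sketch of that difference, assuming a small hand-written frame (the values are illustrative, not the random data from this article):

```python
import numpy as np
import pandas as pd

# Small hand-written frame (illustrative values, not the article's random data).
dates = pd.date_range('20190901', periods=3)
df = pd.DataFrame(
    {'A': [1.0, -2.0, 3.0], 'B': [-1.0, 5.0, 6.0]},
    index=dates,
)

# A boolean Series as the key filters rows: only rows where A > 0 remain.
rows_a_positive = df[df['A'] > 0]

# A boolean DataFrame as the key masks cells: the shape is preserved and
# every cell that fails the condition becomes NaN.
masked = df[df > 0]

print(rows_a_positive)
print(masked)
```

The row filter drops the 2019-09-02 row entirely, whereas the cell mask keeps all three rows and only blanks out the non-positive values.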
Filtering
Use the isin() method to filter:
df['E'] = ['one','two','three','four','five','six']
print(df)
print(df[df['E'].isin(['two','three'])])
A B C D E
2019-09-01 0.586356 1.969502 1.125890 -0.831724 one
2019-09-02 -0.665937 -0.897839 -1.208598 -1.226119 two
2019-09-03 -2.418687 1.162615 0.699749 1.224788 three
2019-09-04 0.886695 1.543536 -0.170274 0.867814 four
2019-09-05 -0.671953 0.445182 -0.614136 -0.064305 five
2019-09-06 0.297143 -0.317093 1.125189 1.023567 six
A B C D E
2019-09-02 -0.665937 -0.897839 -1.208598 -1.226119 two
2019-09-03 -2.418687 1.162615 0.699749 1.224788 three
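Since isin() returns an ordinary boolean Series, it can be stored, inspected, and inverted with ~ to select the complement. A short sketch, assuming a made-up four-row frame (the names mask, selected, and rest are only for illustration):

```python
import pandas as pd

# Illustrative frame; column E plays the same role as in the article.
df = pd.DataFrame({'A': [1, 2, 3, 4], 'E': ['one', 'two', 'three', 'four']})

# isin() returns one True/False per row.
mask = df['E'].isin(['two', 'three'])

selected = df[mask]   # rows whose E is 'two' or 'three'
rest = df[~mask]      # ~ inverts the mask: every other row

print(mask.tolist())
print(selected)
print(rest)
```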
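Setting values
Although standard Python / NumPy expressions for selecting and setting values are intuitive and convenient for interactive work, for production code we recommend the optimized Pandas data access methods .at, .iat, .loc, and .iloc.
Adding a new column automatically aligns the data by index: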
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20191001', periods=6))
print(s1)
2019-10-01 1
2019-10-02 2
2019-10-03 3
2019-10-04 4
2019-10-05 5
2019-10-06 6
Freq: D, dtype: int64
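To see the alignment itself, the Series has to be assigned as a column. The sketch below assumes a small frame and a Series whose index starts one day later than the frame's, and a new column name F chosen just for this example:

```python
import pandas as pd

dates = pd.date_range('20190901', periods=4)
df = pd.DataFrame({'A': [10, 20, 30, 40]}, index=dates)

# The Series starts one day after the frame, so only three labels overlap.
s1 = pd.Series([1, 2, 3, 4], index=pd.date_range('20190902', periods=4))

# Assignment matches on index labels, not on position:
# 2019-09-01 has no match and gets NaN; the value for 2019-09-05 is dropped.
df['F'] = s1
print(df)
```

Note that the s1 built above starts on 2019-10-01, while df is indexed from 2019-09-01. No labels overlap, so assigning that s1 as a column would produce nothing but NaN.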
Setting values by label:
dates = pd.date_range('20190901',periods=6)
print(dates)
df.at[dates[0], 'C'] = 0 # dates[0] is 2019-09-01
print(df)
DatetimeIndex(['2019-09-01', '2019-09-02', '2019-09-03', '2019-09-04',
'2019-09-05', '2019-09-06'],
dtype='datetime64[ns]', freq='D')
A B C D
2019-09-01 -0.397827 -1.102112 0.000000 0.161291
2019-09-02 -0.751784 -0.759627 -1.311447 -0.919117
2019-09-03 0.531277 0.550232 -1.253598 0.647749
2019-09-04 -0.549671 1.000032 -0.927265 0.094845
2019-09-05 -0.046609 0.399075 1.111344 1.722658
2019-09-06 -1.424410 -1.328193 2.587026 0.463605
Setting values by position:
df.iat[0,2] = 0 # row 0, column 2 (both zero-based)
A B C D
2019-09-01 -0.921584 -0.207005 0.000000 -0.548157
2019-09-02 -0.899229 0.561346 0.574105 -1.558532
2019-09-03 -1.277597 -0.583355 1.247190 -0.916555
2019-09-04 -1.227783 0.522624 -2.151186 -0.281190
2019-09-05 0.553149 -0.114055 0.616718 0.875897
2019-09-06 1.140854 -0.052508 0.943119 1.269147
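.at and .iat address the same kind of single cell, one by label and the other by integer position. A minimal sketch, assuming a frame filled with ones so the changed cells are easy to spot:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190901', periods=3)
df = pd.DataFrame(np.ones((3, 4)), index=dates, columns=list('ABCD'))

df.at[dates[0], 'C'] = 0   # label-based: row label dates[0], column label 'C'
df.iat[1, 2] = 0           # position-based: second row, third column (also 'C')

# get_loc translates a label into its integer position.
row_pos = df.index.get_loc(dates[0])
col_pos = df.columns.get_loc('C')

print(df)
print(row_pos, col_pos)
```

Here get_loc is used only to show the correspondence between the two addressing schemes: the label pair (dates[0], 'C') sits at position (0, 2).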
Setting values with a NumPy array:
df.loc[:,'D'] = np.array([5]*len(df)) # assign a NumPy array; the [] around 5 cannot be omitted
A B C D
2019-09-01 0.260309 -0.786362 0.900311 5
2019-09-02 -1.035287 1.727411 -0.041896 5
2019-09-03 -0.495706 0.687953 -0.121707 5
2019-09-04 -0.365145 -0.844624 -0.764868 5
2019-09-05 0.309504 0.465509 -0.363573 5
2019-09-06 -0.143167 -0.405704 -1.102475 5
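The brackets matter because [5]*len(df) repeats a list, while 5*len(df) is plain integer multiplication. The following sketch, assuming a three-row frame with made-up values, shows the two results side by side:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.1, 0.2, 0.3], 'D': [1.5, 2.5, 3.5]})

fives = np.array([5] * len(df))   # list repetition: one 5 per row
scalar = np.array(5 * len(df))    # integer multiplication: a 0-d array holding 15

# The array has one element per row, so it fills column D element by element.
df.loc[:, 'D'] = fives

print(fives.shape, scalar.shape)
print(df)
```

Whether column D ends up displayed as 5 or 5.0 can depend on the Pandas version, so the sketch makes no claim about the resulting dtype.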
Setting values with a where condition:
df2 = df.copy()
df2[df2<0] = -df2 # negate every value below zero so that it becomes positive
print(df2)
A B C D
2019-09-01 0.608456 1.503148 -0.194184 0.149963
2019-09-02 -0.654379 1.039558 -0.321524 1.771350
2019-09-03 -2.084704 -0.734897 0.260852 -1.163411
2019-09-04 -0.461798 0.311986 1.860293 -1.353793
2019-09-05 0.660783 -2.050908 -0.480054 -1.123917
2019-09-06 0.070030 -0.405595 0.687804 0.119593
A B C D
2019-09-01 0.608456 1.503148 0.194184 0.149963
2019-09-02 0.654379 1.039558 0.321524 1.771350
2019-09-03 2.084704 0.734897 0.260852 1.163411
2019-09-04 0.461798 0.311986 1.860293 1.353793
2019-09-05 0.660783 2.050908 0.480054 1.123917
2019-09-06 0.070030 0.405595 0.687804 0.119593
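The masked assignment above modifies df2 in place. As a sketch of an alternative (not the approach used in this article), DataFrame.mask returns a new frame with the same replacement applied. For this particular condition, both results match DataFrame.abs(). The frame below uses made-up values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})

# In-place form, as in the article: only cells where the mask is True are replaced.
df2 = df.copy()
df2[df2 < 0] = -df2

# Non-mutating alternative: mask() replaces cells where the condition is True
# and returns a new frame, leaving df untouched.
flipped = df.mask(df < 0, -df)

print(df2)
print(flipped)
```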
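Missing data
Pandas primarily uses the value np.nan to represent missing data.
df2 = df2[df2>0] # keep values greater than 0; every other cell becomes NaN
print(df2)
print(df2.dropna(how='any')) # drop every row that contains a missing value
print(df2.fillna(value=5)) # fill missing values with 5
print(pd.isna(df2)) # boolean mask: True where the value is NaN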
A B C D
2019-09-01 2.504590 NaN 1.139982 NaN
2019-09-02 NaN 0.604752 0.655428 NaN
2019-09-03 NaN NaN 1.086983 0.600510
2019-09-04 NaN NaN NaN 0.459104
2019-09-05 NaN NaN NaN 1.349749
2019-09-06 0.803654 1.542528 0.041647 1.053980
A B C D
2019-09-06 0.803654 1.542528 0.041647 1.05398
A B C D
2019-09-01 2.504590 5.000000 1.139982 5.000000
2019-09-02 5.000000 0.604752 0.655428 5.000000
2019-09-03 5.000000 5.000000 1.086983 0.600510
2019-09-04 5.000000 5.000000 5.000000 0.459104
2019-09-05 5.000000 5.000000 5.000000 1.349749
2019-09-06 0.803654 1.542528 0.041647 1.053980
A B C D
2019-09-01 False True False True
2019-09-02 True False False True
2019-09-03 True True False False
2019-09-04 True True True False
2019-09-05 True True True False
2019-09-06 False False False False
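None of these three calls changes df2 itself; each returns a new object. The sketch below, assuming a small frame with hand-placed NaN values, makes that visible and also contrasts how='any' with how='all', which does not appear above and is added here only for comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [np.nan, np.nan, 6.0]})

complete_rows = df.dropna(how='any')   # keep rows with no NaN at all
not_all_nan = df.dropna(how='all')     # drop only rows where every value is NaN
filled = df.fillna(value=5)            # replace each NaN with 5
nan_mask = pd.isna(df)                 # True where the value is NaN

print(complete_rows)
print(not_all_nan)
print(filled)
print(nan_mask)
```

To keep a result, assign it back, for example df = df.fillna(value=5).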