
Python pandas: handling missing values in a DataFrame - pd.

2019-09-25  悟空Oo

pandas.DataFrame.dropna

DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
Reference: pandas.DataFrame.dropna

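  • axis : {0 or ‘index’, 1 or ‘columns’}, default 0
    0, or ‘index’ : operate row by row; a row that contains missing values is dropped.
    1, or ‘columns’ : operate column by column; a column that contains missing values is dropped.
  • how : {‘any’, ‘all’}, default ‘any’
    ‘any’ : drop the row/column if it contains any NA value.
    ‘all’ : drop the row/column only if all of its values are NA.
  • thresh : int, optional. Minimum number of non-NA values: a row/column is kept only if it has at least thresh non-NA values.
  • subset : array-like, optional. Labels along the other axis to check for missing values: with axis=0 these are column labels, with axis=1 they are row (index) labels.
  • inplace : bool, default False. If True, the original DataFrame is modified in place and the method returns None.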
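For the row-wise case (axis=0), the rules above boil down to comparing each row's non-NA count against a threshold. The sketch below illustrates that logic with a boolean mask; `drop_rows_like_dropna` is a hypothetical helper, not how pandas implements dropna internally, and it assumes that `thresh`, when given, takes precedence over `how`.

```python
import numpy as np
import pandas as pd


def drop_rows_like_dropna(df, how="any", thresh=None, subset=None):
    """Hypothetical helper that mimics df.dropna(axis=0, ...) with a boolean mask."""
    # Only the columns listed in `subset` are inspected; otherwise all columns.
    checked = df if subset is None else df[subset]
    non_na = checked.notna().sum(axis=1)  # number of non-NA values per row
    if thresh is not None:
        # Assumption for this sketch: thresh wins over how.
        keep = non_na >= thresh
    elif how == "any":
        keep = non_na == checked.shape[1]  # no NA in any checked column
    elif how == "all":
        keep = non_na > 0  # drop only rows whose checked values are all NA
    else:
        raise ValueError(f"invalid how: {how!r}")
    return df[keep]


df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [2.0, 5.0, np.nan],
                   "c": [3.0, 6.0, np.nan]})

print(drop_rows_like_dropna(df))                # same rows as df.dropna()
print(drop_rows_like_dropna(df, how="all"))     # same rows as df.dropna(how="all")
print(drop_rows_like_dropna(df, thresh=2))      # same rows as df.dropna(thresh=2)
print(drop_rows_like_dropna(df, subset=["b"]))  # same rows as df.dropna(subset=["b"])
```

The column-wise case (axis=1) is the same idea with `sum(axis=0)` and `subset` selecting row labels instead of column labels.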

Code demo:

>>> import numpy as np
>>> import pandas as pd
>>> data = np.eye(6)
>>> datanan = np.where(data,data,np.nan)
>>> datapdnan = pd.DataFrame(datanan)
>>> datapd = datapdnan.ffill()  # forward-fill down each column; fillna(method='ffill') is deprecated since pandas 2.1
>>> datapd  # the steps above build the test DataFrame datapd
     0    1    2    3    4    5
0  1.0  NaN  NaN  NaN  NaN  NaN
1  1.0  1.0  NaN  NaN  NaN  NaN
2  1.0  1.0  1.0  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  NaN  NaN
4  1.0  1.0  1.0  1.0  1.0  NaN
5  1.0  1.0  1.0  1.0  1.0  1.0
>>> datapd.dropna()  # drop by row: any row containing a missing value is dropped
     0    1    2    3    4    5
5  1.0  1.0  1.0  1.0  1.0  1.0
>>> datapd.dropna(how='all')  # drop by row: a row is dropped only if all its values are missing (no such row here)
     0    1    2    3    4    5
0  1.0  NaN  NaN  NaN  NaN  NaN
1  1.0  1.0  NaN  NaN  NaN  NaN
2  1.0  1.0  1.0  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  NaN  NaN
4  1.0  1.0  1.0  1.0  1.0  NaN
5  1.0  1.0  1.0  1.0  1.0  1.0
>>> datapd.dropna(axis='columns', thresh=3)  # drop by column: keep columns with at least 3 non-NaN values
     0    1    2    3
0  1.0  NaN  NaN  NaN
1  1.0  1.0  NaN  NaN
2  1.0  1.0  1.0  NaN
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0
5  1.0  1.0  1.0  1.0
>>> datapd.dropna(axis='index', subset=[1,2])  # subset: drop rows with a missing value in column 1 or 2
     0    1    2    3    4    5
2  1.0  1.0  1.0  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  NaN  NaN
4  1.0  1.0  1.0  1.0  1.0  NaN
5  1.0  1.0  1.0  1.0  1.0  1.0
>>> datapd.dropna(axis=1, how='any', subset=[2,3])  # subset: drop columns with a missing value in row 2 or 3
     0    1    2
0  1.0  NaN  NaN
1  1.0  1.0  NaN
2  1.0  1.0  1.0
3  1.0  1.0  1.0
4  1.0  1.0  1.0
5  1.0  1.0  1.0
>>> print(datapd.dropna(inplace=True))  # modifies datapd in place; the return value is None
None
>>> datapd
     0    1    2    3    4    5
5  1.0  1.0  1.0  1.0  1.0  1.0
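Because inplace=True returns None, assigning its result (for example `datapd = datapd.dropna(inplace=True)`) would rebind the variable to None. A minimal sketch of the two styles, rebuilding the same test DataFrame with `.ffill()` (assuming a pandas version where `DataFrame.ffill` is available, which covers both 1.x and 2.x); the variable names `cleaned` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the test frame: 1.0 on and below the diagonal, NaN above it.
datapd = pd.DataFrame(np.where(np.eye(6), np.eye(6), np.nan)).ffill()

# Non-inplace style: dropna returns a new DataFrame and leaves datapd untouched.
cleaned = datapd.dropna()
print(cleaned.index.tolist())  # → [5]
print(datapd.shape)            # → (6, 6)

# Inplace style: datapd itself is modified and the call returns None,
# so its result should never be assigned back to datapd.
result = datapd.dropna(inplace=True)
print(result is None, datapd.shape)  # → True (1, 6)
```

Reassigning the returned DataFrame (`datapd = datapd.dropna()`) gives the same end result as inplace=True while keeping each call's output explicit.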