Pandas Data Deduplication

2020-02-08  Noza_ea8f

Code

import pandas as pd

df = pd.read_excel(io='exls/drop_duplicates.xlsx')
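# Flag duplicates: a row is True if its NAME already appeared in an earlier row
df_dupe = df.duplicated(subset='NAME')
print(df_dupe)
# Check whether any duplicates exist; any() returns True if so
print(df_dupe.any())
# Keep only the entries flagged as duplicates
# df_dupe = df_dupe[df_dupe == True]  # the explicit comparison is unnecessary
df_dupe = df_dupe[df_dupe]
print(df_dupe)
# Print the duplicate rows, selecting them by label via df_dupe.index
# (.loc matches index labels; .iloc would only work with a default 0..n-1 index)
print(df.loc[df_dupe.index])
# Drop duplicates. The default is keep='first'; keep='last' is passed here,
# so the last occurrence of each NAME is the one that is retained
df.drop_duplicates(subset='NAME', inplace=True, keep='last')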

print(df)

Output

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7     True
8     True
9     True
dtype: bool
True
7    True
8    True
9    True
dtype: bool
   ID NAME  SCORES
7   8    A      44
8   9    B      64
9  10    C      62
   ID NAME  SCORES
3   4    D      35
4   5    E      83
5   6    F      54
6   7    G      63
7   8    A      44
8   9    B      64
9  10    C      62

Process finished with exit code 0
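
The script above depends on an Excel file that is not included here. As a rough sketch, the same behavior can be reproduced with an in-memory DataFrame. The data below is an assumption: rows 3 to 9 are taken from the printed output, while the NAME and SCORES values of rows 0 to 2 are made up for illustration. The sketch also shows keep=False, which flags every row of a duplicated NAME rather than only the later occurrences.

```python
import pandas as pd

# Hypothetical stand-in for exls/drop_duplicates.xlsx. Rows 3-9 match the
# printed output; the NAME and SCORES values of rows 0-2 are invented.
df = pd.DataFrame({
    'ID': range(1, 11),
    'NAME': list('ABCDEFGABC'),
    'SCORES': [90, 70, 80, 35, 83, 54, 63, 44, 64, 62],
})

# keep=False flags every row whose NAME occurs more than once,
# including the first occurrence that keep='first' leaves unflagged.
all_dupes = df[df.duplicated(subset='NAME', keep=False)]
print(all_dupes)

# Without inplace=True, drop_duplicates returns a new DataFrame
# and leaves df itself unchanged.
first_kept = df.drop_duplicates(subset='NAME', keep='first')
last_kept = df.drop_duplicates(subset='NAME', keep='last')
print(first_kept.index.tolist())
print(last_kept.index.tolist())
```

Returning a new DataFrame instead of passing inplace=True keeps the original data available, which makes it easy to compare the keep='first' and keep='last' results side by side.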