Pandas: Removing Duplicate Data
2020-02-08
Noza_ea8f
Table
image.png
Code
import pandas as pd
df = pd.read_excel(io='exls/drop_duplicates.xlsx')
# Flag duplicate rows: True for every repeat of a NAME after its first occurrence
df_dupe = df.duplicated(subset='NAME')
print(df_dupe)
# Check whether any duplicates exist; returns True if there is at least one
print(df_dupe.any())
# Keep only the entries flagged as duplicates
# df_dupe = df_dupe[df_dupe == True]  # comparing against True is unnecessary
df_dupe = df_dupe[df_dupe]
print(df_dupe)
# Print the duplicate rows, selected by label through df_dupe.index
# (.loc is used because the index holds labels, not positions)
print(df.loc[df_dupe.index])
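# Drop duplicates. The default is keep='first'; keep='last' is passed here
# so that the last occurrence of each NAME is kept instead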
df.drop_duplicates(subset='NAME', inplace=True, keep='last')
print(df)
Output
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 True
8 True
9 True
dtype: bool
True
7 True
8 True
9 True
dtype: bool
   ID NAME  SCORES
7   8    A      44
8   9    B      64
9  10    C      62
   ID NAME  SCORES
3   4    D      35
4   5    E      83
5   6    F      54
6   7    G      63
7   8    A      44
8   9    B      64
9  10    C      62
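The listing above only uses keep='last'. As a rough sketch of how the keep options differ, the snippet below builds a small DataFrame in memory instead of reading the spreadsheet. The rows and the variable names (all_dupes, first, last, none) are made up for illustration and are not the contents of drop_duplicates.xlsx.

```python
import pandas as pd

# Hypothetical sample data, NOT the rows from the author's spreadsheet.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "NAME": ["A", "B", "C", "A", "B"],
    "SCORES": [70, 80, 90, 44, 64],
})

# keep=False flags every row of a duplicate group, including the first
# occurrence, which is handy for inspecting all conflicting rows at once.
all_dupes = df[df.duplicated(subset="NAME", keep=False)]
print(all_dupes)

# keep='first' (the default) keeps the earliest row for each NAME,
# keep='last' keeps the latest one, and keep=False drops every row
# whose NAME appears more than once.
first = df.drop_duplicates(subset="NAME", keep="first")
last = df.drop_duplicates(subset="NAME", keep="last")
none = df.drop_duplicates(subset="NAME", keep=False)

print(first["ID"].tolist())
print(last["ID"].tolist())
print(none["ID"].tolist())
```

Unlike the inplace=True call in the listing, these calls return new DataFrames, so df itself stays unchanged and the three results can be compared side by side.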