Introduction to Pandas Data Structures

2020-09-01, by 浅语__

Pandas has two main data structures: Series and DataFrame.

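A Series is a one-dimensional array-like object made up of a set of values and an associated set of data labels (the index).
Unlike a plain NumPy array, a Series lets you select a single value or a group of values by index label.
A Series can also be thought of as an ordered dictionary that maps index labels to values.
A Series can be created directly from data stored in a Python dict.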

from pandas import Series
s = Series(data, index=data_index)
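
As a minimal sketch of the constructor call above: the contents of data and data_index here are made-up sample values, not taken from the original notes. Labels listed in the index but absent from the dict come out as NaN.

```python
import pandas as pd

# Hypothetical sample data: a dict mapping labels to values
data = {"a": 1, "b": 2, "c": 3}
data_index = ["a", "b", "c", "d"]  # "d" is not a key in data

# Build the Series; the label "d" has no matching key, so its value is NaN
s = pd.Series(data, index=data_index)

single = s["a"]         # select one value by its label
subset = s[["a", "c"]]  # select several values with a list of labels
```

Because of the NaN, the resulting Series is stored as floats rather than integers.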

Detecting missing data

s.isnull()
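
A small sketch of how isnull() might be used for filtering, with made-up values. It returns a boolean Series that can serve as a mask; notnull() is its inverse.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

mask = s.isnull()         # True where the value is missing
missing = s[mask]         # keep only the missing entries
present = s[s.notnull()]  # keep only the non-missing entries
```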

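In arithmetic operations, Series automatically align on index labels, so values with the same label are combined with each other.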
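
To illustrate the alignment, here is a sketch with two made-up Series whose indexes only partly overlap. Labels present in only one of them produce NaN in the result.

```python
import pandas as pd

# Two hypothetical Series sharing only the labels "b" and "c"
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Values are matched by label, not by position
total = s1 + s2
```

Here total holds 12 for "b" and 23 for "c", while "a" and "d" are NaN because each exists on only one side.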

A DataFrame is a tabular data structure with both a row index and a column index. It can be thought of as a dictionary of Series.
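
One way to see the "dictionary of Series" view is to build a DataFrame from exactly that. The column names and values below are invented for illustration.

```python
import pandas as pd

# Hypothetical columns, each a Series sharing the same row labels
frame = pd.DataFrame({
    "x": pd.Series([1, 2], index=["a", "b"]),
    "y": pd.Series([3, 4], index=["a", "b"]),
})

# Looking up a column by name gives back a Series, like a dict lookup
col = frame["x"]
```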

1.reindex
frame2 = frame.reindex(['a', 'b', 'c'])  # reindex the rows
frame2 = frame.reindex(columns=data)  # reindex the columns
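
A runnable sketch of both reindex calls, assuming a small made-up frame. The names frame3 and the column "z" are hypothetical and only used here. Labels that did not exist before are filled with NaN.

```python
import pandas as pd

# Hypothetical frame with rows "a", "b" and columns "x", "y"
frame = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])

# Reindex rows: "c" is new, so that row is all NaN
frame2 = frame.reindex(["a", "b", "c"])

# Reindex columns: reorders "y" and "x" and adds an all-NaN column "z"
frame3 = frame.reindex(columns=["y", "x", "z"])
```

reindex returns a new object, so frame itself is left unchanged.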

2.drop
The default is axis=0 (rows), so to drop a column you need to pass axis=1.
For an explanation of axis, see https://www.jianshu.com/p/bf60078103f2
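
A short sketch of drop along each axis, using a made-up frame. The variable names are only for illustration.

```python
import pandas as pd

# Hypothetical frame with three rows and two columns
frame = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

# axis=0 is the default, so this drops the row labelled "a"
without_row = frame.drop("a")

# To drop a column, axis=1 has to be given explicitly
without_col = frame.drop("y", axis=1)
```

Both calls return a new DataFrame and leave frame as it was.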


3.loc/iloc
loc: selects rows/columns by label name
iloc: selects rows/columns by integer position
See https://zhuanlan.zhihu.com/p/111123163?from_voters_page=true
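
The difference can be sketched on a made-up frame as follows. One detail worth noting is that label slices with loc include the end label, while position slices with iloc exclude the end position.

```python
import pandas as pd

# Hypothetical frame with string row labels
frame = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

by_label = frame.loc["b", "y"]    # row label "b", column label "y"
by_position = frame.iloc[1, 1]    # second row, second column (0-based)

label_slice = frame.loc["a":"b"]  # label slice: includes both "a" and "b"
pos_slice = frame.iloc[0:1]       # position slice: only the first row
```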

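4.Series and DataFrame arithmetic
df.add(df1, fill_value=0): addition; where a label exists in only one of the two objects, the missing value is treated as 0
df.sub: subtraction
df.div: division
df.mul: multiplication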
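
To show what fill_value changes, here is a sketch comparing the plain + operator with add on two made-up frames whose row labels only partly overlap.

```python
import pandas as pd

# Two hypothetical frames sharing only the row label "b"
df = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])
df1 = pd.DataFrame({"x": [10.0, 20.0]}, index=["b", "c"])

# Plain addition: labels found in only one frame give NaN
plain = df + df1

# With fill_value=0 the missing side counts as 0 before adding
filled = df.add(df1, fill_value=0)
```

In plain, rows "a" and "c" are NaN. In filled they keep the value from whichever frame had them. sub, div and mul accept fill_value in the same way.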