pandas

2020-04-07 黑夜的眸

Data structures

DataFrame
Panel
Series

DataFrame

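Attributes
- shape: (number of rows, number of columns)
- values: the data as a two-dimensional NumPy array
- index: row labels
- columns: column labels
- T: transpose
Methods
- head(): first five rows by default
- tail(): last five rows by default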
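
The attributes and methods above can be tried on a small frame. This is a minimal sketch; the data, the row labels a to f, and the column names x and y are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x2 frame used only for illustration.
df = pd.DataFrame(
    np.arange(12).reshape(6, 2),
    index=list("abcdef"),
    columns=["x", "y"],
)

print(df.shape)    # (rows, columns)
print(df.values)   # underlying two-dimensional NumPy array
print(df.index)    # row labels
print(df.columns)  # column labels
print(df.T)        # transpose: rows become columns

first_rows = df.head()   # first five rows by default
last_rows = df.tail(2)   # pass a number to change how many rows are returned
```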

Panel: a container of DataFrames

Panel was deprecated in pandas 0.20 and removed in pandas 0.25; use a DataFrame with a MultiIndex instead.
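
One way to get Panel-like data with a MultiIndex is to stack several DataFrames under an extra index level. The sketch below uses pd.concat with a dict; the month keys, ticker labels, and prices are invented for illustration, and other layouts are equally valid.

```python
import pandas as pd

# Two hypothetical frames with the same shape, one per month.
jan = pd.DataFrame({"open": [1.0, 2.0], "close": [1.5, 2.5]}, index=["AAA", "BBB"])
feb = pd.DataFrame({"open": [3.0, 4.0], "close": [3.5, 4.5]}, index=["AAA", "BBB"])

# The dict keys become the outer level of a two-level row index.
stacked = pd.concat({"jan": jan, "feb": feb}, names=["month", "ticker"])

jan_again = stacked.loc["jan"]                  # select one "slice" by outer label
one_cell = stacked.loc[("feb", "BBB"), "close"] # select by (outer, inner) label
per_ticker = stacked.xs("AAA", level="ticker")  # cross-section on the inner level
```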

Series

Attributes
- index: labels
- values: the data as a one-dimensional NumPy array
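
A short sketch of the two Series attributes, with made-up labels and numbers. It also shows that selecting a single column of a DataFrame yields a Series.

```python
import pandas as pd

# Hypothetical Series with explicit labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.index)   # the labels
print(s.values)  # the data as a one-dimensional NumPy array

# A single DataFrame column is itself a Series.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
column = df["x"]
```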

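Indexing

Direct indexing
df['column']['row'] (column first, then row)
Label-based indexing
df.loc['row', 'column'] or df.loc['row']['column']
Position-based indexing
df.iloc[0, 1]
Mixed indexing
df.ix was deprecated in pandas 0.20 and removed in pandas 1.0. Use df.loc[df.index[:4], ['column1', 'column2']] or df.iloc[:4, df.columns.get_indexer(['column1', 'column2'])] instead.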
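
The indexing styles above can be compared side by side. This is a sketch on an invented frame; the column names open, close, volume and the row labels d1 to d5 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical price table.
df = pd.DataFrame(
    {
        "open": [1.0, 2.0, 3.0, 4.0, 5.0],
        "close": [1.5, 2.5, 3.5, 4.5, 5.5],
        "volume": [10, 20, 30, 40, 50],
    },
    index=["d1", "d2", "d3", "d4", "d5"],
)

by_col_then_row = df["close"]["d2"]  # direct: column first, then row
by_label = df.loc["d2", "close"]     # label-based: row, then column
by_position = df.iloc[1, 1]          # position-based: row 1, column 1

# Mixed selection (first four rows by position, columns by name) without df.ix.
first4_loc = df.loc[df.index[:4], ["open", "close"]]
first4_iloc = df.iloc[:4, df.columns.get_indexer(["open", "close"])]
```

All three single-cell lookups return the same value here, and the two mixed selections return the same four-row, two-column frame.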

Sorting

Sorting by values
df.sort_values(by=, ascending=)
Sorting by index
df.sort_index()
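
A minimal sketch of both kinds of sorting on a made-up frame. The column names name and score are hypothetical; by accepts either one column name or a list, and ascending can be a single bool or one bool per key.

```python
import pandas as pd

# Hypothetical data with a shuffled integer index.
df = pd.DataFrame(
    {"name": ["b", "a", "c"], "score": [2, 3, 2]},
    index=[2, 0, 1],
)

by_name = df.sort_values(by="name")  # ascending by default

# Sort by score descending, breaking ties by name ascending.
by_score_then_name = df.sort_values(by=["score", "name"], ascending=[False, True])

by_index = df.sort_index()           # sort rows by their index labels
```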

Operations

Arithmetic operations
- add +
- sub -
- mul *
- div /
- floordiv //
- mod %
- pow **
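
Each method in the list has an operator equivalent. The sketch below, on invented columns a and b, checks a few pairs and shows the difference between floor division (//) and modulo (%).

```python
import pandas as pd

# Hypothetical integer columns.
df = pd.DataFrame({"a": [7, 8, 9], "b": [2, 3, 4]})

added = df["a"].add(df["b"])          # same as df["a"] + df["b"]
divided = df["a"].div(df["b"])        # same as df["a"] / df["b"] (true division)
floored = df["a"].floordiv(df["b"])   # same as df["a"] // df["b"]
remainder = df["a"].mod(df["b"])      # same as df["a"] % df["b"]
squared = df["a"].pow(2)              # same as df["a"] ** 2

# The method forms also work on whole frames, for example with a scalar.
shifted = df.sub(1)                   # same as df - 1
```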
Logical operations
- <, >, &, |
- df[df['column'] > 2]
- df.query('column > 2')
- df['column'].isin([1, 2])
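
A sketch of boolean filtering on a made-up frame with columns a and b. Each comparison needs its own parentheses when combined with & or |, because those operators bind more tightly than < and > in Python.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# A comparison produces a boolean Series, which can be used as a row filter.
mask = (df["a"] > 2) & (df["b"] < 40)
filtered = df[mask]

# The same filter written as a query string.
queried = df.query("a > 2 and b < 40")

# Keep rows whose value in column a is one of the listed values.
members = df[df["a"].isin([1, 2])]
```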
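Statistical operations
- sum
- mean
- mode
- median
- min
- max
- abs (element-wise absolute value)
- prod
- std
- var
- idxmax (index label of the maximum)
- idxmin (index label of the minimum)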
Custom operations
apply(func, axis=0) (axis=0 passes each column to func; axis=1 passes each row)
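
The following sketch runs a few of the statistics and a custom function through apply. The frame, its labels, and the helper value_range are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical data with labelled rows.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 6, 5]}, index=["x", "y", "z"])

totals = df.sum()          # one value per column
means = df.mean()
spread = df.std()          # sample standard deviation (ddof=1) by default
max_labels = df.idxmax()   # row label where each column reaches its maximum


def value_range(values):
    """Hypothetical helper: difference between the largest and smallest value."""
    return values.max() - values.min()


per_column = df.apply(value_range, axis=0)  # func receives each column
per_row = df.apply(value_range, axis=1)     # func receives each row
```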

Plotting

DataFrame.plot(x=None, y=None, kind='line')

kind:

line
bar
barh
hist
pie
scatter
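
A sketch of three kind values on an invented frame. It assumes matplotlib is installed, since pandas uses it as the default plotting backend, and it selects the non-interactive Agg backend so the code also runs without a display. The column names day, price, and volume are made up.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use

import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"day": [1, 2, 3], "price": [10.0, 12.0, 11.0], "volume": [100, 150, 120]}
)

# Each call returns a matplotlib Axes object that can be customised further.
line_ax = df.plot(x="day", y="price", kind="line")
barh_ax = df.plot(x="day", y="volume", kind="barh")          # horizontal bars
scatter_ax = df.plot(x="price", y="volume", kind="scatter")  # needs both x and y
```

To write a plot to disk, call savefig on the figure that owns the Axes, for example line_ax.figure.savefig with a file name of your choice.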

Reading and writing files

Format Type	Data Description	Reader	Writer
text	CSV	read_csv	to_csv
text	JSON	read_json	to_json
text	HTML	read_html	to_html
text	Local clipboard	read_clipboard	to_clipboard
binary	MS Excel	read_excel	to_excel
binary	OpenDocument	read_excel
binary	HDF5 Format	read_hdf	to_hdf
binary	Feather Format	read_feather	to_feather
binary	Parquet Format	read_parquet	to_parquet
binary	Msgpack (removed in pandas 1.0)	read_msgpack	to_msgpack
binary	Stata	read_stata	to_stata
binary	SAS	read_sas
binary	Python Pickle Format	read_pickle	to_pickle
SQL	SQL	read_sql	to_sql
SQL	Google Big Query	read_gbq	to_gbq

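read_csv(path, usecols=[], names=[])

usecols: read only the listed columns

names: column names to assign when the file has no header row; without it, the first line of the file is used as the column names

to_csv(path, index=False, header=False, mode='a')

index: whether to write the row index (False omits it)

header: whether to write the column names (False omits them)

mode: 'a' appends to an existing file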
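
The parameters above can be exercised with a small round trip through a temporary file. This is a sketch under assumed data: the file name data.csv and the column names a, b, c are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frames: an initial batch and a later row to append.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
more = pd.DataFrame({"a": [7], "b": [8], "c": [9]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")

    # Write the data only: no row index, no header line.
    df.to_csv(path, index=False, header=False)

    # Append a second batch to the same file.
    more.to_csv(path, index=False, header=False, mode="a")

    # The file has no header, so supply the names and keep only two columns.
    loaded = pd.read_csv(path, names=["a", "b", "c"], usecols=["a", "c"])

print(loaded)
```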
