pandas exercises

2019-02-22, by 小T数据站

Exercise link

Below is a rough table of contents:

1. Reading and getting to know the data

2. Filtering and sorting

3. Grouping

4. Apply

5. Merge

6. Stats (summary statistics)

7. Visualization

8. Time series

Each topic has several sets of exercises, so the sections below only summarize the methods they use:

1. Reading and getting to know the data

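import pandas as pd
# Read a file; sep is the field delimiter, e.g. ',' or '\t'
df = pd.read_csv(data_path, sep=',')
# First rows: 5 by default, or the first n rows
df.head()
df.head(n)
# Last rows: 5 by default, or the last n rows
df.tail()
df.tail(n)
# Overview: column types and non-null counts, then summary statistics
df.info()
df.describe()
# Number of rows
df.shape[0]
# Number of columns
df.shape[1]
# Data type of each column
df.dtypes
# Row index
df.index
# Column names
df.columns
# Number of distinct values in one column
df.column_name.nunique()
# Selecting data
df[row_slice][col_name]  # a row slice or boolean mask first, then a column
df.loc[row_labels, col_labels]  # label-based selection
df.iloc[row_positions, col_positions]  # position-based selection
# Count occurrences of each value in a column
df.column_name.value_counts()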
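The following is a minimal, self-contained sketch of the calls above. It assumes a small hypothetical CSV, with made-up columns `team`, `player` and `goals`. The CSV is read from an in-memory string instead of a real file path.

```python
import io

import pandas as pd

# Hypothetical data standing in for a real CSV file
csv_text = """team,player,goals
Arsenal,A,3
Arsenal,B,1
Chelsea,C,2
Leeds,D,0
Chelsea,E,5
"""

df = pd.read_csv(io.StringIO(csv_text), sep=',')

n_rows = df.shape[0]                  # number of rows
n_cols = df.shape[1]                  # number of columns
n_teams = df.team.nunique()           # distinct values in one column
team_counts = df.team.value_counts()  # rows per team, most frequent first
first_goals = df.loc[0, 'goals']      # label-based: row label 0, column 'goals'
last_player = df.iloc[-1, 1]          # position-based: last row, second column

print(n_rows, n_cols, n_teams)
print(team_counts)
```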

2. Filtering and sorting

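# Selecting data
df[['col_name1', 'col_name2']]
df[logical_test]  # boolean mask, e.g. df.col_name > 0
df.loc[logical_test]
df.query('col_name > 0')  # query takes the condition as a string
df.loc[['row_name1'], ['col_name1', 'col_name2']]
df.iloc[n1:n2, n3:n4]
# Sorting
df.sort_values(by='col_name')
# Set one column of the DataFrame as the index
df.set_index('col_name')
# Others: rows whose col_name starts with 'G'
df[df.col_name.str.startswith('G')]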
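Here is a small sketch of the filtering and sorting calls on a hypothetical table; the column names and values are made up for illustration. It shows that a boolean mask and `query` with the same condition return the same rows.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    'team': ['Germany', 'Greece', 'Spain', 'Italy'],
    'goals': [10, 5, 12, 6],
    'yellow_cards': [4, 9, 11, 16],
})

# Boolean mask: rows where goals > 6
high_scoring = df[df.goals > 6]
# The same filter written as a query string
high_scoring_q = df.query('goals > 6')
# Sort by goals, highest first
by_goals = df.sort_values(by='goals', ascending=False)
# Use the team column as the row index, then select by label
indexed = df.set_index('team')
greece_goals = indexed.loc['Greece', 'goals']
# Teams whose name starts with 'G'
g_teams = df[df.team.str.startswith('G')]
```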

3. Grouping

split-apply-combine

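df.groupby('col_name1').col_name2.mean()  # any aggregation: mean(), sum(), max()...
df.groupby('col_name1').col_name2.agg(['min', 'max', 'mean'])
df.groupby('col_name1').mean(numeric_only=True).stack()  # wide result -> long Series
df.groupby(['col_name1', 'col_name3']).col_name2.mean().unstack()  # inner key -> columns
# Iterate over the groups
for name, group in df.groupby('col_name'):
    # print the group key (one value of col_name)
    print(name)
    # print the rows belonging to that group
    print(group)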
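The following is a minimal split-apply-combine sketch on hypothetical drinks data; the columns `continent`, `kind` and `beer` are invented for illustration. It covers a single aggregation, several aggregations with `agg`, two grouping keys with `unstack`, and iteration over groups.

```python
import pandas as pd

# Hypothetical data: continent, country size class, beer servings
df = pd.DataFrame({
    'continent': ['EU', 'EU', 'AS', 'AS', 'EU'],
    'kind': ['big', 'small', 'big', 'small', 'big'],
    'beer': [300, 100, 50, 30, 200],
})

# Split by continent, apply mean, combine into a Series
mean_beer = df.groupby('continent').beer.mean()
# Several aggregations at once: one column per function
stats = df.groupby('continent').beer.agg(['min', 'max', 'mean'])
# Two keys give a MultiIndex Series; unstack() moves 'kind' to columns
wide = df.groupby(['continent', 'kind']).beer.mean().unstack()
# Iterating yields (group key, sub-DataFrame) pairs
group_sizes = {name: len(group) for name, group in df.groupby('continent')}
```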

4. Apply

df.column_name.apply(func)
pd.to_datetime(time_col, format='%Y-%m')
df.resample('MS').mean()  # needs a DatetimeIndex; 'MS' = month start
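This sketch uses a hypothetical monthly sales table to show how `apply`, `to_datetime` and `resample` fit together. The quarterly `'QS'` rule is just one example frequency, chosen here for illustration.

```python
import pandas as pd

# Hypothetical monthly sales with a 'YYYY-MM' string column
df = pd.DataFrame({
    'month': ['2019-01', '2019-02', '2019-03', '2019-04'],
    'name': ['alice', 'bob', 'carol', 'dave'],
    'sales': [10, 20, 30, 40],
})

# apply: run a function on every element of a column
df['name'] = df.name.apply(str.capitalize)
# Parse the strings with an explicit format
df['month'] = pd.to_datetime(df.month, format='%Y-%m')
# resample needs a DatetimeIndex; 'QS' bins by quarter, labelled by quarter start
quarterly = df.set_index('month').sales.resample('QS').sum()
```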

5. Merge

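# Stack vertically (append rows)
# df1.append(df2, ignore_index=True) was removed in pandas 2.0; use concat instead
pd.concat([df1, df2], ignore_index=True)
# Join horizontally (add columns)
df1.merge(df2, how='inner', on='key_col')  # how: 'left', 'right', 'outer' or 'inner'
pd.concat([df1, df2], axis=1)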
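Here is a minimal sketch contrasting row-wise `concat` with key-based `merge`, using hypothetical tables. It shows that a left join keeps unmatched rows and fills them with NaN, while an inner join drops them.

```python
import pandas as pd

# Two hypothetical tables with the same columns
df1 = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
df2 = pd.DataFrame({'id': [3, 4], 'name': ['c', 'd']})
# A third hypothetical table keyed on 'id'
scores = pd.DataFrame({'id': [1, 2, 3], 'score': [90, 80, 70]})

# Stack rows; ignore_index renumbers the result 0..n-1
rows = pd.concat([df1, df2], ignore_index=True)
# Join on the key column; 'left' keeps every row of the left table
left = rows.merge(scores, how='left', on='id')
# 'inner' keeps only ids present in both tables
inner = rows.merge(scores, how='inner', on='id')
# Side by side, aligned on the index
side = pd.concat([df1, df2], axis=1)
```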

6. Stats (summary statistics)

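# Read whitespace-separated data, combining columns 0, 1, 2 into one date column
# (combining columns via a nested parse_dates list is deprecated since pandas 2.2)
data = pd.read_table(url, sep=r'\s+', parse_dates=[[0, 1, 2]])
# Count missing values per column
data.isnull().sum()
# Count non-missing values per column
data.notnull().sum()
# A statistic over every value in the DataFrame, e.g. the mean;
# dividing by count() skips NaN, whereas fillna(0) would drag the mean down
data.sum().sum() / data.count().sum()
# Dates as the index: statistics per year ('Y'; older code uses 'A')
data.groupby(data.index.to_period('Y')).mean()
# Dates as the index: statistics per month
data.groupby(data.index.to_period('M')).mean()
# Dates as the index: statistics per week
data.groupby(data.index.to_period('W')).mean()
# Others: several statistics per week
weekly = data.resample('W').agg(['min', 'max', 'mean', 'std'])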
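The sketch below builds a small hypothetical DataFrame of daily readings with missing values and a DatetimeIndex. On this data the overall mean that skips NaN is 24 / 9, while filling NaN with 0 first would give 24 / 12.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings for two stations, with missing values
idx = pd.date_range('2019-01-01', periods=6, freq='D')
data = pd.DataFrame({
    'st1': [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
    'st2': [2.0, 2.0, np.nan, 2.0, 2.0, 2.0],
}, index=idx)

missing = data.isnull().sum()     # per-column count of NaN
present = data.notnull().sum()    # per-column count of non-NaN
# Mean over every cell, skipping NaN: total / number of non-missing cells
overall_mean = data.sum().sum() / data.count().sum()
# Group by calendar month via the DatetimeIndex
monthly = data.groupby(data.index.to_period('M')).mean()
# Several statistics per week and per column (MultiIndex columns)
weekly = data.resample('W').agg(['min', 'max', 'mean'])
```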

7. Visualization

Example 1: https://github.com/guipsamora/pandas_exercises/blob/master/07_Visualization/Tips/Exercises_with_code_and_solutions.ipynb
Example 2: https://github.com/guipsamora/pandas_exercises/blob/master/07_Visualization/Titanic_Desaster/Exercises_code_with_solutions.ipynb

8. Time series

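df['time_series'] = pd.to_datetime(df['time_series'])  # strings -> datetime64
df = df.set_index('time_series')
df.index.year
df.index.isocalendar().week  # df.index.week was removed in pandas 2.0
df.resample('W').mean()  # resample is a DataFrame method, not an index method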
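Here is a minimal end-to-end sketch of the time-series steps, using a hypothetical price table whose dates are stored as strings. The yearly `'YS'` rule is just one example frequency.

```python
import pandas as pd

# Hypothetical price data with dates stored as strings
df = pd.DataFrame({
    'time_series': ['2019-01-01', '2019-01-08', '2019-01-15', '2020-01-06'],
    'price': [10.0, 12.0, 11.0, 15.0],
})

df['time_series'] = pd.to_datetime(df['time_series'])  # str -> datetime64
df = df.set_index('time_series')                        # now a DatetimeIndex
years = df.index.year                                   # calendar year of each row
weeks = df.index.isocalendar().week                     # ISO week number of each row
# Yearly mean; 'YS' labels each bin with the first day of the year
yearly = df.price.resample('YS').mean()
```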
