
pandas 3

2017-12-29  钊钖


Making Pivot Tables

Pivot tables provide an easy way to group rows by the values in one column and then apply a calculation, such as a sum or a mean, to each group.

Pivot tables first group and then apply a calculation. In the previous exercise, we effectively built a pivot table by hand: we grouped the rows by the "pclass" column and then calculated the mean of the "fare" column for each class.
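To make the manual approach concrete, here is a minimal sketch of grouping by hand. It assumes a tiny, made-up stand-in for titanic_survival with only "pclass" and "fare" columns; the values are invented for illustration, and the loop is one plausible way to write the manual version, not necessarily the exact code from the previous exercise.

```python
import pandas as pd

# Hypothetical miniature stand-in for titanic_survival (made-up values)
titanic_survival = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3],
    "fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
})

# Manual "pivot": for each class, select its rows and average their fares
fares_by_class = {}
for this_class in sorted(titanic_survival["pclass"].unique()):
    rows = titanic_survival[titanic_survival["pclass"] == this_class]
    fares_by_class[int(this_class)] = float(rows["fare"].mean())

print(fares_by_class)  # {1: 70.0, 2: 20.0, 3: 8.0}
```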

Luckily, we can use the DataFrame.pivot_table() method instead, which simplifies this kind of work. To produce the same data, we need only one line:

import numpy as np

passenger_class_fares = titanic_survival.pivot_table(index="pclass", values="fare", aggfunc=np.mean)
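A minimal sketch, on the same kind of made-up miniature data, checking that the pivot_table one-liner gives the same numbers as an explicit groupby. It passes the string "mean" instead of np.mean; pandas accepts either, and some newer pandas releases may emit a warning when a NumPy function is passed as aggfunc.

```python
import pandas as pd

# Hypothetical miniature stand-in for titanic_survival (made-up values)
titanic_survival = pd.DataFrame({
    "pclass": [1, 1, 2, 3, 3, 3],
    "fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
})

# One call groups by "pclass" and averages "fare" within each group
passenger_class_fares = titanic_survival.pivot_table(
    index="pclass", values="fare", aggfunc="mean"
)
print(passenger_class_fares)

# The same numbers through an explicit groupby, for comparison
by_groupby = titanic_survival.groupby("pclass")["fare"].mean()
assert (passenger_class_fares["fare"] == by_groupby).all()
```

Note that pivot_table returns a DataFrame (here with a single "fare" column), whereas selecting one column before .mean() in the groupby version returns a Series.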

The first parameter, index, tells the method which column to group by.

The second parameter, values, is the column we want to apply the calculation to, and aggfunc specifies the calculation to perform.

The default for the aggfunc parameter is the mean, so when the mean is what we want, we can omit this parameter.
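A quick check of that default, sketched on a hypothetical four-row DataFrame (values invented for illustration): leaving out aggfunc and passing "mean" explicitly produce identical tables.

```python
import pandas as pd

# Hypothetical miniature data (made-up values)
df = pd.DataFrame({
    "pclass": [1, 1, 2, 2],
    "fare": [100.0, 50.0, 30.0, 10.0],
})

implicit = df.pivot_table(index="pclass", values="fare")  # aggfunc omitted
explicit = df.pivot_table(index="pclass", values="fare", aggfunc="mean")

print(implicit.equals(explicit))  # True
```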

Instructions

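import numpy as np

# Mean survival rate per passenger class (aggfunc defaults to the mean)
passenger_survival = titanic_survival.pivot_table(index="pclass", values="survived")

# Mean age per passenger class
passenger_age = titanic_survival.pivot_table(index="pclass", values="age")

print(passenger_age)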

If we pass a list of column names to the values parameter instead of a single value, we can perform calculations on multiple columns at once.
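A minimal sketch of passing a list to values, using a hypothetical four-row DataFrame with made-up "age" and "fare" values: the result gets one column per listed column, each holding the default mean for every class.

```python
import pandas as pd

# Hypothetical miniature data (made-up values)
df = pd.DataFrame({
    "pclass": [1, 1, 3, 3],
    "age": [40.0, 30.0, 20.0, 24.0],
    "fare": [90.0, 70.0, 8.0, 10.0],
})

# A list passed to values yields one result column per listed column
stats = df.pivot_table(index="pclass", values=["age", "fare"])
print(stats)
```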

We can also specify a different calculation. For instance, if we pass np.sum to the aggfunc parameter, it will total the values in each column for every group.
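A sketch of both ideas on made-up data with hypothetical "embarked", "fare" and "survived" values: a built-in total via the string "sum", and a hand-written function, since aggfunc also accepts any callable that reduces a group's values to a single number. The fare_spread helper is purely illustrative and not part of pandas.

```python
import pandas as pd

# Hypothetical miniature data (made-up values)
df = pd.DataFrame({
    "embarked": ["S", "S", "C", "C", "C"],
    "fare": [10.0, 30.0, 50.0, 70.0, 20.0],
    "survived": [0, 1, 1, 1, 0],
})

# Built-in reduction: total fare and total survivors per port
totals = df.pivot_table(index="embarked", values=["fare", "survived"], aggfunc="sum")

# Hypothetical custom reduction: spread between the highest and lowest fare
def fare_spread(s):
    return s.max() - s.min()

spread = df.pivot_table(index="embarked", values="fare", aggfunc=fare_spread)
print(totals)
print(spread)
```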

Instructions

import numpy as np

# Total fare paid and total number of survivors for each port of embarkation
port_stats = titanic_survival.pivot_table(index="embarked", values=["fare", "survived"], aggfunc=np.sum)

print(port_stats)



Drop Missing Values

To remove missing values, we can use the DataFrame.dropna() method. By default, it drops every row that contains at least one missing value.
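A minimal sketch of the default behaviour on a hypothetical three-row DataFrame (values made up): only the row with no gaps survives, and the original DataFrame is left unchanged because dropna() returns a new object.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data: row 1 lacks "age", row 2 lacks "sex"
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "sex": ["male", "female", None],
    "fare": [7.25, 71.28, 8.05],
})

cleaned = df.dropna()  # default axis=0: drop any row containing a missing value
print(cleaned)
```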

The dropna() method takes an axis parameter, which indicates whether you would like to drop rows or columns.

Specifying axis=0 or axis='index' will drop any rows that have null values, while specifying axis=1 or axis='columns' will drop any columns that have null values.
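A short sketch contrasting the two axes on made-up data: dropping by rows keeps only the complete row, while dropping by columns keeps only the column with no gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data: "age" and "cabin" each have one gap
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "sex": ["male", "female", "male"],
    "cabin": [None, "C85", "E46"],
})

rows_kept = df.dropna(axis="index")    # same as axis=0
cols_kept = df.dropna(axis="columns")  # same as axis=1
print(rows_kept)
print(cols_kept)
```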

Instructions

Drop all columns in titanic_survival that have missing values and assign the result to drop_na_columns.
Drop all rows in titanic_survival where the columns "age" or "sex" have missing values and assign the result to new_titanic_survival.

drop_na_columns = titanic_survival.dropna(axis=1)

new_titanic_survival = titanic_survival.dropna(axis=0, subset=["sex", "age"])
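The subset parameter used above limits which columns dropna() inspects when deciding whether to drop a row. A sketch on hypothetical data shows the difference: checking every column removes all rows, while checking only "age" and "sex" ignores the gaps in "cabin".

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data (made-up values)
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 54.0],
    "sex": ["male", "female", None, "male"],
    "cabin": [None, "C85", "E46", None],
})

everything = df.dropna()                         # checks every column
only_age_sex = df.dropna(subset=["age", "sex"])  # ignores gaps in "cabin"
print(len(everything), list(only_age_sex.index))
```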