Pandas 数据排序计数分组合并

2019-07-22  本文已影响0人  Bounty_Hunter

Week 4

数据排序

数据计数

series.nunique()

数据分组

dataframe.groupby()可以用自然的方式将数据分成有共同点的几组,完成类似于“有几个小孩的人收入的平均数情况”等数据分析

自定义的聚合函数,通过传入 GroupBy.aggregate()GroupBy.agg() 来实现

dataframe.groupby('列名').agg(func_name)

数据合并

dataframe1.merge(dataframe2,how='outer')

可以根据一个或多个键(key)将不同DataFrame中的行连接起来

唯一的要求是,在每个数据集中,有一相同的列。

返回的是一个 dataframe 类型的对象

`inner`:内连接(取key的交集)

`outer`:外连接(取 key 的并集)

Challenge

Sorting

Reads in the excel file DirectMarketing.xlsx, and aselect only the Age, Salary and AmountSpent columns. It should then sort the rows so that they are in increasing order of Salary. Using head(), display the first 5 rows.

读取文件,选择特定的列,根据 Salary 的值升序排列,截取前五行
根据值进行升序排序用DataFrame.sort_values()

import pandas as pd
marketing = pd.read_excel('DirectMarketing.xlsx')
marketing = marketing[['Age','Salary','AmountSpent']]
marketing.sort_values(by = 'Salary',inplace = True)
print(marketing.head(5))

Group by: salary per number of children

Reads in the excel file DirectMarketing.xlsx and group the data by number of children, and print out the mean salary for each group.

读取文件,根据children的数据分组,打印每一组的salarymean

分组函数dataframe.groupby('列名')

按组求平均值Grouphy.mean()

Dataframe 的 某一列提取出series:dataframe['列名']

import pandas as pd 
marketing = pd.read_excel('DirectMarketing.xlsx')
child_groups = marketing.groupby('Children')
print(child_groups.mean()['Salary'])

Inner join: soccer and cricket

将两张表内连接,改变 dataframe 列的顺序

dataframe1.merge(dataframe2,how = 'inner')

import pandas as pd
soccer = pd.DataFrame({'People': ['Stephen', 'Alison', 'Jess'], 'SoccerTeam': ['Chelsea', 'Chelsea', 'Arsenal']})
cricket = pd.DataFrame({'People': ['Stephen', 'Tina', 'Jess', 'Will'], 'CricketTeam': ['Sixers', 'Sixers', 'Thunder', 'Thunder']})


sport = cricket.merge(soccer, how='inner')
print(sport[['People','SoccerTeam','CricketTeam']])

Outer join: soccer and cricket

将两张表外连接

dataframe1.merge(dataframe2,how = 'outer')

import pandas as pd
soccer = pd.DataFrame({'People': ['Stephen', 'Alison', 'Jess'], 'SoccerTeam': ['Chelsea', 'Chelsea', 'Arsenal']})
cricket = pd.DataFrame({'People': ['Stephen', 'Tina', 'Jess', 'Will'], 'CricketTeam': ['Sixers', 'Sixers', 'Thunder', 'Thunder']})

sport = soccer.merge(cricket, how = 'outer')
print(sport)

Mean marks

Add in an additional column to the combined dataset called mean_marks which gives the average mark for each person. Print the data in order of descending mean mark.

合并表格,计算每行的平均数,并在后面添加一列数据,最后根据平均数据倒序排序

import pandas as pd

BUSS6002 = pd.DataFrame({'People': ['Stephen', 'Tina', 'Alison', 'Jess'], 'BUSS6002_mark': [89, 63, 84, 70]})
QBUS6840 = pd.DataFrame({'People': ['Stephen', 'Cooper', 'Jess', 'Will', 'Chris'], 'QBUS6840_mark': [83, 71, 74, 68, 88]})
INFO6018 = pd.DataFrame({'People': ['Cooper', 'Jess', 'Alison', 'Chris'], 'INFO6018_mark': [68, 71, 86, 91]})


marks = BUSS6002.merge(QBUS6840,how = 'outer')
marks = marks.merge(INFO6018,how = 'outer')
marks['mean_mark'] = marks.mean(1)
print(marks.sort_values( by= 'mean_mark', ascending = False))
上一篇 下一篇

猜你喜欢

热点阅读