pythonDATA ANALYSIS PROCESS

整洁数据:汇总多列(melt)&拆分列(pivot)

2018-10-09  本文已影响6人  IntoTheVoid

识别整洁的数据
要使数据整洁,它必须具有:

image.png

例如上图中的df2就不是整洁的数据,因为多个变量都挤在了同一列中.

Melt data

DataFrame using pd.melt(). There are two parameters you should be aware of: id_vars and value_vars. The id_vars represent the columns of the data you do not want to melt (i.e., keep it in its current shape), while the value_vars represent the columns you do wish to melt into rows. By default, if no value_vars are provided, all columns not set in the id_vars will be melted.

假如想让df1变为df2的样子, 代码如下:

# Print the head of airquality
print(airquality.head())

# Melt airquality: airquality_melt
airquality_melt = pd.melt(frame=airquality, id_vars=['Month', 'Day'])

# Print the head of airquality_melt
print(airquality_melt.head())

可以通过如下方式给融合的列自定义列名

# Print the head of airquality
print(airquality.head())

# Melt airquality: airquality_melt
airquality_melt = pd.melt(frame=airquality, id_vars=['Month', 'Day'], var_name='measurement', value_name='reading')

# Print the head of airquality_melt
print(airquality_melt.head())
image.png

Pivot data

pivot data 是melt data 的逆过程


image.png

示例:

# Print the head of airquality_melt
print(airquality_melt.head())

# Pivot airquality_melt: airquality_pivot
airquality_pivot = airquality_melt.pivot_table(index=['Month', 'Day'], columns='measurement', values='reading')

# Print the head of airquality_pivot
print(airquality_pivot.head())
image.png

恢复索引

# Print the index of airquality_pivot
print(airquality_pivot.index)

# Reset the index of airquality_pivot: airquality_pivot_reset
airquality_pivot_reset = airquality_pivot.reset_index()

# Print the new index of airquality_pivot_reset
print(airquality_pivot_reset.index)

# Print the head of airquality_pivot_reset
print(airquality_pivot_reset.head())
image.png
Pivoting duplicate values

pivot_table中传入aggfunc=

Splitting a column with .split() and .get()

对于如下的结构


image.png

如何将Cases_xxxxDeath_xxxx拆成两列一列为case, 一列为xxxx

# Melt ebola: ebola_melt
ebola_melt = pd.melt(ebola, id_vars=['Date', 'Day'], var_name='type_country', value_name='counts')

# Create the 'str_split' column
ebola_melt['str_split'] = ebola_melt.type_country.str.split('_')

# Create the 'type' column
ebola_melt['type'] = ebola_melt.str_split.str[0]

# Create the 'country' column
ebola_melt['country'] = ebola_melt.str_split.str[1]

# Print the head of ebola_melt
print(ebola_melt.head())

上一篇 下一篇

猜你喜欢

热点阅读