A Summary of Data Visualization with Pandas
2019-05-09
旭Louis
Basic plots
Bar chart
reviews['points'].value_counts().sort_index().plot.bar()
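The `reviews` DataFrame is never defined in this post; the snippets appear to assume a wine-review table with columns such as `points`, `price`, `variety` and `province`. As a minimal sketch, using a small made-up stand-in for that table, this is the Series that `plot.bar()` ends up drawing:

```python
import pandas as pd

# Hypothetical stand-in for the wine-review data used throughout the post.
reviews = pd.DataFrame({
    "points": [88, 90, 88, 92, 90, 88],
    "price": [15.0, 30.0, 12.0, 65.0, 28.0, 20.0],
})

# value_counts() orders by frequency; sort_index() reorders by the score itself,
# so the bars run from the lowest score to the highest.
counts = reviews["points"].value_counts().sort_index()
print(counts)
```

Without `sort_index()` the bars would be ordered by how often each score occurs, which is rarely what you want for a numeric axis.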
Scatter plot
reviews[reviews['price'] < 100].sample(100).plot.scatter(x='price', y='points')
![](https://img.haomeiwen.com/i1376538/5d37147549296c72.png)
Hexbin plot
reviews[reviews['price'] < 100].plot.hexbin(x='price', y='points', gridsize=15)
![](https://img.haomeiwen.com/i1376538/268bfe07b483703b.png)
A hexbin plot is a good choice when a large number of points overlap.
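To see why overlap matters, the following sketch counts how many rows land on exactly the same (price, points) position. The data is randomly generated and purely hypothetical; it only mimics the fact that scores and prices are coarse, so many observations coincide.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: integer scores and whole-dollar prices collide often.
df = pd.DataFrame({
    "price": rng.integers(10, 100, size=5000),
    "points": rng.integers(80, 101, size=5000),
})

# Number of observations sharing each exact (price, points) pair.
overlap = df.groupby(["price", "points"]).size()
print("distinct positions:", len(overlap))
print("most crowded position holds", overlap.max(), "rows")
```

A scatter plot draws every one of those coincident rows as a single dot, whereas a hexbin plot colors each cell by its count.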
Stacked bar chart
![](https://img.haomeiwen.com/i1376538/1a8a7231fa732d23.png)
wine_counts.plot.bar(stacked=True)
![](https://img.haomeiwen.com/i1376538/c4134a284913356b.png)
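The post does not show how `wine_counts` is built. A stacked bar chart needs a table with one row per x value and one column per series, so one plausible construction (an assumption, not necessarily the author's) is a cross-tabulation of score against variety:

```python
import pandas as pd

# Hypothetical stand-in for the review data.
reviews = pd.DataFrame({
    "points": [88, 88, 90, 90, 90, 92],
    "variety": ["Pinot Noir", "Chardonnay", "Pinot Noir",
                "Pinot Noir", "Chardonnay", "Chardonnay"],
})

# Rows: score; columns: variety; cells: number of reviews.
wine_counts = pd.crosstab(reviews["points"], reviews["variety"])
print(wine_counts)
```

With a table of this shape, `wine_counts.plot.bar(stacked=True)` stacks the columns on top of each other for every row.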
Area plot
wine_counts.plot.area()
Line plot
wine_counts.plot.line()
Styling
Set the figure size, font size, color, and title:
reviews['points'].value_counts().sort_index().plot.bar(
figsize=(12, 6),
color='mediumvioletred',
fontsize=16,
title='Rankings Given by Wine Magazine',
)
Using Matplotlib
import matplotlib.pyplot as plt
ax = reviews['points'].value_counts().sort_index().plot.bar(
figsize=(12, 6),
color='mediumvioletred',
fontsize=16
)
ax.set_title("Rankings Given by Wine Magazine", fontsize=20)
![](https://img.haomeiwen.com/i1376538/356ad1e309928d6a.png)
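The reason this works is that the pandas plotting methods return a Matplotlib `Axes` object, so any Axes method can be applied afterwards. A minimal sketch with made-up scores (the axis labels are illustrative additions, not from the post):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical scores standing in for reviews['points'].
points = pd.Series([88, 90, 88, 92, 90, 88])

ax = points.value_counts().sort_index().plot.bar(figsize=(12, 6))

# Everything below is plain Matplotlib, called on the returned Axes.
ax.set_title("Rankings Given by Wine Magazine", fontsize=20)
ax.set_xlabel("points")
ax.set_ylabel("number of reviews")
```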
Using Seaborn to remove the plot borders
import matplotlib.pyplot as plt
import seaborn as sns
ax = reviews['points'].value_counts().sort_index().plot.bar(
figsize=(12, 6),
color='mediumvioletred',
fontsize=16
)
ax.set_title("Rankings Given by Wine Magazine", fontsize=20)
sns.despine(bottom=True, left=True)
![](https://img.haomeiwen.com/i1376538/b22444e10533a486.png)
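If Seaborn is not available, roughly the same effect can be had with Matplotlib alone by hiding the spines. This is a sketch of the idea, not Seaborn's actual implementation, and the scores are made up:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical scores standing in for reviews['points'].
points = pd.Series([88, 90, 88, 92, 90, 88])
ax = points.value_counts().sort_index().plot.bar()

# sns.despine removes the top and right spines by default;
# bottom=True and left=True remove the other two as well.
for side in ("top", "right", "bottom", "left"):
    ax.spines[side].set_visible(False)
```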
Multiple plots in one figure
matplotlib
fig, axarr = plt.subplots(2, 2, figsize=(12, 8))
reviews['points'].value_counts().sort_index().plot.bar(
ax=axarr[0][0]
)
reviews['province'].value_counts().head(20).plot.bar(
ax=axarr[1][1]
)
![](https://img.haomeiwen.com/i1376538/74b81b875ed7c22b.png)
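`plt.subplots(2, 2)` returns the figure plus a 2x2 NumPy array of Axes, indexed as `[row][column]`, and each pandas plot is sent to a panel through the `ax` argument. The sketch below uses a made-up `reviews` table; hiding the unused panels is an optional extra, not something the post does.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical columns standing in for the review data.
reviews = pd.DataFrame({
    "points": [88, 90, 88, 92],
    "province": ["Oregon", "California", "California", "Tuscany"],
})

fig, axarr = plt.subplots(2, 2, figsize=(12, 8))

# Top-left and bottom-right panels, as in the snippet above.
reviews["points"].value_counts().sort_index().plot.bar(ax=axarr[0][0])
reviews["province"].value_counts().plot.bar(ax=axarr[1][1])

# Panels that were never drawn on stay as empty frames; hide them if unwanted.
for panel in axarr.ravel():
    if not panel.has_data():
        panel.set_visible(False)
fig.tight_layout()
```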
Seaborn
df = footballers[footballers['Position'].isin(['ST', 'GK'])]
g = sns.FacetGrid(df, col="Position", col_wrap=2)
g.map(sns.kdeplot, "Overall")
![](https://img.haomeiwen.com/i1376538/46d95d80fb9ef397.png)
df = footballers[footballers['Position'].isin(['ST', 'GK'])]
df = df[df['Club'].isin(['Real Madrid CF', 'FC Barcelona', 'Atlético Madrid'])]
g = sns.FacetGrid(df, row="Position", col="Club")
g.map(sns.violinplot, "Overall")
![](https://img.haomeiwen.com/i1376538/03d4bd57c51d9850.png)
df = footballers[footballers['Position'].isin(['ST', 'GK'])]
df = df[df['Club'].isin(['Real Madrid CF', 'FC Barcelona', 'Atlético Madrid'])]
g = sns.FacetGrid(df, row="Position", col="Club",
row_order=['GK', 'ST'],
col_order=['Atlético Madrid', 'FC Barcelona', 'Real Madrid CF'])
g.map(sns.violinplot, "Overall")
row_order and col_order control the order in which the panels are displayed.
pairplot: pairwise relationships between several variables
sns.pairplot(footballers[['Overall', 'Potential', 'Value']])
![](https://img.haomeiwen.com/i1376538/44cb3cd5be0473f1.png)
Color and marker parameters
sns.lmplot(
x='Value', y='Overall',
markers=['o', 'x', '*'],
hue='Position',
data=footballers.loc[footballers['Position'].isin(
['ST', 'RW', 'LW'])],
fit_reg=False
)
![](https://img.haomeiwen.com/i1376538/4625f7c140049bd4.png)
Grouping
f = (footballers
.loc[footballers['Position'].isin(['ST', 'GK'])]
.loc[:, ['Value', 'Overall', 'Aggression', 'Position']]
)
f = f[f["Overall"] >= 80]
f = f[f["Overall"] < 85]
f['Aggression'] = f['Aggression'].astype(float)
sns.boxplot(x="Overall", y="Aggression", hue='Position', data=f)
![](https://img.haomeiwen.com/i1376538/ff032ef223fc1918.png)
Summary plots
Heatmap
import numpy as np
f = (
    footballers.loc[:, ['Acceleration', 'Aggression', 'Agility', 'Balance', 'Ball control']]
    .applymap(lambda v: int(v) if str.isdecimal(v) else np.nan)
    .dropna()
).corr()
sns.heatmap(f, annot=True)
![](https://img.haomeiwen.com/i1376538/956e2ad836cbbc13.png)
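The cleaning step above assumes every cell is a string, since `str.isdecimal` is called on each value. An alternative sketch, under the assumption that the attribute columns hold strings with occasional non-numeric entries, uses `pd.to_numeric` with `errors="coerce"`. The table here is invented for illustration. Note that `to_numeric` also accepts values such as "3.5", which `isdecimal` would reject.

```python
import pandas as pd

# Hypothetical slice of the footballers table; attributes arrive as strings,
# and some cells (e.g. "70+2") are not plain integers.
footballers = pd.DataFrame({
    "Acceleration": ["89", "70+2", "60", "75", "82"],
    "Agility":      ["90", "65",   "58", "77", "80"],
    "Balance":      ["95", "60",   "55", "70", "x"],
})

# to_numeric(errors="coerce") turns anything unparsable into NaN,
# and dropna() then removes every row that contains a NaN.
numeric = footballers.apply(pd.to_numeric, errors="coerce").dropna()

# The correlation matrix is what the heatmap visualizes.
corr = numeric.corr()
print(corr.round(2))
```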
Parallel coordinates plot
from pandas.plotting import parallel_coordinates
f = (
footballers.iloc[:, 12:17]
.loc[footballers['Position'].isin(['ST', 'GK'])]
.applymap(lambda v: int(v) if str.isdecimal(v) else np.nan)
.dropna()
)
f['Position'] = footballers['Position']
f = f.sample(200)
parallel_coordinates(f, 'Position')
![](https://img.haomeiwen.com/i1376538/686c0c9c384cfda7.png)
Using Seaborn
Basic plots
Bar chart of value counts
countplot is equivalent to value_counts() followed by a bar plot
sns.countplot(x=reviews['points'])
![](https://img.haomeiwen.com/i1376538/c38d4fa9a61701f7.png)
Line chart: density (KDE) plot
sns.kdeplot(reviews.query('price < 200').price)
![](https://img.haomeiwen.com/i1376538/36b7a5590073f8c2.png)
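2D density plot, similar in purpose to a hexbin plot
Use it when the sample is large and many points overlap.
# seaborn 0.11 and later need explicit x= and y= for a bivariate KDE
sns.kdeplot(data=reviews[reviews['price'] < 200].loc[:, ['price', 'points']].dropna().sample(5000), x='price', y='points')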
![](https://img.haomeiwen.com/i1376538/8ef5f88c575a2e04.png)
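Histogram
Similar to the pandas hist method
# distplot is deprecated since seaborn 0.11; sns.histplot(reviews['points'], bins=10) is its replacement
sns.distplot(reviews['points'], bins=10, kde=False)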
![](https://img.haomeiwen.com/i1376538/7171a21410cfce86.png)
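What the `bins=10` argument does can be checked numerically. The sketch below uses NumPy's `histogram` on made-up scores to show the equal-width intervals and the count in each, which are the bar heights a histogram displays.

```python
import numpy as np
import pandas as pd

# Hypothetical scores standing in for reviews['points'].
points = pd.Series([80, 82, 85, 85, 88, 90, 90, 91, 95, 100])

# np.histogram splits [min, max] into `bins` equal-width intervals and
# counts the values in each; the last interval includes its right edge.
counts, edges = np.histogram(points, bins=10)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n}")
```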
Scatter plot combined with histograms
sns.jointplot(x='price', y='points', data=reviews[reviews['price'] < 100])
![](https://img.haomeiwen.com/i1376538/3565b8eff318b842.png)
Hexbin plot combined with histograms
sns.jointplot(x='price', y='points', data=reviews[reviews['price'] < 100], kind='hex',gridsize=20)
![](https://img.haomeiwen.com/i1376538/d71f984a19ed3581.png)
Box plot
df = reviews[reviews.variety.isin(reviews.variety.value_counts().head(5).index)]
sns.boxplot(
x='variety',
y='points',
data=df
)
![](https://img.haomeiwen.com/i1376538/bd2d1db83fe164fd.png)
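The filter in the first line keeps only the most frequent varieties before plotting. Here is a small sketch of that idiom on invented data, keeping the top 2 instead of the top 5, followed by the per-category quartiles that a box plot draws:

```python
import pandas as pd

# Hypothetical stand-in for the review data.
reviews = pd.DataFrame({
    "variety": ["A", "A", "A", "B", "B", "C", "D"],
    "points":  [88, 90, 91, 85, 87, 92, 80],
})

# value_counts() sorts by frequency, so head(n).index holds the n most common labels.
top = reviews["variety"].value_counts().head(2).index
df = reviews[reviews["variety"].isin(top)]

# The quartiles a box plot draws for each category.
print(df.groupby("variety")["points"].quantile([0.25, 0.5, 0.75]).unstack())
```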
Violin plot
sns.violinplot(
x='variety',
y='points',
data=reviews[reviews.variety.isin(reviews.variety.value_counts()[:5].index)]
)
![](https://img.haomeiwen.com/i1376538/e794ccae99690d3a.png)
Interactive web charts with plotly
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
Scatter plot
import plotly.graph_objs as go
iplot([go.Scatter(x=reviews.head(1000)['points'], y=reviews.head(1000)['price'], mode='markers')])
![](https://img.haomeiwen.com/i1376538/0e18d42626ad21e3.png)
Heatmap
iplot([go.Histogram2dContour(x=reviews.head(500)['points'],
y=reviews.head(500)['price'],
contours=dict(coloring='heatmap')),
go.Scatter(x=reviews.head(1000)['points'], y=reviews.head(1000)['price'], mode='markers')])
![](https://img.haomeiwen.com/i1376538/ed8deb39065cf962.png)
plotnine, a grammar-of-graphics visualization library
from plotnine import *
top_wines = reviews[reviews['variety'].isin(reviews['variety'].value_counts().head(5).index)]
df = top_wines.head(1000).dropna()
(ggplot(df)
+ aes('points', 'price')
+ geom_point())
# Other equivalent forms
(ggplot(df)
+ geom_point(aes('points', 'price'))
)
(ggplot(df, aes('points', 'price'))
+ geom_point()
)
Adding plot elements one layer at a time
![](https://img.haomeiwen.com/i1376538/b6b32c925dd155b2.png)
df = top_wines.head(1000).dropna()
(
ggplot(df)
+ aes('points', 'price')
+ geom_point()
+ stat_smooth()
)
![](https://img.haomeiwen.com/i1376538/f67647f60c83b922.png)
Adding color
df = top_wines.head(1000).dropna()
(
ggplot(df)
+ geom_point()
+ aes(color='points')
+ aes('points', 'price')
+ stat_smooth()
)
Faceting: several panels in one figure
df = top_wines.head(1000).dropna()
(ggplot(df)
+ aes('points', 'price')
+ aes(color='points')
+ geom_point()
+ stat_smooth()
+ facet_wrap('~variety')
)
![](https://img.haomeiwen.com/i1376538/a07ee77c9862bb85.png)
Bar chart
(ggplot(top_wines)
+ aes('points')
+ geom_bar()
)
![](https://img.haomeiwen.com/i1376538/f4e60207f09b8633.png)
2D heatmap
(ggplot(top_wines)
+ aes('points', 'variety')
+ geom_bin2d(bins=20)
)
![](https://img.haomeiwen.com/i1376538/243f02e99c31bf94.png)
For more, see the plotnine API Reference.
Working with time series
Plain line chart
shelter_outcomes['date_of_birth'].value_counts().sort_values().plot.line()
![](https://img.haomeiwen.com/i1376538/93d0667d6f292b4e.png)
Resampling by year
shelter_outcomes['date_of_birth'].value_counts().resample('Y').sum().plot.line()
![](https://img.haomeiwen.com/i1376538/8fa39fe816b047c2.png)
stocks['volume'].resample('Y').mean().plot.bar()
![](https://img.haomeiwen.com/i1376538/50cb3a7ba33287ab.png)
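`resample` only works on data with a datetime-like index, and recent pandas releases deprecate the `'Y'` alias in favor of `'YE'`. As a version-independent sketch of the same aggregation, the series can be grouped by the calendar year of its index. The series below is made up, and the result is labeled by year number rather than by year-end date.

```python
import pandas as pd

# Hypothetical daily trading volumes with a DatetimeIndex.
idx = pd.date_range("2017-12-30", periods=5, freq="D")
volume = pd.Series([10, 20, 30, 40, 50], index=idx)

# Group by the calendar year of the index and average within each year.
yearly_mean = volume.groupby(volume.index.year).mean()
print(yearly_mean)
```

Calling `yearly_mean.plot.bar()` would then give one bar per year.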
Comparing with an earlier period
For example, comparing this December with last December
from pandas.plotting import lag_plot
lag_plot(stocks['volume'].tail(250))
![](https://img.haomeiwen.com/i1376538/efa245ee4cc54728.png)
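`lag_plot` scatters each value y(t) against y(t + lag), and its `lag` parameter defaults to 1, so the call above compares each observation with the next one. To compare the same month across two years on monthly data, a lag of 12 would be the natural choice. The sketch below builds those pairs by hand with `shift` on an invented monthly series:

```python
import pandas as pd

# Hypothetical monthly series covering two years (values 1..24).
s = pd.Series(range(1, 25),
              index=pd.period_range("2017-01", periods=24, freq="M"))

# Pair every month with the value 12 months later; these pairs are the
# points a lag plot with lag=12 would draw.
pairs = pd.DataFrame({"y(t)": s, "y(t+12)": s.shift(-12)}).dropna()
print(pairs)
```

For this series, `lag_plot(s, lag=12)` would draw the same twelve points.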
Autocorrelation plot
from pandas.plotting import autocorrelation_plot
autocorrelation_plot(stocks['volume'])
![](https://img.haomeiwen.com/i1376538/8d6321a2abdedb86.png)
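An autocorrelation plot shows, for every lag, how strongly the series correlates with a shifted copy of itself. Individual values can be inspected with `Series.autocorr`. The series below is a synthetic sine wave with a period of 12 samples, chosen only to make the pattern obvious; the plotting function uses its own estimator, so its values may differ slightly from these.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a period of 12 samples.
t = np.arange(120)
s = pd.Series(np.sin(2 * np.pi * t / 12))

# Series.autocorr(lag) is the Pearson correlation between the series and
# a copy of itself shifted by `lag` samples.
for lag in (1, 6, 12):
    print(lag, round(s.autocorr(lag=lag), 3))
```

A full period (lag 12) gives a correlation close to 1, while half a period (lag 6) gives a correlation close to -1.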