pandas Example: A Quick Look at the Iris Dataset (Visualization)

2020-05-14 橘猫吃不胖

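The Iris dataset

This is a classic beginner practice dataset. Here I load it directly with seaborn. No algorithms are involved for now, so I am not using sklearn.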

import matplotlib.pyplot as plt  # needed for the plt.scatter examples later on
import seaborn as sns

iris = sns.load_dataset("iris")

There are 150 records and 5 columns:

sepal_length  sepal length
sepal_width   sepal width
petal_length  petal length
petal_width   petal width
species       species

1. First, let's see which iris species there are

iris['species'].value_counts()

There are 3 species of iris, with 50 records each.
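
The checks so far (row count, column count, records per species) can be bundled into one small helper. This is only a sketch: the function name `summarize` and its return layout are my own choices for illustration, not something from pandas or seaborn.

```python
import pandas as pd


def summarize(df: pd.DataFrame, label_col: str) -> dict:
    """Hypothetical helper: collect shape, column names and class counts."""
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "columns": n_cols,
        "names": list(df.columns),
        # value_counts() counts how often each label occurs
        "class_counts": df[label_col].value_counts().to_dict(),
    }
```

Called as `summarize(iris, "species")` on the frame loaded above, it should report the same numbers as the text: 150 rows, 5 columns, and 50 records for each of the three species.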

2. Use scatter plots to look at how the data is distributed (correlation)

For more on scatter plots, see: pandas scatter plot - plot.scatter

iris.plot.scatter(x='sepal_width' , y='sepal_length' , c='green')
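
A scatter plot only shows correlation visually. To put a number on it, one option is the Pearson coefficient from `Series.corr`. The sketch below is an assumption about how one might do that, and the helper names `column_correlation` and `correlation_by_group` are hypothetical. The per-group variant is included because the overall value can hide differences between species.

```python
import pandas as pd


def column_correlation(df: pd.DataFrame, x: str, y: str) -> float:
    """Hypothetical helper: Pearson correlation between two numeric columns."""
    return float(df[x].corr(df[y]))


def correlation_by_group(df: pd.DataFrame, x: str, y: str, group: str) -> dict:
    """Hypothetical helper: the same correlation, computed separately per group."""
    return {
        name: float(part[x].corr(part[y]))
        for name, part in df.groupby(group)
    }
```

For the plot above this would be `column_correlation(iris, "sepal_width", "sepal_length")`, and `correlation_by_group(iris, "sepal_width", "sepal_length", "species")` for one value per species.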

Since the irises belong to different species, let's look at them species by species.

# Encode each species as a number so it can be passed to c= and mapped through a colormap
pos = {'setosa': 1, 'versicolor': 2, 'virginica': 3}
iris['species_pos'] = iris['species'].map(pos)

iris.plot.scatter(x='sepal_width', y='sepal_length', c='species_pos', colormap='viridis')

This plot looks a little odd, so let's try a different approach.

species_color = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

# Each plot.scatter call without ax= opens its own figure, so this yields one chart per species
for s in iris['species'].unique():
    ax1 = iris.query('species==@s').plot.scatter(x='sepal_width', y='sepal_length', c=species_color[s], label=s)
    ax1.legend()

What I actually wanted was to show everything in a single chart, so let me change it again.

species_color = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

# plt.scatter draws on the current Axes, so all three species end up in one chart
for s in iris['species'].unique():
    data = iris.query('species==@s')
    plt.scatter(x=data['sepal_width'], y=data['sepal_length'], c=species_color[s], label=s)

plt.xlabel('sepal_width')
plt.ylabel('sepal_length')
plt.legend(loc='upper left')
plt.show()
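
Another way to get everything into one chart while staying with the pandas plotting API is to create the Axes once and pass it to every call through `ax=`. This is a minimal sketch of that idea. The function name `scatter_by_group` and its parameters are hypothetical, chosen only for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_group(df: pd.DataFrame, x: str, y: str, group: str, colors: dict):
    """Hypothetical helper: one scatter layer per group on a single shared Axes."""
    fig, ax = plt.subplots()
    for name, part in df.groupby(group):
        # ax=ax makes every call draw on the same Axes instead of opening a new figure
        part.plot.scatter(x=x, y=y, c=colors[name], label=name, ax=ax)
    # Rebuild the legend once so that it lists every group
    ax.legend(loc='upper left')
    return ax
```

With the names used above, the call would be `scatter_by_group(iris, 'sepal_width', 'sepal_length', 'species', species_color)`, followed by `plt.show()`.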

The plots above looked at the sepals. While we are at it, let's look at the petals too.

species_color = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

# Same loop as before, this time with the petal columns
for s in iris['species'].unique():
    data = iris.query('species==@s')
    plt.scatter(x=data['petal_width'], y=data['petal_length'], c=species_color[s], label=s)

plt.xlabel('petal_width')
plt.ylabel('petal_length')
plt.legend(loc='upper left')
plt.show()

I just went and learned a bit of seaborn. Drawing the charts above with seaborn really is easy.
Reference:

seaborn example - relplot - scatter plot

sns.relplot(x='sepal_width' , y='sepal_length' , hue='species' , data=iris)

One line of code and it's done, haha.

sns.relplot(x='petal_width' , y='petal_length' , hue='species' , data=iris)
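
`relplot` is a figure-level function, so each call above creates its own figure. To compare sepals and petals side by side, one possibility is seaborn's axes-level `scatterplot`, which accepts an `ax=` argument. The sketch below assumes the column names of the seaborn iris frame. The function name `sepal_and_petal_panels` is hypothetical.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def sepal_and_petal_panels(df, hue="species"):
    """Hypothetical helper: sepal and petal scatter plots in one figure."""
    fig, (ax_sepal, ax_petal) = plt.subplots(1, 2, figsize=(10, 4))
    # scatterplot is axes-level, so it can draw into an Axes we created ourselves
    sns.scatterplot(data=df, x="sepal_width", y="sepal_length", hue=hue, ax=ax_sepal)
    sns.scatterplot(data=df, x="petal_width", y="petal_length", hue=hue, ax=ax_petal)
    ax_sepal.set_title("Sepal")
    ax_petal.set_title("Petal")
    fig.tight_layout()
    return fig
```

Calling `sepal_and_petal_panels(iris)` and then `plt.show()` would display both panels together.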
