Python for Data Analysis

Series

2019-01-31  庵下桃花仙

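A Series is a one-dimensional array-like object containing a sequence of values and an associated array of labels, called its index.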

In [1]: import pandas as pd
In [3]: obj = pd.Series([4, 7, -5, 3])
In [4]: obj
Out[4]:
0    4
1    7
2   -5
3    3
dtype: int64
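A minimal standalone sketch of the same construction, assuming only that pandas is installed: when no index is passed, pandas assigns the integer labels 0 through n-1, so position and label coincide for this Series.

```python
import pandas as pd

# Build a Series from a list; pandas assigns the default labels 0, 1, 2, 3
obj = pd.Series([4, 7, -5, 3])

# The default index is a RangeIndex covering 0 .. len(obj) - 1
print(obj.index)   # RangeIndex(start=0, stop=4, step=1)
print(len(obj))    # 4
print(obj[2])      # -5, the value stored under label 2
```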

A Series has two key attributes, values and index:

In [5]: obj.values
Out[5]: array([ 4,  7, -5,  3], dtype=int64)

In [6]: obj.index
Out[6]: RangeIndex(start=0, stop=4, step=1)
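As a quick sketch of what the two attributes hold: values is a plain NumPy array and index holds the labels. The pandas documentation suggests Series.to_numpy() as an alternative to .values in newer releases; it is included here only for comparison.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# .values exposes the data as a NumPy ndarray
vals = obj.values
print(type(vals))                 # <class 'numpy.ndarray'>

# .index holds the labels; for list input it is a RangeIndex
idx = obj.index
print(list(idx))                  # [0, 1, 2, 3]

# to_numpy() returns the same data as a NumPy array
print(obj.to_numpy().tolist())    # [4, 7, -5, 3]
```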

Pass an index to label each data point:

In [7]: obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

In [8]: obj2
Out[8]:
d    4
b    7
a   -5
c    3
dtype: int64
In [10]: obj2.index
Out[10]: Index(['d', 'b', 'a', 'c'], dtype='object')
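A small sketch showing that labels keep the order they were given in (they are not sorted). The .loc and .iloc accessors used here are pandas' explicit label-based and position-based lookups; they are not used in the original notes and are added only to make the distinction visible.

```python
import pandas as pd

# Labels are stored in the order given, not sorted
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
print(list(obj2.index))   # ['d', 'b', 'a', 'c']

# Label-based versus position-based lookup
print(obj2.loc['a'])      # -5, the value under label 'a'
print(obj2.iloc[0])       # 4, the first position, which has label 'd'
```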

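Labels in the index can be used to select a single value, to assign to it, or to select several values at once by passing a list of labels: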

In [12]: obj2['a']
Out[12]: -5

In [13]: obj2['d'] = 6

In [14]: obj2[['c', 'a', 'd']]
Out[14]:
c    3
a   -5
d    6
dtype: int64
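The label operations above as one script, a sketch assuming a recent pandas version: a list of labels returns a new Series in the requested order, and a label that does not exist raises KeyError rather than returning a default the way dict.get would.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

obj2['d'] = 6                     # assign through a label
subset = obj2[['c', 'a', 'd']]    # a list of labels gives a new Series in that order
print(subset.tolist())            # [3, -5, 6]

# Looking up a label that is not in the index raises KeyError
try:
    obj2['z']
except KeyError:
    print('no label z')
```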

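NumPy-style operations, such as filtering with a boolean array, multiplying by a scalar, or applying math functions, preserve the link between index labels and values: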

In [15]: obj2[obj2 > 0]
Out[15]:
d    6
b    7
c    3
dtype: int64

In [16]: obj * 2
Out[16]:
0     8
1    14
2   -10
3     6
dtype: int64
In [18]: import numpy as np

In [19]: np.exp(obj2)
Out[19]:
d     403.428793
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
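A sketch that checks the claim directly, starting from obj2 as it stands after the assignment obj2['d'] = 6: each operation returns a Series whose labels still line up with the values they belong to.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=['d', 'b', 'a', 'c'])

positive = obj2[obj2 > 0]   # boolean mask keeps only the matching labels
doubled = obj2 * 2          # scalar arithmetic keeps every label
exp = np.exp(obj2)          # a NumPy ufunc returns a Series with the same index

print(list(positive.index))           # ['d', 'b', 'c']
print(doubled['a'])                   # -10
print(exp.index.equals(obj2.index))   # True
```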

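A Series can also be thought of as a fixed-length, ordered dict, since it maps index labels to data values. It can be used in many places where a dict is expected, and it can be created from a dict: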

In [21]: 'e' in obj2
Out[21]: False

In [22]: 'b' in obj2
Out[22]: True
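One detail of the dict analogy worth checking: like a dict, `in` tests the keys (index labels), not the values. A minimal sketch; searching the values needs an explicit test against .values, and to_dict() converts the Series back to a plain dict.

```python
import pandas as pd

obj2 = pd.Series([6, 7, -5, 3], index=['d', 'b', 'a', 'c'])

print('b' in obj2)        # True: 'b' is a label
print(7 in obj2)          # False: 7 is a value, not a label
print(7 in obj2.values)   # True: search the NumPy array of values instead

# Convert back to a dict mapping each label to its value
as_dict = obj2.to_dict()
```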
In [23]: sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

In [24]: obj3 = pd.Series(sdata)

In [25]: obj3
Out[25]:
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

To get the index in a specific order, pass the labels you want as index. Values are matched to labels by dict key: 'California' has no entry in sdata, so its value is NaN, and 'Utah', which is not in states, is left out.

In [26]: states = ['California', 'Ohio', 'Oregon', 'Texas']

In [27]: obj4 = pd.Series(sdata, index=states)

In [28]: obj4
Out[28]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

NaN (not a number) is how pandas marks missing values. The isnull and notnull functions in pandas detect missing data:

In [29]: pd.isnull(obj4)
Out[29]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [30]: pd.notnull(obj4)
Out[30]:
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool

Series also provides these as instance methods:

In [31]: obj4.isnull()
Out[31]:
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

Series automatically aligns by index label in arithmetic operations. A label present in only one of the two operands gets NaN in the result:

In [32]: obj3
Out[32]:
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

In [33]: obj4
Out[33]:
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64

In [34]: obj3 + obj4
Out[34]:
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64

Both the Series object itself and its index have a name attribute:

In [35]: obj4.name = 'population'

In [36]: obj4.index.name = 'state'

In [37]: obj4
Out[37]:
state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64
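A sketch of where the two names show up later, assuming the same sdata and states as above: when the Series is converted with to_frame(), the Series name becomes the column name and the index name is kept.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)

obj4.name = 'population'    # name of the Series itself
obj4.index.name = 'state'   # name of its index

# Both names carry over when the Series becomes a one-column DataFrame
frame = obj4.to_frame()
print(list(frame.columns))  # ['population']
print(frame.index.name)     # state
```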

The index of a Series can be changed in place by assignment:

In [38]: obj
Out[38]:
0    4
1    7
2   -5
3    3
dtype: int64
In [40]: obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

In [41]: obj
Out[41]:
Bob      4
Steve    7
Jeff    -5
Ryan     3
dtype: int64
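A sketch of the constraint behind index assignment: the new labels must have the same length as the Series. A list of the wrong length is rejected with a ValueError and the existing index is left as it was.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# Replace the whole index in place
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
print(obj['Jeff'])   # -5

# A label list of the wrong length raises ValueError
try:
    obj.index = ['Bob', 'Steve']
except ValueError as err:
    print('rejected:', err)
```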