Indexing, Selection, and Filtering
2019-02-12
庵下桃花仙
Indexing
The examples below assume numpy is imported as np and pandas as pd.
In [39]: obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
In [40]: obj
Out[40]:
a 0.0
b 1.0
c 2.0
d 3.0
dtype: float64
In [41]: obj['b']
Out[41]: 1.0
In [42]: obj[1]
Out[42]: 1.0
In [43]: obj[2 : 4]
Out[43]:
c 2.0
d 3.0
dtype: float64
In [44]: obj[['b', 'a', 'd']]
Out[44]:
b 1.0
a 0.0
d 3.0
dtype: float64
In [45]: obj[[1, 3]]
Out[45]:
b 1.0
d 3.0
dtype: float64
In [46]: obj[obj < 2]
Out[46]:
a 0.0
b 1.0
dtype: float64
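The bare-bracket lookups above mix labels and positions. As a minimal sketch (the variable names are illustrative, not from the original notes), the same selections can be written with `.loc` for labels and `.iloc` for positions, which states the intent explicitly. This matters for calls like `obj[1]`: recent pandas releases warn about, or no longer support, treating an integer key as a position when the index holds string labels.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label-based access: .loc only ever looks at the index labels
by_label = obj.loc['b']
several_labels = obj.loc[['b', 'a', 'd']]

# Position-based access: .iloc counts from 0, whatever the labels are
by_position = obj.iloc[1]
several_positions = obj.iloc[[1, 3]]

# Boolean filtering also works through .loc
small = obj.loc[obj < 2]
```

Here `by_label` and `by_position` pick the same element, because label 'b' sits at position 1.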
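Ordinary Python slices exclude the end point; slicing a Series by label is different, because the end label is included.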
In [47]: obj['b' : 'c']
Out[47]:
b 1.0
c 2.0
dtype: float64
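To make the end-point difference concrete, here is a small sketch that puts a plain Python list slice, a label slice, and a positional slice side by side. The variable names are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

letters = ['a', 'b', 'c', 'd']
obj = pd.Series(np.arange(4.), index=letters)

# Plain Python slicing: the stop position is excluded
py_slice = letters[1:2]

# Label slicing: both 'b' and 'c' are included
label_slice = obj.loc['b':'c']

# Positional slicing on a Series follows the Python rule (stop excluded)
pos_slice = obj.iloc[1:2]
```

`label_slice` has two rows while `pos_slice` has one, even though both start at 'b'.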
Assigning through a slice modifies the corresponding part of the Series.
In [48]: obj['b' : 'c'] = 5
In [49]: obj
Out[49]:
a 0.0
b 5.0
c 5.0
d 3.0
dtype: float64
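Slice assignment changes the Series in place. If the original values are still needed, one option (a sketch, assuming an explicit copy is acceptable for your data size) is to assign on a copy instead:

```python
import numpy as np
import pandas as pd

original = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Work on a copy so the original Series keeps its values
modified = original.copy()
modified.loc['b':'c'] = 5   # label slice, so 'c' is included
```

After this, `original` still holds 0.0 to 3.0, while `modified` has 5.0 at 'b' and 'c'.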
Indexing a DataFrame with a single value or a sequence retrieves one or more columns.
In [50]: data = pd.DataFrame(np.arange(16).reshape((4, 4)),
    ...:                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
    ...:                     columns=['one', 'two', 'three', 'four'])
In [51]: data
Out[51]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
In [52]: data['two']
Out[52]:
Ohio 1
Colorado 5
Utah 9
New York 13
Name: two, dtype: int32
In [53]: data[['three', 'one']]
Out[53]:
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
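One detail worth checking when selecting columns is the type of the result: a single label returns a Series, while a list of labels returns a DataFrame, even if the list has only one entry. A short sketch (variable names are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

one_column = data['two']            # single label -> Series
one_column_frame = data[['two']]    # one-element list -> DataFrame
reordered = data[['three', 'one']]  # columns come back in the order requested
```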
Rows can be selected with a slice or with a boolean array.
In [54]: data[:2]
Out[54]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
In [55]: data[data['three'] > 5]
Out[55]:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
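Boolean row filters can also be combined. The sketch below is not part of the original session: the second condition on column 'one' is added purely for illustration. Each comparison needs its own parentheses, and the element-wise operators `&` and `|` are used instead of `and` and `or`.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A bare slice inside [] selects rows by position
first_two = data[:2]

# Combine two row conditions element-wise
mask = (data['three'] > 5) & (data['one'] < 12)
selected = data[mask]
```

Only Colorado and Utah satisfy both conditions.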
A boolean DataFrame can also be used for indexing.
In [56]: data < 5
Out[56]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
In [57]: data[data < 5] = 0
In [58]: data
Out[58]:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
This behaves much like indexing a two-dimensional NumPy array.
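To illustrate the parallel with NumPy, the sketch below applies the same mask assignment to a plain 2-D array. It also uses `DataFrame.mask`, which is a swapped-in alternative to the in-place assignment shown in the session: it returns a new DataFrame and leaves `data` untouched.

```python
import numpy as np
import pandas as pd

arr = np.arange(16).reshape((4, 4))
data = pd.DataFrame(arr,
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Non-mutating version: replace values where the condition holds
zeroed = data.mask(data < 5, 0)

# The same operation on a NumPy array, done on a copy
arr_zeroed = arr.copy()
arr_zeroed[arr_zeroed < 5] = 0
```

Both results hold the same values, and `data` still contains the original numbers 0 to 15.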
Use loc (axis labels) and iloc (integer positions) to select rows and columns from a DataFrame.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: data = pd.DataFrame(np.arange(16).reshape((4, 4)),
   ...:                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
   ...:                     columns=['one', 'two', 'three', 'four'])
In [4]: data < 5
Out[4]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
In [5]: data[data < 5] = 0
In [6]: data
Out[6]:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
In [7]: data.loc['Colorado', ['two', 'three']]
Out[7]:
two 5
three 6
Name: Colorado, dtype: int32
In [8]: data.iloc[2, [3, 0, 1]]
Out[8]:
four 11
one 8
two 9
Name: Utah, dtype: int32
In [9]: data.iloc[2]
Out[9]:
one 8
two 9
three 10
four 11
Name: Utah, dtype: int32
In [10]: data.iloc[[1, 2], [3, 0, 1]]
Out[10]:
          four  one  two
Colorado     7    0    5
Utah        11    8    9
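The two indexers address the same cells in different ways. As a sketch, the lookup below translates labels into positions with `Index.get_loc` and `Index.get_indexer` and confirms that `loc` and `iloc` then return the same values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data[data < 5] = 0  # same preparation step as in the session above

# Selection by label
by_label = data.loc['Colorado', ['two', 'three']]

# Translate the labels to integer positions, then select by position
row_pos = data.index.get_loc('Colorado')
col_pos = data.columns.get_indexer(['two', 'three'])
by_position = data.iloc[row_pos, col_pos]
```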
Both indexers also accept slices.
In [12]: data.loc[:'Utah', 'two']
Out[12]:
Ohio 0
Colorado 5
Utah 9
Name: two, dtype: int32
In [13]: data.iloc[:, :3][data.three > 5]
Out[13]:
          one  two  three
Colorado    0    5      6
Utah        8    9     10
New York   12   13     14
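The chained expression in In [13] selects columns first and then filters rows. A sketch of an equivalent single-step form passes the row mask and a column label slice to `loc` together. This is offered as an alternative, not as the approach used in the original notes.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data[data < 5] = 0  # same preparation step as in the session above

# Two steps: first three columns by position, then a boolean row filter
chained = data.iloc[:, :3][data.three > 5]

# One step: row mask and column label slice ('three' is included)
single = data.loc[data['three'] > 5, :'three']
```

Both forms return the rows Colorado, Utah, and New York with columns one, two, and three.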
For the table of DataFrame indexing options, see page 143 of 《利用Python进行数据分析》 (the Chinese edition of Python for Data Analysis).
Integer indexes
In [15]: ser = pd.Series(np.arange(3.))
In [16]: ser
Out[16]:
0 0.0
1 1.0
2 2.0
dtype: float64
In [17]: ser[:1]
Out[17]:
0 0.0
dtype: float64
In [18]: ser.loc[:1]
Out[18]:
0 0.0
1 1.0
dtype: float64
In [19]: ser.iloc[:1]
Out[19]:
0 0.0
dtype: float64