Pandas Data Analysis - Data Filtering: Indexing/Selection

2022-05-16  Mc杰夫

(2022.05.16 Mon)
Selecting elements from a pandas Series is based on its index:

>> import numpy as np
>> import pandas as pd
>> obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
>> obj
a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

A Series can be indexed in two ways: by index label or by integer position.

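>> obj[3]  # integer position; this positional fallback in [] is deprecated since pandas 2.1, prefer obj.iloc[3]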
3.0
>> obj['a':'c']  # label slice: the end label 'c' is included
a    0.0
b    1.0
c    2.0
dtype: float64
>> obj[['b', 'a', 'd']]  # note: a list of labels is passed here
b    1.0
a    0.0
d    3.0
dtype: float64
>> obj[[3, 1, 2]]  # list of positions; also a deprecated fallback, prefer obj.iloc[[3, 1, 2]]
d    3.0
b    1.0
c    2.0
dtype: float64
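Integer keys inside obj[...] can be ambiguous, so here is a minimal sketch that rebuilds the same obj and makes the two modes explicit. It uses loc for labels and iloc for positions. It also shows that a label slice includes its end point, while a positional slice excludes it, just like ordinary Python slicing:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'c' is included
by_label = obj.loc['a':'c']
# Position-based slice: like Python slicing, position 2 is excluded
by_position = obj.iloc[0:2]

print(by_label.tolist())     # [0.0, 1.0, 2.0]
print(by_position.tolist())  # [0.0, 1.0]

# A list of keys works in both modes and controls the output order
print(obj.loc[['b', 'a', 'd']].tolist())  # [1.0, 0.0, 3.0]
print(obj.iloc[[3, 1, 2]].tolist())       # [3.0, 1.0, 2.0]
```

Writing obj.iloc[...] for positions avoids the deprecation noted in the comments on obj[3] and obj[[3, 1, 2]].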

A pandas DataFrame can be indexed by column name and by row position.

>> data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                       index=['Ohio', 'Colorado', 'Utah', 'New York'],
                       columns=['one', 'two', 'three', 'four'])
>> data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

To select one or more columns of a DataFrame, use the column name (or a list of names):

>> data['four']
Ohio         3
Colorado     7
Utah        11
New York    15
Name: four, dtype: int64
>> data[['four', 'one']]
          four  one
Ohio         3    0
Colorado     7    4
Utah        11    8
New York    15   12
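Here is a quick check of the difference between passing a single column name and passing a list. A bare name returns a Series, while a list returns a DataFrame, even when the list holds only one name. The sketch rebuilds the same data frame as in the post:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

col_series = data['four']   # a single name returns a Series
col_frame = data[['four']]  # a one-element list returns a DataFrame

print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame
print(col_frame.shape)            # (4, 1)

# The column order of the result follows the order of names in the list
print(list(data[['four', 'one']].columns))  # ['four', 'one']
```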

To select rows, slice with integer positions. Rows can also be filtered with a boolean Series:

>> data[:2]
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
>> data['three']>4
Ohio        False
Colorado     True
Utah         True
New York     True
Name: three, dtype: bool
>> data[data['three']>4]
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
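A boolean mask is a True/False Series aligned to the row index, so several conditions can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because these operators bind more tightly than > and <. Here is a sketch on the same data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Parentheses are required: & and | bind tighter than the comparisons
both = data[(data['three'] > 4) & (data['one'] < 12)]
either = data[(data['one'] == 0) | (data['four'] == 15)]
negated = data[~(data['three'] > 4)]

print(both.index.tolist())     # ['Colorado', 'Utah']
print(either.index.tolist())   # ['Ohio', 'New York']
print(negated.index.tolist())  # ['Ohio']
```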

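Selection can also be done with loc and iloc. loc selects by label, while iloc selects by integer position. Note the last example, which first takes columns by position and then filters rows with a boolean condition.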

>> data.loc['Colorado', ['two', 'three']]
two      5
three    6
Name: Colorado, dtype: int64
>> data.iloc[[1, 2], [3, 0, 1]]
          four  one  two
Colorado     7    4    5
Utah        11    8    9
>> data.iloc[2]
one       8
two       9
three    10
four     11
Name: Utah, dtype: int64
>> data.loc[:'Utah', 'two']  # label slice on rows: 'Utah' is included
Ohio        1
Colorado    5
Utah        9
Name: two, dtype: int64
>> data.iloc[:, :3][data.three > 5]  # first three columns by position, then a boolean row filter
          one  two  three
Colorado    4    5      6
Utah        8    9     10
New York   12   13     14
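The following sketch rebuilds the same data and contrasts label-based with position-based access. It also shows that the chained selection in the last example can be written as a single loc call, using a boolean mask for the rows and an inclusive label slice for the columns. A single loc call is generally preferable to chained indexing, especially if you later assign values through the result:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Label-based row slice: the end label 'Utah' is included
two_upto_utah = data.loc[:'Utah', 'two']
print(two_upto_utah.tolist())  # [1, 5, 9]

# The same row, once by label and once by position
print(data.loc['Utah'].equals(data.iloc[2]))  # True

# Chained form: positional column slice, then boolean row filter
chained = data.iloc[:, :3][data.three > 5]
# Single loc call: boolean mask for rows, inclusive label slice for columns
single = data.loc[data.three > 5, 'one':'three']
print(chained.equals(single))  # True
```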

Reference

1 Python for Data Analysis, Wes McKinney
