Pandas Data Analysis: Data Selection (Indexing/Selection)
2022-05-16
Mc杰夫
(2022.05.16 Mon)
Selecting values from a pandas Series is done through its index.
>> import numpy as np
>> import pandas as pd
>> obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
>> obj
a 0.0
b 1.0
c 2.0
d 3.0
dtype: float64
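A Series can be indexed in two ways: by index label or by integer position.
>> obj[3] # positional access; newer pandas versions deprecate this form in favor of obj.iloc[3]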
3.0
>> obj['a':'c']
a 0.0
b 1.0
c 2.0
dtype: float64
>> obj[['b', 'a', 'd']] # note that a list is passed here
b 1.0
a 0.0
d 3.0
dtype: float64
>> obj[[3, 1, 2]] # a list of positions; obj.iloc[[3, 1, 2]] is the explicit form
d 3.0
b 1.0
c 2.0
dtype: float64
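The two indexing modes shown above can be made explicit with loc (labels) and iloc (positions). The following is a minimal sketch that rebuilds the same obj Series; the variable names by_label, by_position and last_value are only illustrative. It also shows that a label slice includes its end point, while a positional slice does not.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'c' is included.
by_label = obj.loc['a':'c']

# Position-based slice: the end position 2 is excluded, as in plain Python.
by_position = obj.iloc[0:2]

# Explicit positional lookup of the fourth element.
last_value = obj.iloc[3]

print(by_label.index.tolist())     # ['a', 'b', 'c']
print(by_position.index.tolist())  # ['a', 'b']
print(last_value)                  # 3.0
```

Spelling out loc or iloc avoids ambiguity when the index itself contains integers.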
A pandas DataFrame can be indexed by column name and by row position.
>> data = pd.DataFrame(np.arange(16).reshape((4, 4)),
index=['Ohio', 'Colorado', 'Utah', 'New York'],
columns=['one', 'two', 'three', 'four'])
>> data
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
To select one or several columns of a DataFrame, use the column name(s).
>> data['four']
Ohio 3
Colorado 7
Utah 11
New York 15
Name: four, dtype: int64
>> data[['four', 'one']]
four one
Ohio 3 0
Colorado 7 4
Utah 11 8
New York 15 12
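One detail worth noting in the two calls above is the return type: a single string key gives back a Series, while a list of names gives back a DataFrame, even when the list has only one element. A small sketch, rebuilding the same data frame (the names single and as_frame are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

single = data['four']      # a string key returns a Series
as_frame = data[['four']]  # a one-element list returns a DataFrame

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(as_frame.shape)           # (4, 1)
```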
To select rows, slice by row position; a boolean condition can also be used to filter rows.
>> data[:2]
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
>> data['three']>4
Ohio False
Colorado True
Utah True
New York True
Name: three, dtype: bool
>> data[data['three']>4]
one two three four
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
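The boolean filter above can be extended to several conditions. The sketch below is only an illustration on the same data frame: the second condition (one < 12) is an assumption chosen for the example, not part of the original text. Conditions are combined with & (and) or | (or), and each one needs its own parentheses.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Build a boolean Series: True where both conditions hold.
mask = (data['three'] > 4) & (data['one'] < 12)

# Passing the mask in square brackets keeps only the True rows.
filtered = data[mask]

# The same filter written with loc.
filtered_loc = data.loc[mask]

print(filtered.index.tolist())        # ['Colorado', 'Utah']
print(filtered.equals(filtered_loc))  # True
```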
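You can also index with loc and iloc: loc selects by label, while iloc selects by integer position. Note the last example, which filters rows with a condition.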
>> data.loc['Colorado', ['two', 'three']]
two 5
three 6
Name: Colorado, dtype: int64
>> data.iloc[[1,2], [3, 0, 1]]
four one two
Colorado 7 4 5
Utah 11 8 9
>> data.iloc[2]
one 8
two 9
three 10
four 11
Name: Utah, dtype: int64
>> data.loc[:'Utah', 'two']
Ohio 1
Colorado 5
Utah 9
Name: two, dtype: int64
>> data.iloc[:, :3][data.three > 5] # take the first three columns, then filter rows by condition
one two three
Colorado 4 5 6
Utah 8 9 10
New York 12 13 14
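The chained form in the last example can also be written as a single loc call that takes the row condition and the column labels together. This is a sketch of one equivalent spelling on the same data frame, not the only way to do it; the names chained, one_step and two_col are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Two steps: pick the first three columns by position, then filter rows.
chained = data.iloc[:, :3][data.three > 5]

# One step: row condition and column labels in the same loc call.
one_step = data.loc[data['three'] > 5, ['one', 'two', 'three']]

# Label slice on rows (end label included) plus a single column.
two_col = data.loc[:'Utah', 'two']

print(chained.equals(one_step))  # True
print(one_step.index.tolist())   # ['Colorado', 'Utah', 'New York']
print(two_col.tolist())          # [1, 5, 9]
```

The single loc call is usually easier to read, and it is the form to prefer when the selection is later used for assignment.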
Reference
1 Python for Data Analysis, Wes McKinney