Pandas Tips

Python_Pandas_Select_Data_loc[ ]

2020-03-28  Kaspar433

.loc[]

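.loc is primarily label-based, but it can also be used with a boolean array.

It accepts the following kinds of input, each demonstrated below: a single label, a list of labels, a slice of labels, a boolean array, and a callable.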

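import pandas as pd
import numpy as np

# Load the iris data set and draw 10 random rows.
# sample() is random, so the rows differ between runs unless random_state is set.
iris = pd.read_csv('iris.csv', header=0).sample(10)
iris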

out:
    sepal_length    sepal_width petal_length    petal_width species
11  4.8 3.4 1.6 0.2 setosa
106 4.9 2.5 4.5 1.7 virginica
14  5.8 4.0 1.2 0.2 setosa
61  5.9 3.0 4.2 1.5 versicolor
138 6.0 3.0 4.8 1.8 virginica
132 6.4 2.8 5.6 2.2 virginica
97  6.2 2.9 4.3 1.3 versicolor
119 6.0 2.2 5.0 1.5 virginica
31  5.4 3.4 1.5 0.4 setosa
19  5.1 3.8 1.5 0.3 setosa
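
# Relabel the rows 'a'..'j'. The rows printed below apparently come from a
# separate run of sample(10), so they differ from the output above.
iris.index = list('abcdefghij')
iris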

out:
    sepal_length    sepal_width petal_length    petal_width species
a   5.6 2.5 3.9 1.1 versicolor
b   6.0 3.0 4.8 1.8 virginica
c   7.2 3.6 6.1 2.5 virginica
d   5.4 3.7 1.5 0.2 setosa
e   6.6 3.0 4.4 1.4 versicolor
f   6.4 2.8 5.6 2.1 virginica
g   4.8 3.4 1.9 0.2 setosa
h   5.7 2.9 4.2 1.3 versicolor
i   6.1 3.0 4.9 1.8 virginica
j   6.5 3.2 5.1 2.0 virginica

Series

species = iris.species.copy()
species.loc['b']

out:
'virginica'

species.loc['c':'e']

out:
c     virginica
d        setosa
e    versicolor
Name: species, dtype: object

species.loc['h':]

out:
h    versicolor
i     virginica
j     virginica
Name: species, dtype: object
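
The slices above include their end label, which is easy to miss when coming from ordinary Python slicing. A minimal sketch of the difference, using a small made-up Series (the name `s` and its values are assumptions standing in for `species`):

```python
import pandas as pd

# Hypothetical stand-in for the `species` Series used above.
s = pd.Series(['versicolor', 'virginica', 'setosa', 'versicolor'],
              index=list('abcd'))

by_label = s.loc['b':'c']    # label slice: the stop label 'c' is included
by_position = s.iloc[1:2]    # positional slice: position 2 is excluded

print(by_label.index.tolist())     # ['b', 'c']
print(by_position.index.tolist())  # ['b']
```
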

DataFrame

Access directly by a list of labels


iris.loc[['a','c','d'], :]

out:
sepal_length    sepal_width petal_length    petal_width species
a   5.6 2.5 3.9 1.1 versicolor
c   7.2 3.6 6.1 2.5 virginica
d   5.4 3.7 1.5 0.2 setosa
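
One caveat when passing a list: every label has to exist. The sketch below assumes pandas 1.0 or later, where a list containing a missing label raises KeyError; the frame `df` and the label 'z' are made up for illustration. `reindex` is one way to get NaN rows instead of an error.

```python
import pandas as pd

# Hypothetical frame; 'z' is deliberately not in the index.
df = pd.DataFrame({'x': [1, 2, 3]}, index=list('abc'))

try:
    df.loc[['a', 'z']]
    raised = False
except KeyError:
    # Assumption: pandas >= 1.0, where missing labels in a list are an error.
    raised = True

# reindex tolerates missing labels and fills their rows with NaN.
safe = df.reindex(['a', 'z'])
print(raised)
print(safe)
```
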

Access by label slices

iris.loc['b':'f', 'sepal_length':'petal_length']

out:
sepal_length    sepal_width petal_length
b   6.0 3.0 4.8
c   7.2 3.6 6.1
d   5.4 3.7 1.5
e   6.6 3.0 4.4
f   6.4 2.8 5.6

Using a single label

iris.loc['d']

out:
sepal_length       5.4
sepal_width        3.7
petal_length       1.5
petal_width        0.2
species         setosa
Name: d, dtype: object
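
A single label returns the row as a Series, as the output above shows. The sketch below, on a small made-up frame, contrasts that with a one-element list (which keeps the DataFrame shape) and with a row and column label pair (which returns a scalar).

```python
import pandas as pd

# Hypothetical two-row frame mirroring the columns used above.
df = pd.DataFrame({'sepal_length': [5.4, 6.6],
                   'species': ['setosa', 'versicolor']},
                  index=['d', 'e'])

row = df.loc['d']               # Series; its name is the row label
frame = df.loc[['d']]           # DataFrame with a single row
cell = df.loc['d', 'species']   # scalar value

print(type(row).__name__, type(frame).__name__, cell)
```
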

Using a boolean array

iris.loc[iris.sepal_length > iris.sepal_length.mean()]

out:
sepal_length    sepal_width petal_length    petal_width species
c   7.2 3.6 6.1 2.5 virginica
e   6.6 3.0 4.4 1.4 versicolor
f   6.4 2.8 5.6 2.1 virginica
i   6.1 3.0 4.9 1.8 virginica
j   6.5 3.2 5.1 2.0 virginica
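
Boolean masks can also be combined and paired with a column selection. A minimal sketch on made-up data (the frame `df` and the 5.5 threshold are assumptions); each condition needs its own parentheses because `&` binds more tightly than the comparisons.

```python
import pandas as pd

# Hypothetical frame with the same column names as the iris sample.
df = pd.DataFrame({'sepal_length': [5.6, 6.0, 7.2, 5.4],
                   'species': ['versicolor', 'virginica',
                               'virginica', 'setosa']},
                  index=list('abcd'))

# Element-wise AND of two conditions.
mask = (df['sepal_length'] > 5.5) & (df['species'] == 'virginica')

# Rows where the mask is True, restricted to one column.
result = df.loc[mask, ['sepal_length']]
print(result)
```
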

# Assign random integer labels drawn from 0..9; duplicates are possible.
iris.index = np.random.randint(0, 10, 10)
iris

out:
sepal_length    sepal_width petal_length    petal_width species
8   5.6 2.5 3.9 1.1 versicolor
5   6.0 3.0 4.8 1.8 virginica
9   7.2 3.6 6.1 2.5 virginica
4   5.4 3.7 1.5 0.2 setosa
2   6.6 3.0 4.4 1.4 versicolor
0   6.4 2.8 5.6 2.1 virginica
3   4.8 3.4 1.9 0.2 setosa
7   5.7 2.9 4.2 1.3 versicolor
3   6.1 3.0 4.9 1.8 virginica
5   6.5 3.2 5.1 2.0 virginica

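When slicing with .loc, if both the start and stop labels are present in the index, the rows located between them are returned, endpoints included: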

iris.loc[9:2]

out:
sepal_length    sepal_width petal_length    petal_width species
9   7.2 3.6 6.1 2.5 virginica
4   5.4 3.7 1.5 0.2 setosa
2   6.6 3.0 4.4 1.4 versicolor
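
The slice 9:2 works on the unsorted index because both 9 and 2 occur exactly once. A sketch of what can go wrong otherwise, on a made-up Series `s` whose label 5 is repeated at non-adjacent positions: a bound that is duplicated in an unsorted index is ambiguous, and pandas raises KeyError. Sorting the index first removes the ambiguity.

```python
import pandas as pd

# Hypothetical Series: unsorted index, label 5 appears twice (positions 1 and 4).
s = pd.Series(range(6), index=[8, 5, 9, 4, 5, 2])

ok = s.loc[9:2]   # both bounds are unique, so the slice is well defined

try:
    s.loc[5:2]    # start bound 5 is duplicated in an unsorted index
    raised = False
except KeyError:
    raised = True

# After sorting, duplicated bounds are fine.
sorted_ok = s.sort_index().loc[5:8]

print(ok.index.tolist(), raised, sorted_ok.index.tolist())
```
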

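If at least one of the two labels is absent, but the index is sorted and its values can be compared with the start and stop labels, slicing still works as expected by selecting the labels that rank between the two: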

iris.sort_index()

out:
sepal_length    sepal_width petal_length    petal_width species
0   6.4 2.8 5.6 2.1 virginica
2   6.6 3.0 4.4 1.4 versicolor
3   4.8 3.4 1.9 0.2 setosa
3   6.1 3.0 4.9 1.8 virginica
4   5.4 3.7 1.5 0.2 setosa
5   6.0 3.0 4.8 1.8 virginica
5   6.5 3.2 5.1 2.0 virginica
7   5.7 2.9 4.2 1.3 versicolor
8   5.6 2.5 3.9 1.1 versicolor
9   7.2 3.6 6.1 2.5 virginica

iris.sort_index().loc[3:7]

out:
sepal_length    sepal_width petal_length    petal_width species
3   4.8 3.4 1.9 0.2 setosa
3   6.1 3.0 4.9 1.8 virginica
4   5.4 3.7 1.5 0.2 setosa
5   6.0 3.0 4.8 1.8 virginica
5   6.5 3.2 5.1 2.0 virginica
7   5.7 2.9 4.2 1.3 versicolor
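
The bounds 3 and 7 used above both occur in the index. To illustrate the absent-label case, here is a sketch on a made-up Series `s` that reuses the same index values, in which neither 1 nor 6 occurs: after sorting, the slice selects the labels ranking between the bounds, while the unsorted version raises KeyError.

```python
import pandas as pd

# Hypothetical Series reusing the index values shown above.
unsorted = pd.Series(range(10), index=[8, 5, 9, 4, 2, 0, 3, 7, 3, 5])
s = unsorted.sort_index()   # index: 0, 2, 3, 3, 4, 5, 5, 7, 8, 9

# Neither 1 nor 6 is a label, but the sorted index can still be searched.
between = s.loc[1:6]

try:
    unsorted.loc[1:6]       # absent bound on an unsorted index
    raised = False
except KeyError:
    raised = True

print(between.index.tolist(), raised)
```
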

Selecting with a callable

df = pd.DataFrame(np.random.randn(6,4), index=list('abcdef'), columns=list('ABCD'))
df

out:
    A   B   C   D
a   0.737161    -0.514738   -1.457052   0.353337
b   0.801916    0.266375    -0.968714   -0.087611
c   -0.799433   -1.250238   -0.598625   1.259859
d   -0.780325   1.910598    -0.522512   -0.680966
e   -1.167703   -0.234484   0.243291    -1.931064
f   -0.147435   0.145292    -0.256636   -0.110757

df.loc[lambda df: df.index > 'c']

out:
    A   B   C   D
d   -0.780325   1.910598    -0.522512   -0.680966
e   -1.167703   -0.234484   0.243291    -1.931064
f   -0.147435   0.145292    -0.256636   -0.110757

df.loc[lambda df: df.A<0]

out:
    A   B   C   D
c   -0.799433   -1.250238   -0.598625   1.259859
d   -0.780325   1.910598    -0.522512   -0.680966
e   -1.167703   -0.234484   0.243291    -1.931064
f   -0.147435   0.145292    -0.256636   -0.110757

df.loc[lambda df: df.A<0, lambda df: ['A', 'B']]

out:
    A   B
c   -0.799433   -1.250238
d   -0.780325   1.910598
e   -1.167703   -0.234484
f   -0.147435   0.145292
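
A callable is most useful in a method chain, where the frame being filtered has no variable name yet. The sketch below uses made-up data; the column `C` and the helper name `d` are assumptions for illustration. The callable receives the intermediate frame, so it can refer to a column created one step earlier.

```python
import pandas as pd

# Hypothetical frame; values chosen so the result is easy to verify by hand.
df = pd.DataFrame({'A': [1.0, -2.0, 3.0, -4.0],
                   'B': [10, 20, 30, 40]},
                  index=list('abcd'))

result = (
    df.assign(C=lambda d: d['A'] * d['B'])      # C exists only on the intermediate frame
      .loc[lambda d: d['C'] < 0, ['A', 'C']]    # the callable is given that intermediate frame
)
print(result)
```

Writing `df['C'] < 0` in place of the callable would fail here, because `df` itself has no column `C`.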