Python Pandas: Working with Data Using [ ]

2020-03-25  Kaspar433

This article covers some common uses of "[ ]" in Pandas, such as selecting and modifying data.

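"[ ]" is probably the most basic way to select data. It accepts the following kinds of input, each covered in its own section: a list of column labels, a single column label, a slice, and a boolean mask.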

Loading the data

import pandas as pd
import numpy as np

# Eight consecutive days as the index, four columns of standard-normal noise
dates = pd.date_range('1/1/2020', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))
df

out:
    A   B   C   D
2020-01-01  0.336131    -0.086456   0.096903    -1.230599
2020-01-02  -0.106293   0.111821    1.165342    -1.378462
2020-01-03  -0.933779   0.898738    0.013194    -0.593243
2020-01-04  0.190229    -1.108908   0.597650    2.759475
2020-01-05  -0.647080   1.573537    1.357191    -0.536916
2020-01-06  -0.455373   1.342904    -0.316548   0.145119
2020-01-07  -1.350214   -0.044642   0.501508    1.969973
2020-01-08  -0.474602   -0.384916   1.829222    0.853519

Passing a list

Passing a list of column labels returns a DataFrame with the columns in the order given in the list.

df[['C','D']]

out:
    C   D
2020-01-01  0.096903    -1.230599
2020-01-02  1.165342    -1.378462
2020-01-03  0.013194    -0.593243
2020-01-04  0.597650    2.759475
2020-01-05  1.357191    -0.536916
2020-01-06  -0.316548   0.145119
2020-01-07  0.501508    1.969973
2020-01-08  1.829222    0.853519
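
The ordering behaviour described above can be checked with a minimal sketch. The frame `demo` is a hypothetical stand-in with fixed integer values, used instead of the article's random data so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame (not the article's random data).
demo = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

# Columns come back in the order given in the list, not the frame's own order.
reordered = demo[["D", "A"]]
print(list(reordered.columns))  # ['D', 'A']

# A label that is not a column raises KeyError.
try:
    demo[["A", "Z"]]
except KeyError as exc:
    print("KeyError:", exc)
```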

Passing a single column

Passing a single column label returns a Series; passing a list returns a DataFrame, even when the list has length 1.

df['C']

out:
2020-01-01    0.096903
2020-01-02    1.165342
2020-01-03    0.013194
2020-01-04    0.597650
2020-01-05    1.357191
2020-01-06   -0.316548
2020-01-07    0.501508
2020-01-08    1.829222
Freq: D, Name: C, dtype: float64

df[['C']]

out:
    C
2020-01-01  0.096903
2020-01-02  1.165342
2020-01-03  0.013194
2020-01-04  0.597650
2020-01-05  1.357191
2020-01-06  -0.316548
2020-01-07  0.501508
2020-01-08  1.829222
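
The difference in return type is easy to confirm by inspecting the results directly. This is a small sketch on a hypothetical frame named `demo`, not the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame (not the article's random data).
demo = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

single = demo["C"]      # scalar label -> one-dimensional Series
as_frame = demo[["C"]]  # one-element list -> two-dimensional DataFrame

print(type(single).__name__, single.shape)      # Series (3,)
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (3, 1)
```

The one-element list form is handy when later code expects a two-dimensional object.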

List selection can also be used to swap the values of two columns.

df[['A','B']] = df[['B','A']]
df

out:
    A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519

Attempting the same swap through .loc looks like this:

df.loc[:, ['A', 'B']] = df[['B', 'A']]
df

out:
    A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519

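The operation above does not swap the columns. When a DataFrame is assigned through .loc, pandas aligns it on column labels before assigning, so each column simply receives its own values back. To swap, assign the raw values, which carry no labels.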

df.loc[:, ['A', 'B']] = df[['B', 'A']].values
df

out:
A   B   C   D
2020-01-01  0.336131    -0.086456   0.096903    -1.230599
2020-01-02  -0.106293   0.111821    1.165342    -1.378462
2020-01-03  -0.933779   0.898738    0.013194    -0.593243
2020-01-04  0.190229    -1.108908   0.597650    2.759475
2020-01-05  -0.647080   1.573537    1.357191    -0.536916
2020-01-06  -0.455373   1.342904    -0.316548   0.145119
2020-01-07  -1.350214   -0.044642   0.501508    1.969973
2020-01-08  -0.474602   -0.384916   1.829222    0.853519

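to_numpy() works the same way, and it is the accessor the pandas documentation recommends over .values. Running it here swaps the columns once more.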

df.loc[:, ['A', 'B']] = df[['B', 'A']].to_numpy()
df

out:
A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519
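
The alignment behaviour can be isolated in a short sketch. The two-column frame `demo` and the names `aligned` and `swapped` are illustrative assumptions, chosen so the outcome is easy to read.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with easily recognisable values.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Assigning a DataFrame through .loc aligns on column labels first,
# so A receives A and B receives B: nothing changes.
aligned = demo.copy()
aligned.loc[:, ["A", "B"]] = aligned[["B", "A"]]

# A bare NumPy array has no labels, so values are assigned by position.
swapped = demo.copy()
swapped.loc[:, ["A", "B"]] = swapped[["B", "A"]].to_numpy()

print(aligned["A"].tolist())  # [1, 2, 3]
print(swapped["A"].tolist())  # [10, 20, 30]
```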

Using slices

Get the first two rows

df[:2]

out:
A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-02  0.111821    -0.106293   1.165342    -1.378462

Setting a step

df[::2]

out:
A   B   C   D
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-07  -0.044642   -1.350214   0.501508    1.969973

df[1::2]

out:
A   B   C   D
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-08  -0.384916   -0.474602   1.829222    0.853519

Reversing the row order

df[::-1]

out:
A   B   C   D
2020-01-08  -0.384916   -0.474602   1.829222    0.853519
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-02  0.111821    -0.106293   1.165342    -1.378462
2020-01-01  -0.086456   0.336131    0.096903    -1.230599
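
Unlike a label or a list of labels, a slice inside "[ ]" selects rows rather than columns. The sketch below, on a hypothetical frame `demo` with the same date index as the article, compares an integer slice with .iloc and shows for contrast that a label slice through .loc includes its end label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the article's date index but fixed integer values.
dates = pd.date_range("2020-01-01", periods=8)
demo = pd.DataFrame(np.arange(32).reshape(8, 4), index=dates, columns=list("ABCD"))

# An integer slice in [] works on row positions, like .iloc (end excluded).
first_two = demo[:2]
print(first_two.equals(demo.iloc[:2]))  # True

# A label slice through .loc includes the end label.
by_label = demo.loc["2020-01-02":"2020-01-04"]
print(len(by_label))  # 3
```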

Assigning through a slice

df[:2] = np.arange(8).reshape(2,4)
df

out:
A   B   C   D
2020-01-01  0.000000    1.000000    2.000000    3.000000
2020-01-02  4.000000    5.000000    6.000000    7.000000
2020-01-03  0.898738    -0.933779   0.013194    -0.593243
2020-01-04  -1.108908   0.190229    0.597650    2.759475
2020-01-05  1.573537    -0.647080   1.357191    -0.536916
2020-01-06  1.342904    -0.455373   -0.316548   0.145119
2020-01-07  -0.044642   -1.350214   0.501508    1.969973
2020-01-08  -0.384916   -0.474602   1.829222    0.853519

Using boolean indexing

df = pd.DataFrame(np.random.randn(8,4),index=dates,columns=list('abcd'))
df

out:
a   b   c   d
2020-01-01  -1.749988   -0.249398   -1.165277   -0.806687
2020-01-02  0.026334    0.158118    0.341183    -1.042534
2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
2020-01-04  1.719313    -1.417885   0.267647    -0.960537
2020-01-05  -0.259797   -0.851702   -0.873451   -0.476420
2020-01-06  -0.048619   -0.690095   0.759120    1.184295
2020-01-07  -0.748535   -1.252718   0.386220    -0.415996
2020-01-08  -0.497471   -0.550428   -0.867333   -0.109223

mask = df['a'] > 0
mask

out:
2020-01-01    False
2020-01-02     True
2020-01-03     True
2020-01-04     True
2020-01-05    False
2020-01-06    False
2020-01-07    False
2020-01-08    False
Freq: D, Name: a, dtype: bool

df[mask]

out:
a   b   c   d
2020-01-02  0.026334    0.158118    0.341183    -1.042534
2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
2020-01-04  1.719313    -1.417885   0.267647    -0.960537

Multiple conditions

mask2 = df['b'] < 0
df[mask & mask2]

out:
a   b   c   d
2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
2020-01-04  1.719313    -1.417885   0.267647    -0.960537
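
Conditions can also be written inline without naming each mask. A minimal sketch on a hypothetical frame `demo` follows. Each comparison needs its own parentheses, because `&`, `|` and `~` bind more tightly than `>` and `<` in Python.

```python
import pandas as pd

# Hypothetical frame with fixed values so the selections are predictable.
demo = pd.DataFrame({"a": [1.0, -2.0, 3.0, -4.0], "b": [-1.0, -1.0, 1.0, 1.0]})

both = demo[(demo["a"] > 0) & (demo["b"] < 0)]    # AND: both conditions hold
either = demo[(demo["a"] > 0) | (demo["b"] < 0)]  # OR: at least one holds
not_pos = demo[~(demo["a"] > 0)]                  # NOT: invert a condition

print(both.index.tolist())     # [0]
print(either.index.tolist())   # [0, 1, 2]
print(not_pos.index.tolist())  # [1, 3]
```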

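Changing data with a boolean index

The assigned array must match the shape of the selection, here two rows by four columns.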

df[mask & mask2] = np.arange(8).reshape(2,4)
df

out:
a   b   c   d
2020-01-01  -1.749988   -0.249398   -1.165277   -0.806687
2020-01-02  0.026334    0.158118    0.341183    -1.042534
2020-01-03  0.000000    1.000000    2.000000    3.000000
2020-01-04  4.000000    5.000000    6.000000    7.000000
2020-01-05  -0.259797   -0.851702   -0.873451   -0.476420
2020-01-06  -0.048619   -0.690095   0.759120    1.184295
2020-01-07  -0.748535   -1.252718   0.386220    -0.415996
2020-01-08  -0.497471   -0.550428   -0.867333   -0.109223
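
With random data the number of rows a mask selects changes from run to run, so a hard-coded (2, 4) array will not always fit. One possible approach, sketched here on a hypothetical frame `demo`, is to size the replacement from the mask itself.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values (not the article's random data).
demo = pd.DataFrame(
    {
        "a": [1.0, -2.0, 3.0, 4.0],
        "b": [-1.0, -1.0, 1.0, -1.0],
        "c": [0.5, 0.5, 0.5, 0.5],
        "d": [9.0, 9.0, 9.0, 9.0],
    }
)
selected = (demo["a"] > 0) & (demo["b"] < 0)

# Size the replacement from the mask instead of hard-coding two rows.
n_rows = int(selected.sum())
n_cols = demo.shape[1]
demo[selected] = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)
print(demo)
```

Rows where the mask is False are left untouched.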