Basic Functionality
2019-02-03
庵下桃花仙
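Reindexing (changing the index order)
reindex is an important method: it creates a new object whose data conforms to the new index. Values are rearranged to match the new labels, and any label that was not in the original index gets NaN.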
In [1]: import pandas as pd
In [2]: obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
In [3]: obj
Out[3]:
d 4.5
b 7.2
a -5.3
c 3.6
dtype: float64
In [4]: obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
In [5]: obj2
Out[5]:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN
dtype: float64
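The session above can be restated as a small script. This is only a sketch: the `fill_value` argument and the name `obj_filled` are additions for illustration and do not appear in the original notes. It shows that the original Series is left untouched and that the missing label can receive a default other than NaN.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new object; obj itself keeps its original order.
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

# fill_value (an illustrative addition) supplies a default for labels
# that are absent from the original index, here 'e'.
obj_filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0.0)

print(list(obj.index))      # the original order is unchanged
print(obj2.isna().sum())    # number of labels that had no value
print(obj_filled['e'])      # the default chosen above
```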
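The optional method parameter lets you fill in values while reindexing. For example, ffill forward-fills: each new label takes the value of the nearest preceding label.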
In [6]: obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
In [7]: obj3
Out[7]:
0 blue
2 purple
4 yellow
dtype: object
In [8]: obj3.reindex(range(6), method='ffill')
Out[8]:
0 blue
1 blue
2 purple
3 purple
4 yellow
5 yellow
dtype: object
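For comparison, here is a minimal sketch that puts forward filling next to backward filling. The `method='bfill'` call and the variable names `forward` and `backward` are not in the original notes; they are included only to make the fill direction visible.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Forward fill: each new label copies the last value seen before it.
forward = obj3.reindex(range(6), method='ffill')

# Backward fill (illustrative addition): each new label copies the next
# value after it, so label 5 stays missing because nothing follows it.
backward = obj3.reindex(range(6), method='bfill')

print(forward.tolist())
print(backward.tolist())
```

Both fill methods assume the original index is sorted (monotonic), as it is here.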
With a DataFrame, reindex can change the row index, the column index, or both at once. When only a sequence is passed, it reindexes the rows by default.
In [9]: import numpy as np
In [16]: frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
...: index=['a', 'b', 'c'],
...: columns=['Ohio', 'Texas', 'California'])
In [17]: frame
Out[17]:
Ohio Texas California
a 0 1 2
b 3 4 5
c 6 7 8
In [18]: frame2 = frame.reindex(['a', 'b', 'c', 'd'])
In [20]: frame2
Out[20]:
Ohio Texas California
a 0.0 1.0 2.0
b 3.0 4.0 5.0
c 6.0 7.0 8.0
d NaN NaN NaN
Use the columns keyword to reindex the columns:
In [21]: states = ['Texas', 'Utah', 'California']
In [22]: frame.reindex(columns=states)
Out[22]:
Texas Utah California
a 1 NaN 2
b 4 NaN 5
c 7 NaN 8
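Rows and columns can also be reindexed in a single call. The following is a sketch under the assumption that the `index` and `columns` keywords are passed together; the name `both` is chosen here for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'b', 'c'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# Reindex both axes at once: row 'd' and column 'Utah' do not exist in
# frame, so every cell in that row and that column comes back as NaN.
both = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)

print(both)
```

Because NaN is a floating-point value, the integer columns are converted to float in the result.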
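Many people prefer loc for more concise label indexing. This only works reliably when every requested label exists: as the FutureWarning below states, passing missing labels to loc is deprecated, and pandas 1.0 and later raise KeyError instead. Use reindex when some labels may be missing.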
In [23]: frame.loc[['a', 'b', 'c', 'd'], states]
c:\users\a\appdata\local\programs\python\python36\lib\site-packages\pandas\core\indexing.py:1494: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
return self._getitem_tuple(key)
Out[23]:
Texas Utah California
a 1.0 NaN 2.0
b 4.0 NaN 5.0
c 7.0 NaN 8.0
d NaN NaN NaN
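Since the warning above announces that missing labels will raise KeyError, a version-tolerant way to write the same selection might look like the sketch below. The try/except fallback and the names `rows` and `result` are assumptions made for illustration, not something the original notes prescribe.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'b', 'c'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']
rows = ['a', 'b', 'c', 'd']

try:
    # Older pandas: works but emits the FutureWarning shown above.
    result = frame.loc[rows, states]
except KeyError:
    # Newer pandas: 'd' and 'Utah' are missing, so loc refuses them;
    # reindex is the alternative the warning recommends.
    result = frame.reindex(index=rows, columns=states)

print(result)
```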
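Dropping entries from an axis
If you already have an index array or list of the labels to remove, the drop method returns a new object with the indicated value or values deleted from an axis.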
In [24]: obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
In [25]: obj
Out[25]:
a 0
b 1
c 2
d 3
e 4
dtype: int32
In [26]: new_obj = obj.drop('c')
In [27]: new_obj
Out[27]:
a 0
b 1
d 3
e 4
dtype: int32
In [28]: obj
Out[28]:
a 0
b 1
c 2
d 3
e 4
dtype: int32
In [29]: obj.drop(['d', 'c'])
Out[29]:
a 0
b 1
e 4
dtype: int32
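One detail the session above does not show is what happens when a label to drop is not in the index. A minimal sketch, assuming the hypothetical label 'z' and using the `errors='ignore'` option of drop:

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

# Dropping a label that is not in the index raises KeyError by default.
try:
    obj.drop('z')
    raised = False
except KeyError:
    raised = True

# errors='ignore' drops the labels that exist and skips the rest.
kept = obj.drop(['c', 'z'], errors='ignore')

print(raised)
print(list(kept.index))
```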
With a DataFrame, index values can be deleted from either axis.
Dropping rows (the default, axis 0):
In [30]: data = pd.DataFrame(np.arange(16).reshape((4, 4)),
...: index=['Ohio', 'Colorado', 'Utah', 'New York'],
...: columns=['one', 'two', 'three', 'four'])
In [31]: data
Out[31]:
one two three four
Ohio 0 1 2 3
Colorado 4 5 6 7
Utah 8 9 10 11
New York 12 13 14 15
In [32]: data.drop(['Colorado', 'Ohio'])
Out[32]:
one two three four
Utah 8 9 10 11
New York 12 13 14 15
Dropping columns (pass axis=1 or axis='columns'):
In [33]: data.drop('two', axis=1)
Out[33]:
one three four
Ohio 0 2 3
Colorado 4 6 7
Utah 8 10 11
New York 12 14 15
In [34]: data.drop(['two', 'four'], axis='columns')
Out[34]:
one three
Ohio 0 2
Colorado 4 6
Utah 8 10
New York 12 14
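As an alternative to the axis argument, drop also accepts `index` and `columns` keywords, which some readers find easier to follow. The sketch below reuses the same data; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Same as data.drop(['Colorado', 'Ohio'])
without_rows = data.drop(index=['Colorado', 'Ohio'])

# Same as data.drop(['two', 'four'], axis='columns')
without_cols = data.drop(columns=['two', 'four'])

# Both axes in one call; data itself is not modified.
trimmed = data.drop(index='Ohio', columns='two')

print(trimmed)
```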
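Like many functions that change the size or shape of a Series or DataFrame, drop returns a new object by default and leaves the original unchanged. Passing inplace=True makes it operate on the original object directly without returning a new one. Use inplace with care, since the dropped data is discarded.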
In [35]: obj.drop('c')
Out[35]:
a 0
b 1
d 3
e 4
dtype: int32
In [36]: obj
Out[36]:
a 0
b 1
c 2
d 3
e 4
dtype: int32
In [37]: obj.drop('c', inplace=True)
In [38]: obj
Out[38]:
a 0
b 1
d 3
e 4
dtype: int32
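The difference between the two calls can be checked directly. In this sketch the names `returned` and `result` are illustrative; the point is that the in-place call returns None, so its result should not be assigned back to the variable.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

# Default behaviour: a new Series is returned, obj still contains 'c'.
returned = obj.drop('c')
still_there = 'c' in obj.index

# In-place behaviour: obj itself is modified and the call returns None.
result = obj.drop('c', inplace=True)

print(still_there)
print(result)
print(list(obj.index))
```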