Pandas-24. Category

2019-04-04  悠扬前奏

Creating a Category object

Specifying the dtype

import pandas as pd
s = pd.Series(["a","b","c","a"], dtype="category")
print (s)
'''
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]
'''

Note that there are only three categories, even though the Series holds four values.
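Only three categories appear because a categorical Series stores each distinct value once and represents the rows as integer codes. A minimal sketch using the `.cat` accessor; the variable names `codes` and `categories` are illustrative only:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Each row is stored as an integer code pointing into the categories index
codes = s.cat.codes.tolist()
categories = s.cat.categories.tolist()

print(codes)       # [0, 1, 2, 0]
print(categories)  # ['a', 'b', 'c']
```

The repeated "a" reuses code 0, which is why the category count is smaller than the number of rows.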

pd.Categorical

The Categorical constructor:

pandas.Categorical(values, categories, ordered)

With only the values argument:

cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
print(cat)
'''
[a, b, c, a, b, c]
Categories (3, object): [a, b, c]
'''

With the values and categories arguments:

cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], categories=['c', 'b', 'a'])
print(cat)
'''
[a, b, c, a, b, c, NaN]
Categories (3, object): [c, b, a]
'''

The second argument is the list of categories. Any value that does not appear in it is treated as NaN.
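To see how an undeclared value is treated, one can inspect the integer codes. This is a small sketch with a shortened value list; the names `codes` and `missing` are illustrative:

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c", "d"], categories=["c", "b", "a"])

# "d" is not a declared category, so it gets the sentinel code -1 (shown as NaN)
codes = cat.codes.tolist()
missing = pd.isna(cat).tolist()

print(codes)    # [2, 1, 0, -1]
print(missing)  # [False, False, False, True]
```

The codes follow the position of each value in the declared category list, so "a" maps to 2 rather than 0 here.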

Adding the ordered argument:

cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], categories=['c', 'b', 'a'], ordered=True)
print(cat)
'''
[a, b, c, a, b, c, NaN]
Categories (3, object): [c < b < a]
'''

In the logical ordering, a > b > c.
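The effect of the logical ordering shows up in min(), max(), and sorting, which follow the declared category order rather than alphabetical order. A minimal sketch, using values without NaN to keep the result unambiguous (the variable names are illustrative):

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c", "a"], categories=["c", "b", "a"], ordered=True)

# min/max and sorting use the declared order c < b < a
smallest = cat.min()
largest = cat.max()
in_order = list(cat.sort_values())

print(smallest, largest)  # c a
print(in_order)           # ['c', 'b', 'a', 'a']
```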

Describing

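Wrapping a Categorical in a Series and calling describe() returns basic summary information about it.

import numpy as np
cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
# Series.describe() yields the count/unique/top/freq summary shown below
print(pd.Series(cat, name="cat").describe())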
'''
count     3
unique    2
top       c
freq      2
Name: cat, dtype: object
'''
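As a complement to the summary above, value_counts() gives a per-category breakdown. The sketch below assumes the same values as the example; it illustrates that a declared but unused category still appears with a count of zero, while NaN is left out by default:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])

# value_counts() on categorical data reports every declared category
counts = pd.Series(cat).value_counts()

print(counts["c"], counts["a"], counts["b"])  # 2 1 0
```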

Category properties

The categories attribute holds the categories of the Categorical:

print(cat.categories)
'''
Index(['b', 'a', 'c'], dtype='object')
'''

Order

The ordered attribute tells whether the Categorical is ordered:

print(cat.ordered)
'''
False  # no order was specified
'''
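If an order is needed after construction, as_ordered() returns an ordered copy that keeps the existing category sequence. A minimal sketch; `ordered_cat` is an illustrative name:

```python
import pandas as pd

cat = pd.Categorical(["a", "c", "c"], categories=["b", "a", "c"])

# as_ordered() returns a new Categorical; the original stays unordered
ordered_cat = cat.as_ordered()

print(cat.ordered)                   # False
print(ordered_cat.ordered)           # True
print(list(ordered_cat.categories))  # ['b', 'a', 'c']
```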

Renaming categories

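Categories can be renamed with rename_categories(). Older pandas versions also allowed assigning new values directly to the categories attribute, but that setter was removed in pandas 2.0:

print(cat.categories)
# Build the new names from cat's own categories so each label maps to its renamed form
cat = cat.rename_categories(["Group %s" % g for g in cat.categories])
print(cat.categories)
'''
Index(['b', 'a', 'c'], dtype='object')
Index(['Group b', 'Group a', 'Group c'], dtype='object')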
'''
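Renaming can also be done selectively with a mapping passed to rename_categories(); categories missing from the mapping keep their names. A small sketch under that assumption, with `renamed` as an illustrative name:

```python
import pandas as pd

cat = pd.Categorical(["a", "c", "c"], categories=["b", "a", "c"])

# Only "a" and "c" are renamed; "b" is left untouched
renamed = cat.rename_categories({"a": "Group a", "c": "Group c"})

print(list(renamed.categories))  # ['b', 'Group a', 'Group c']
print(list(renamed))             # ['Group a', 'Group c', 'Group c']
```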

Appending new categories

The add_categories() method appends new categories:

cat = pd.Categorical(["a", "c", "c", 'a'], categories=["b", "a", "c"])
cat = cat.add_categories("d")
print(cat)
'''
[a, c, c, a]
Categories (4, object): [b, a, c, d]
'''
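One practical reason to append a category is that a Categorical rejects assignment of values it does not know about. The sketch below catches both TypeError and ValueError, since the exact exception type has differed between pandas versions; `rejected` is an illustrative name:

```python
import pandas as pd

cat = pd.Categorical(["a", "c", "c", "a"], categories=["b", "a", "c"])

try:
    cat[0] = "d"  # "d" is not a category yet, so this should fail
    rejected = False
except (TypeError, ValueError):
    rejected = True

cat = cat.add_categories("d")
cat[0] = "d"  # allowed once the category exists

print(rejected)   # True
print(list(cat))  # ['d', 'c', 'c', 'a']
```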

Removing categories

The remove_categories() method removes categories:

print(cat.remove_categories("a"))
'''
[NaN, c, c, NaN]
Categories (3, object): [b, c, d]
'''
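When the goal is only to drop categories that no value uses, remove_unused_categories() avoids turning any data into NaN. A minimal sketch; `trimmed` is an illustrative name:

```python
import pandas as pd

cat = pd.Categorical(["a", "c", "c", "a"], categories=["b", "a", "c", "d"])

# "b" and "d" never occur in the values, so only they are dropped
trimmed = cat.remove_unused_categories()

print(list(trimmed.categories))  # ['a', 'c']
print(list(trimmed))             # ['a', 'c', 'c', 'a']
```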

Comparing categories

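Categorical data can be compared with other objects in three cases: with another categorical that has the same categories and order, with a list-like object of the same length (equality only), and with a scalar:

# astype("category", categories=..., ordered=...) was removed from pandas; pass a CategoricalDtype instead
dtype = pd.CategoricalDtype(categories=[1, 2, 3], ordered=True)
cat = pd.Series([1, 2, 3]).astype(dtype)
cat1 = pd.Series([2, 2, 2]).astype(dtype)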
print(cat > cat1)
print("----------")
print(cat == [1,2,3])
print("----------")
print(cat > 2)
'''
0    False
1    False
2     True
dtype: bool
----------
0    True
1    True
2    True
dtype: bool
----------
0    False
1    False
2     True
dtype: bool
'''
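The greater-than comparisons above depend on the data being ordered. As a sketch of the contrasting case, an unordered categorical Series supports equality checks but raises TypeError on an inequality; the names `unordered`, `equal_to_two`, and `raised` are illustrative:

```python
import pandas as pd

unordered = pd.Series([1, 2, 3], dtype="category")

# Equality works without an order
equal_to_two = (unordered == 2).tolist()

try:
    unordered > 2  # inequality needs ordered=True
    raised = False
except TypeError:
    raised = True

print(equal_to_two)  # [False, True, False]
print(raised)        # True
```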