How do you do one-hot / dummy encoding?

2020-06-29  小幸运Q

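import pandas as pd

# Toy dataset: one nominal feature (color), one ordinal feature (size),
# one numeric feature (prize) and a class label.
df = pd.DataFrame([
    ['green', 'M', 10.1, 'class1'],
    ['red', 'L', 13.5, 'class2'],
    ['blue', 'XL', 15.3, 'class1']])

df.columns = ['color', 'size', 'prize', 'class label']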

>>> print(df)
   color size  prize class label
0  green    M   10.1      class1
1    red    L   13.5      class2
2   blue   XL   15.3      class1

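# size is ordinal (M < L < XL), so map it to integers that preserve the order.
size_mapping = {
    'XL': 3,
    'L': 2,
    'M': 1}
df['size'] = df['size'].map(size_mapping)

# The class label is nominal; any distinct integers will do.
# Note: iterating over a set of strings has no guaranteed order, so which
# label gets 0 and which gets 1 can differ between runs.
class_mapping = {label: idx for idx, label in enumerate(set(df['class label']))}
df['class label'] = df['class label'].map(class_mapping)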

>>> print(df)
   color  size  prize  class label
0  green     1   10.1            1
1    red     2   13.5            0
2   blue     3   15.3            1
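
If a reproducible label mapping is needed, one option is to sort the unique labels before enumerating them. The following is a minimal sketch under that assumption; the names labels, encoded, inverse_mapping and decoded are made up for illustration. With sorting, class1 always maps to 0 and class2 to 1, which may differ from the output printed above.

```python
import pandas as pd

labels = pd.Series(['class1', 'class2', 'class1'], name='class label')

# Sorting the unique labels makes the label -> integer assignment reproducible.
class_mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
encoded = labels.map(class_mapping)

# Reverse the dictionary to recover the original strings from the integers.
inverse_mapping = {idx: label for label, idx in class_mapping.items()}
decoded = encoded.map(inverse_mapping)

print(class_mapping)
print(decoded.tolist())
```

Keeping the inverse mapping around makes it easy to turn predictions back into readable class names later.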

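The remaining nominal column, color, can then be one-hot encoded with get_dummies, which converts only the string columns and leaves the numeric columns unchanged: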

>>> pd.get_dummies(df)
   size  prize  class label  color_blue  color_green  color_red
0     1   10.1            1           0            1          0
1     2   13.5            0           0            0          1
2     3   15.3            1           1            0          0
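
To control which columns are encoded and what type the new columns get, get_dummies accepts the columns and dtype parameters. The sketch below rebuilds a small frame (a simplified stand-in for df, without the class label) so that it runs on its own. Recent pandas versions may return boolean columns by default, so dtype=int is passed here to get 0/1 integers as in the output above.

```python
import pandas as pd

# Simplified stand-in for the df above, after size has been mapped to integers.
df_demo = pd.DataFrame({
    'color': ['green', 'red', 'blue'],
    'size': [1, 2, 3],
    'prize': [10.1, 13.5, 15.3],
})

# columns= restricts the encoding to the named columns;
# dtype=int requests 0/1 integers instead of booleans.
encoded = pd.get_dummies(df_demo, columns=['color'], dtype=int)

print(encoded)
```

Naming the columns explicitly also guards against accidentally encoding a string column that should have been left alone.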

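Dummy encoding represents one of the categories by an all-zero row, so k categories need only k-1 columns; one-hot encoding gives every category its own column, so no row is all zeros.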
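
The difference can be seen with the drop_first parameter of get_dummies. This is a minimal sketch; the frame colors and the variables one_hot and dummy are illustrative names, not part of the original example.

```python
import pandas as pd

colors = pd.DataFrame({'color': ['green', 'red', 'blue']})

# One-hot: k columns, exactly one 1 in every row.
one_hot = pd.get_dummies(colors, dtype=int)

# Dummy: k-1 columns; the dropped first category (blue) becomes the all-zero row.
dummy = pd.get_dummies(colors, drop_first=True, dtype=int)

print(one_hot)
print(dummy)
```

Dropping one column removes the redundancy among the indicator columns, which is why dummy encoding is often preferred for linear models with an intercept.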
