Feature Importance in Boosting Methods
2018-05-29
只为此心无垠
Based on three documentation sources: DecisionTree, XGBoost, and LightGBM.
Decision Tree
feature_importances_ : array of shape = [n_features]
The feature importances. The higher, the more important the feature.
The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature.
It is also known as the Gini importance.
In other words: the (normalized) total reduction of the split criterion contributed by the feature, also called the Gini importance.
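The computation above can be sketched in plain Python on a hand-made toy tree (the node values below are made up for illustration; this is not scikit-learn code): each node contributes its weighted impurity decrease to the feature it splits on, and the totals are normalized to sum to 1.

```python
# Hand-made sketch of Gini importance (hypothetical toy tree, not sklearn output).
# Each internal node: (feature index, node sample fraction N_t/N, node impurity,
#                      left child share of N_t, left impurity, right share, right impurity)
nodes = [
    (0, 1.0, 0.5, 0.6, 0.2, 0.4, 0.1),  # root splits on feature 0
    (1, 0.6, 0.2, 0.5, 0.0, 0.5, 0.1),  # left child splits on feature 1
]

n_features = 2
importances = [0.0] * n_features
for feat, w, imp, wl, il, wr, ir in nodes:
    # weighted impurity decrease: N_t/N * (imp - wl*imp_left - wr*imp_right)
    importances[feat] += w * (imp - wl * il - wr * ir)

total = sum(importances)
importances = [v / total for v in importances]  # normalize to sum to 1
print(importances)
```

With these toy numbers, feature 0 (the root split) ends up with the larger share, matching the intuition that splits near the top of the tree, which act on more samples, weigh more.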
XGBoost
Link: XGBoost documentation
def get_score(self, fmap='', importance_type='weight'):
"""Get feature importance of each feature.
Importance type can be defined as:
'weight' - the number of times a feature is used to split the data across all trees.
'gain' - the average gain of the feature when it is used in trees
'cover' - the average coverage of the feature when it is used in trees
Parameters
----------
fmap: str (optional)
The name of feature map file
"""
weight: the number of times the feature is selected as a split feature across all trees.
gain: the average gain brought by the feature, i.e. the sum of the gains of all splits that use the feature divided by the number of times it is used.
cover: the average coverage of the splits that use the feature, i.e. the average number of samples affected by those splits.
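The difference between the three types can be illustrated with a small stand-alone sketch over a hand-made list of split records (the feature names, gains, and cover counts below are hypothetical, and this `get_score` is a mimic of the API, not real xgboost code):

```python
# Sketch of XGBoost's three importance types over hypothetical split records.
# Each record: (feature name, gain of the split, number of samples covered).
splits = [
    ("f0", 10.0, 100), ("f0", 6.0, 40),  # f0 used twice across all trees
    ("f1", 8.0, 80),                     # f1 used once
]

def get_score(splits, importance_type="weight"):
    # Accumulate per-feature: (use count, total gain, total cover).
    acc = {}
    for feat, gain, cover in splits:
        cnt, g, c = acc.get(feat, (0, 0.0, 0.0))
        acc[feat] = (cnt + 1, g + gain, c + cover)
    if importance_type == "weight":
        return {f: cnt for f, (cnt, g, c) in acc.items()}       # raw counts
    if importance_type == "gain":
        return {f: g / cnt for f, (cnt, g, c) in acc.items()}   # average gain
    if importance_type == "cover":
        return {f: c / cnt for f, (cnt, g, c) in acc.items()}   # average cover
    raise ValueError(importance_type)

print(get_score(splits, "weight"))  # {'f0': 2, 'f1': 1}
print(get_score(splits, "gain"))    # {'f0': 8.0, 'f1': 8.0}
```

Note how 'weight' ranks f0 higher while 'gain' ties the two features: a feature used often in shallow, low-gain splits can look important under 'weight' but not under 'gain', which is why the choice of importance_type matters.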
LightGBM
Link: LightGBM documentation
def feature_importance(self, importance_type='split'):
"""
Get feature importances
Parameters
----------
importance_type : str, default "split"
How the importance is calculated: "split" or "gain"
"split" is the number of times a feature is used in a model
"gain" is the total gain of splits which use the feature
Returns
-------
result : array
Array of feature importances.
"""
split: the number of times the feature is used in the model.
gain: the total gain of the splits that use the feature.
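A key contrast with XGBoost's default: LightGBM's 'gain' is a total, not an average. A minimal sketch over hypothetical split records (this `feature_importance` mimics the API shape; it is not LightGBM code):

```python
# Sketch contrasting LightGBM's 'split' (use count) and 'gain' (TOTAL gain).
# Hypothetical split records: (feature name, gain of the split).
splits = [("f0", 10.0), ("f0", 6.0), ("f1", 8.0)]

def feature_importance(splits, importance_type="split"):
    acc = {}
    for feat, gain in splits:
        cnt, total = acc.get(feat, (0, 0.0))
        acc[feat] = (cnt + 1, total + gain)
    if importance_type == "split":
        return {f: cnt for f, (cnt, total) in acc.items()}
    # 'gain' sums the gains rather than averaging them (unlike XGBoost's 'gain')
    return {f: total for f, (cnt, total) in acc.items()}

print(feature_importance(splits, "split"))  # {'f0': 2, 'f1': 1}
print(feature_importance(splits, "gain"))   # {'f0': 16.0, 'f1': 8.0}
```

With the same records that tied under XGBoost's average-gain definition, f0 now scores double f1 under total gain, because it is used twice.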