A peek at the F1-score metric in sklearn.metrics
Note: this article uses binary classification as its running example.
F1-score is a metric that jointly evaluates a classifier's recall and precision. Its formula is:
F1 = 2 * precision * recall / (precision + recall)
where
recall = TPR = TP/(TP+FN);
precision = PPV = TP/(TP+FP)
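As a quick sanity check on the formula, here is a tiny worked example (the TP/FP/FN counts below are made up purely for illustration):
# Made-up confusion-matrix counts, for illustration only
TP, FP, FN = 80, 20, 40
recall = TP / (TP + FN)                             # 0.667
precision = TP / (TP + FP)                          # 0.8
f1 = 2 * precision * recall / (precision + recall)  # ~0.727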
A somewhat involved parameter of sklearn.metrics.f1_score is average, which accepts several options: None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted'. A brief explanation of each:
- None: returns the f1-score of each class separately;
- 'binary': only valid for binary classification; computes the f1-score of the positive class alone (i.e. label 1 in the usual binary setup);
- 'micro': as the formula above shows, each class's F1-score is computed from that class's TP, FP, and FN counts; micro ignores the class distinction and pools the TP, FP, and FN of all classes (label 0 and label 1 alike) to compute a single F1-score (see the sketch after this list);
- 'macro': the opposite of micro; first computes each class's F1-score, then takes their unweighted arithmetic mean as the final F1-score;
- 'weighted': like macro, first computes the per-class F1-scores, but the final F1-score is a weighted average, with each class's score weighted by its sample count (support), rather than an arithmetic mean;
- 'samples': calculates metrics for each instance and finds their average (only meaningful for multilabel classification, where this differs from accuracy_score); rarely used.
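A minimal sketch of how micro pooling works, on made-up toy labels (the y_true and y_hat here are illustrative values, not the data used in the next section):
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_hat = np.array([0, 1, 0, 1, 1, 0, 1, 0])
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
# Pool the counts over both classes: for label 1 the counts are (tp, fp, fn);
# for label 0, tn plays the role of TP, fn of FP, and fp of FN
TP, FP, FN = tp + tn, fp + fn, fn + fp
micro = 2 * TP / (2 * TP + FP + FN)
print(micro, f1_score(y_true, y_hat, average='micro'))  # both 0.75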
A small example
Below are the F1-score values of a binary classification model under the different average settings:
# y_true: ground-truth labels; y_hat: predicted labels of a binary classifier
from sklearn.metrics import f1_score

print('None:', f1_score(y_true, y_hat, average=None))
print('binary:', f1_score(y_true, y_hat, average='binary'))
print('micro:', f1_score(y_true, y_hat, average='micro'))
print('macro:', f1_score(y_true, y_hat, average='macro'))
print('weighted:', f1_score(y_true, y_hat, average='weighted'))
# Output
None: [ 0.674 0.744]
binary: 0.744013475371028
micro: 0.713331633953607
macro: 0.7091534012629005
weighted: 0.7110354720495692
macro = (0.674 + 0.744) / 2  # = 0.709, matches the macro output above
# Class sample counts: label-0 = 37112, label-1 = 41348; verify the weighted f1-score
0.674*37112/(37112+41348) + 0.744*41348/(37112+41348)  # 0.710889, slightly off due to rounding of the per-class scores
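As a side note, for single-label problems micro pooling makes the total FP and FN counts equal (every misclassified sample is an FP for one class and an FN for the other), so the micro f1-score reduces to plain accuracy; assuming the y_true and y_hat from the example above, this should print the same 0.7133... value:
from sklearn.metrics import accuracy_score
print('accuracy:', accuracy_score(y_true, y_hat))  # equals the micro f1-score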
Finding the best threshold for the binary (positive-label) F1-score
For binary classification, sklearn.metrics.precision_recall_curve gives the precision and recall at every probability threshold. Note that it only computes the precision and recall of the positive label. If the positive label's recall and precision are all you care about, the thresholds returned by this API, together with their corresponding precision and recall, give the best split threshold:
import numpy as np
from sklearn.metrics import precision_recall_curve

precs, recs, thrs = precision_recall_curve(y_true, y_prob)
# Drop the final (precision=1, recall=0) point: it has no matching threshold,
# so precs and recs are one element longer than thrs
f1s = 2 * precs[:-1] * recs[:-1] / (precs[:-1] + recs[:-1] + 1e-12)
best_thresh = thrs[np.argmax(f1s)]
Note that this only optimizes the positive label's f1-score. If you care about the f1-scores of both the positive and negative classes, the threshold found this way is generally not optimal for their combined f1-score.
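If the combined score is what matters, one option is to rescore the same candidate thresholds with average='macro'. A minimal sketch, assuming the y_true, y_prob, and thrs from above:
import numpy as np
from sklearn.metrics import f1_score

# Scan the candidate thresholds, scoring macro f1 over both classes
probs_arr = np.asarray(y_prob)
macro_f1s = [f1_score(y_true, (probs_arr >= t).astype(int), average='macro')
             for t in thrs]
best_macro_thresh = thrs[np.argmax(macro_f1s)]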
Appendix: using thresholds to find the best accuracy
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_true, probs)
accuracy_scores = []
for thresh in thresholds:
    # Binarize the probabilities at this threshold and score the accuracy
    accuracy_scores.append(
        accuracy_score(y_true, [1 if m > thresh else 0 for m in probs]))

accuracies = np.array(accuracy_scores)
max_accuracy = accuracies.max()
max_accuracy_threshold = thresholds[accuracies.argmax()]
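The loop above calls accuracy_score once per threshold; continuing the snippet, and assuming probs and y_true are array-like, the same search can be done in one vectorized shot (a sketch, equivalent in result to the loop):
# Each row of preds is the binarized prediction at one threshold
preds = (np.asarray(probs)[None, :] > thresholds[:, None]).astype(int)
accuracies = (preds == np.asarray(y_true)[None, :]).mean(axis=1)
max_accuracy_threshold = thresholds[accuracies.argmax()]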