Computing the area under the ROC curve (AUC) with sklearn
2017-06-16
DayDayUp_hhxx
sklearn.metrics.auc
sklearn.metrics.auc(x, y, reorder=False)
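A general-purpose function that computes the area under a curve with the trapezoidal rule. The x coordinates must be monotonic (increasing or decreasing). The reorder parameter was deprecated in scikit-learn 0.20 and later removed, so in current versions the call is simply auc(x, y).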
import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
metrics.auc(fpr, tpr)
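The trapezoidal rule that auc applies can be written out by hand. The sketch below uses plain NumPy; trapezoid_area is a hypothetical helper name, and the fpr/tpr values are the ROC points for the example above, worked out by hand. The second call shows that the rule is not ROC-specific: any curve with sorted x works.

```python
import numpy as np


def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through (x[i], y[i]), trapezoidal rule.

    Assumes x is sorted in increasing order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each segment contributes its width times the average of its two heights
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


# ROC points for y = [1, 1, 2, 2], pred = [0.1, 0.4, 0.35, 0.8], pos_label=2
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
print(trapezoid_area(fpr, tpr))  # 0.75

# Not limited to ROC curves: y = x**2 sampled at x = 0, 1, 2
print(trapezoid_area([0, 1, 2], [0, 1, 4]))  # 3.0 (the exact integral is 8/3)
```

With scikit-learn installed, metrics.auc on the same fpr/tpr points should also give 0.75.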
sklearn.metrics.roc_auc_score
sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None)
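Computes the area under the ROC curve from prediction scores.
In this version it is restricted to binary classification, or to multilabel classification with y_true in label indicator format.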
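- y_true: array, shape = [n_samples] or [n_samples, n_classes]
  True labels.
- y_score: array, shape = [n_samples] or [n_samples, n_classes]
  Prediction scores: probability estimates of the positive class, confidence values, or the return value of the classifier's decision_function.
- average: string, [None, 'micro', 'macro' (default), 'samples', 'weighted']
  Only used for label indicator input. None returns one score per class; otherwise it selects the type of averaging.
- sample_weight: array-like of shape = [n_samples], optional
  Sample weights.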
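The average parameter only matters for label indicator y_true. As a sketch on made-up toy data, average=None returns one AUC per label column, 'macro' takes their unweighted mean, and 'micro' pools every (sample, label) pair into a single binary problem.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up label indicator data: 4 samples, 2 labels
y_true = np.array([[0, 1],
                   [0, 0],
                   [1, 1],
                   [1, 0]])
y_score = np.array([[0.1, 0.9],
                    [0.4, 0.2],
                    [0.35, 0.6],
                    [0.8, 0.3]])

per_label = roc_auc_score(y_true, y_score, average=None)  # one AUC per column: 0.75 and 1.0
macro = roc_auc_score(y_true, y_score, average="macro")   # mean of the columns: 0.875
micro = roc_auc_score(y_true, y_score, average="micro")   # all pairs pooled: 0.9375
print(per_label, macro, micro)
```

Micro averaging is higher here because the second label's perfectly separated pairs dominate the pooled ranking.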
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_scores)
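The value 0.75 has a direct reading: the ROC AUC equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counting one half. The pure-Python sketch below (pairwise_auc is a hypothetical name; labels are assumed to be 0/1 with 1 positive) checks this, and also shows why y_score may be probabilities or decision_function values: a strictly increasing transform keeps the ordering, so the AUC does not change.

```python
import math


def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.

    Assumes y_true uses 1 for the positive class and 0 for the negative class.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, y_scores))  # 0.75: only the pair (0.35, 0.4) is misordered

# Map the scores to log-odds, an unbounded scale like decision_function output.
# The ordering is preserved, so the AUC is unchanged.
logits = [math.log(s / (1 - s)) for s in y_scores]
print(pairwise_auc(y_true, logits))  # 0.75
```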
A complete example
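from sklearn import datasets, svm, metrics, model_selection, preprocessing
# Keep only classes 1 and 2 (versicolor, virginica) and the first two features
iris = datasets.load_iris()
x = iris.data[iris.target != 0, :2]
x = preprocessing.StandardScaler().fit_transform(x)
y = iris.target[iris.target != 0]
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    x, y, test_size=0.1, random_state=25)
clf = svm.SVC(kernel='linear')
clf.fit(x_train, y_train)
# f1_score uses pos_label=1 by default, so label 1 is the positive class here
print(metrics.f1_score(y_test, clf.predict(x_test)))
# 0.75
# For a binary SVC, decision_function is positive for clf.classes_[1] (label 2), hence pos_label=2
fpr, tpr, thresholds = metrics.roc_curve(y_test, clf.decision_function(x_test),
                                         pos_label=2)
print(metrics.auc(fpr, tpr))
# 0.95833333333333337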
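In the example above, decision_function returns signed distances to the separating hyperplane rather than probabilities; roc_curve accepts them because it only uses the ordering of the scores. The sketch below (roc_points is a hypothetical helper, not sklearn's implementation) lowers a threshold through each distinct score and records the false and true positive rates. On the toy data from the auc section it yields five points, which should match what roc_curve returns for that data.

```python
def roc_points(y_true, y_score, pos_label):
    """Return (fpr, tpr) lists by lowering the threshold through each distinct score.

    A sample is predicted positive when its score >= threshold. Only the ordering
    of the scores matters, so probabilities and decision_function values both work.
    """
    n_pos = sum(1 for t in y_true if t == pos_label)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == pos_label)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t != pos_label)
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


y = [1, 1, 2, 2]
pred = [0.1, 0.4, 0.35, 0.8]
print(roc_points(y, pred, pos_label=2))
# ([0.0, 0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0, 1.0])
```

This version rescans all samples for every threshold, which is fine for illustration but quadratic. scikit-learn's roc_curve may also drop points that lie on a straight segment (drop_intermediate=True by default), which does not change the area.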
Summary
roc_auc_score computes the AUC from prediction scores; internally it calls roc_curve and then auc:
# Excerpt from scikit-learn's source at the time (not standalone:
# np, roc_curve and auc come from the surrounding module)
def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
    if len(np.unique(y_true)) != 2:
        raise ValueError("Only one class present in y_true. ROC AUC score "
                         "is not defined in that case.")
    fpr, tpr, thresholds = roc_curve(y_true, y_score,
                                     sample_weight=sample_weight)
    return auc(fpr, tpr, reorder=True)
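The excerpt above is not runnable on its own. As a self-contained sketch of the same pipeline (two-class check, ROC curve, trapezoidal area), here is a NumPy version that builds the ROC points in one pass by sorting the scores in descending order and taking cumulative counts. It is not sklearn's actual code; binary_roc_auc is a hypothetical name, and labels are assumed to be 0/1 with 1 as the positive class.

```python
import numpy as np


def binary_roc_auc(y_true, y_score):
    """roc_curve + auc in one pass; assumes 0/1 labels with 1 as the positive class."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    if len(np.unique(y_true)) != 2:
        # Same guard as the excerpt above
        raise ValueError("Only one class present in y_true. ROC AUC score "
                         "is not defined in that case.")
    order = np.argsort(-y_score, kind="mergesort")  # descending score, stable sort
    is_pos = y_true[order] == 1
    s_sorted = y_score[order]
    # Keep the last position of each run of tied scores: one ROC point per threshold
    last_of_run = np.concatenate([np.diff(s_sorted) != 0, [True]])
    tps = np.cumsum(is_pos)[last_of_run]
    fps = np.cumsum(~is_pos)[last_of_run]
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    # Trapezoidal rule over the ROC points, as auc does
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


print(binary_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Grouping tied scores into one ROC point is what makes a tie contribute half a pair to the area.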
The two approaches give the same result:
import numpy as np
from sklearn import metrics
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc_score(y_true, y_scores))
# 0.75
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_scores, pos_label=1)
print(metrics.auc(fpr, tpr))
# 0.75
Note that roc_auc_score has no pos_label parameter, while roc_curve handles a missing pos_label as follows:
classes = np.unique(y_true)
if (pos_label is None and
        not (array_equal(classes, [0, 1]) or
             array_equal(classes, [-1, 1]) or
             array_equal(classes, [0]) or
             array_equal(classes, [-1]) or
             array_equal(classes, [1]))):
    raise ValueError("Data is not binary and pos_label is not specified")
elif pos_label is None:
    pos_label = 1.
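In other words, roc_auc_score can only be used directly when the labels in y_true pass the check above (for a binary problem: {0, 1} or {-1, 1}, with 1 as the positive class). Otherwise, call roc_curve with an explicit pos_label and then auc. (This describes scikit-learn as of this post; newer releases binarize binary y_true inside roc_auc_score, treating the larger label as positive, so labels such as {1, 2} no longer raise this error.)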
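The rule above can be mirrored in plain Python to check which label sets roc_auc_score accepted at the time. default_pos_label is a hypothetical name, and the allowed class sets are copied from the excerpt.

```python
def default_pos_label(y_true, pos_label=None):
    """Mimic the pos_label handling quoted above (scikit-learn as of this post).

    Returns the positive label roc_curve would use, or raises ValueError.
    """
    classes = sorted(set(y_true))
    allowed = ([0, 1], [-1, 1], [0], [-1], [1])
    if pos_label is None and classes not in allowed:
        raise ValueError("Data is not binary and pos_label is not specified")
    return 1 if pos_label is None else pos_label


print(default_pos_label([0, 0, 1, 1]))                # 1
print(default_pos_label([-1, -1, 1, 1]))              # 1
print(default_pos_label([1, 1, 2, 2], pos_label=2))   # 2
try:
    default_pos_label([1, 1, 2, 2])
except ValueError as err:
    print(err)  # Data is not binary and pos_label is not specified
```

Because roc_auc_score never passes pos_label, the third case (an explicit pos_label) is only reachable by calling roc_curve yourself.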
import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
# Labels are {1, 2} and the higher scores belong to label 2, so it is the positive class
fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
print(metrics.auc(fpr, tpr))
# 0.75
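print(metrics.roc_auc_score(y, pred))
# ValueError: Data is not binary and pos_label is not specified
# The labels {1, 2} do not satisfy roc_curve's default pos_label rules, hence the error.
# Relabel them as {0, 1} (or {-1, 1}) instead:
y = np.array([0, 0, 1, 1])  # or np.array([-1, -1, 1, 1])
print(metrics.roc_auc_score(y, pred))
# 0.75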
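Instead of rewriting the labels by hand, the 0/1 labels can be derived from the chosen positive class with a comparison, which also works for string labels. to_binary is a hypothetical helper; its output can then be passed to roc_auc_score.

```python
import numpy as np


def to_binary(y, pos_label):
    """Return 1 where y equals pos_label and 0 elsewhere (hypothetical helper)."""
    y = np.asarray(y)
    if pos_label not in np.unique(y):
        raise ValueError("pos_label %r not found in y" % (pos_label,))
    return (y == pos_label).astype(int)


print(to_binary([1, 1, 2, 2], pos_label=2))                        # [0 0 1 1]
print(to_binary(["ham", "spam", "spam", "ham"], pos_label="spam"))  # [0 1 1 0]
```

Passing to_binary(y, 2) to roc_auc_score then corresponds to calling roc_curve(y, pred, pos_label=2) followed by auc.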