Plotting a Lift chart (Lift curve) in Python

2020-04-17  xiaogp

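The Lift formula

The Lift curve measures how much better the hit rate is when the model flags samples above a given threshold, compared with the hit rate obtained by flagging samples at random without a model.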


Lift = \frac{\text{True Positive} / (\text{True Positive} + \text{False Positive})}{(\text{True Positive} + \text{False Negative}) / (\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative})}
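
As a quick check of the formula, here is a minimal sketch that computes Lift from the four confusion-matrix counts. The function name `lift_from_counts` and the counts in the example are made up for illustration and do not come from the data set used in this post.

```python
def lift_from_counts(tp, fp, tn, fn):
    """Lift = precision of the flagged group / base rate of positives."""
    precision = tp / (tp + fp)                    # hit rate among flagged samples
    base_rate = (tp + fn) / (tp + fp + tn + fn)   # hit rate when flagging at random
    return precision / base_rate


# Hypothetical counts: 100 samples flagged, 50 of them truly positive,
# 125 positives among 1000 samples overall.
print(lift_from_counts(tp=50, fp=50, tn=825, fn=75))  # 4.0
```

A Lift of 1 means the model does no better than random selection, so the numerator (precision) is the only part that depends on the threshold.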

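Input data

Inputs: predictions, labels, threshold_list, cut_point

predictions: the predicted value of every sample, a probability between 0 and 1
labels: the true value (0 or 1) of every sample; in this example 1 marks a bad customer
threshold_list: the list of thresholds to mark on the plot
cut_point: the number of evenly spaced thresholds in [0, 1] at which the Lift curve is evaluated

Data preview: labels in the left column, predictions in the right column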

head -4 test_predict_res.txt
0.0 0.831193
0.0 0.088209815
1.0 0.93411493
0.0 0.022157196
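
A two-column file like the preview above can also be read in one call with NumPy. This is only a sketch: the helper name `load_labels_predictions` is hypothetical, and the four preview rows are fed in through `io.StringIO` so the example runs without the real file.

```python
import io

import numpy as np


def load_labels_predictions(source):
    """Read a whitespace-separated file: column 0 = label, column 1 = score."""
    data = np.loadtxt(source, ndmin=2)  # ndmin=2 keeps a 2-D shape even for one row
    return data[:, 0], data[:, 1]


# The four preview rows, wrapped in a file-like object so the sketch runs
# without the real file; pass "test_predict_res.txt" instead to read from disk.
sample = io.StringIO(
    "0.0 0.831193\n"
    "0.0 0.088209815\n"
    "1.0 0.93411493\n"
    "0.0 0.022157196\n"
)
labels, predictions = load_labels_predictions(sample)
print(labels.tolist())  # [0.0, 0.0, 1.0, 0.0]
```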

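Python implementation

import matplotlib.pyplot as plt
import numpy as np


def lift_plot(predictions, labels, threshold_list, cut_point=100):
    # Base rate: share of positive (bad) samples in the whole data set
    base = len([x for x in labels if x == 1]) / len(labels)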
    predictions_labels = list(zip(predictions, labels))
    lift_values = []

    x_axis_range = np.linspace(0, 1, cut_point)
    x_axis_valid = []
    for i in x_axis_range:
        hit_data = [x[1] for x in predictions_labels if x[0] > i]
        if hit_data:  # skip thresholds above which no sample falls
            bad_hit = [x for x in hit_data if x == 1]
            precision = len(bad_hit) / len(hit_data)
            lift_value = precision / base
            lift_values.append(lift_value)
            x_axis_valid.append(i)

    plt.plot(x_axis_valid, lift_values, color="blue")  # Lift curve
    plt.plot([0, 1], [1, 1], linestyle="-", color="darkorange", alpha=0.5, linewidth=2)  # baseline (Lift = 1, random selection)
    
    for threshold in threshold_list:
        threshold_hit_data = [x[1] for x in predictions_labels if x[0] > threshold]
        if threshold_hit_data:
            threshold_bad_hit = [x for x in threshold_hit_data if x == 1]
            threshold_precision = len(threshold_bad_hit) / len(threshold_hit_data)
            threshold_lift_value = threshold_precision / base
            plt.scatter([threshold], [threshold_lift_value], color="white", edgecolors="blue", s=20, label="threshold:{} lift:{}".format(threshold, round(threshold_lift_value, 2)))  # marker at the threshold
            plt.plot([threshold, threshold], [0, 20], linestyle="--", color="black", alpha=0.2, linewidth=1)  # vertical guide line at the threshold
            plt.text(threshold - 0.02, threshold_lift_value + 1, str(round(threshold_lift_value, 2)))
    plt.title("Lift plot")
    plt.legend(loc=2, prop={"size": 9})
    plt.grid()
    plt.show()


if __name__ == "__main__":
    # Read the true labels (column 1) and predicted scores (column 2)
    labels = []
    predictions = []
    with open("test_predict_res.txt", "r", encoding="utf8") as f:
        for line in f:
            fields = line.strip().split()
            labels.append(float(fields[0]))
            predictions.append(float(fields[1]))

    lift_plot(predictions, labels, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

[Figure: Lift plot drawn by lift_plot]
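
For larger data sets, the curve values can be computed separately from the plotting with NumPy instead of list comprehensions. The following is a sketch under that assumption, not the code used in this post: `lift_curve` is a hypothetical helper that keeps the same `>` comparison and the same `np.linspace(0, 1, cut_point)` grid, and the toy inputs are invented.

```python
import numpy as np


def lift_curve(predictions, labels, cut_point=100):
    """Return (thresholds, lifts) for thresholds that select at least one sample."""
    predictions = np.asarray(predictions, dtype=float)
    positives = np.asarray(labels, dtype=float) == 1
    base = positives.mean()  # base rate; assumed non-zero here
    thresholds = np.linspace(0, 1, cut_point)
    # selected[i, j] is True when sample j scores above threshold i
    selected = predictions[None, :] > thresholds[:, None]
    n_selected = selected.sum(axis=1)
    n_hits = (selected & positives[None, :]).sum(axis=1)
    valid = n_selected > 0  # drop thresholds that select nothing
    lifts = n_hits[valid] / n_selected[valid] / base
    return thresholds[valid], lifts


# Toy data, invented for illustration only
thresholds, lifts = lift_curve([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], cut_point=3)
print(thresholds.tolist())  # [0.0, 0.5]
print(lifts.tolist())       # [1.0, 2.0]
```

The comparison matrix holds cut_point × n_samples booleans, so this trades memory for speed; for very large inputs a sort-based approach would be more economical.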

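Interpreting the Lift plot

Take enterprise risk prediction as an example, where a predicted probability closer to 1 indicates a higher-risk enterprise. Because truly bad enterprises are very rare, the base rate is low, and the higher the predicted probability threshold, the better the model is at hitting truly bad enterprises. For example, with 0.8 as the threshold, the share of truly bad enterprises among those the model flags is 4.23 times the share that random guessing would achieve.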
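
To make the 4.23 figure concrete, Lift can be turned back into a precision once the base rate is known. The post does not state the base rate, so the 5% used below is purely an assumption for illustration, and both helper names are hypothetical.

```python
def precision_from_lift(lift, base_rate):
    """Invert Lift = precision / base_rate to recover the precision."""
    return lift * base_rate


def max_lift(base_rate):
    """Largest possible Lift, reached when every flagged sample is positive."""
    return 1.0 / base_rate


lift_at_08 = 4.23          # value quoted in the text for threshold 0.8
assumed_base_rate = 0.05   # ASSUMPTION: the real base rate is not given in the post
precision = precision_from_lift(lift_at_08, assumed_base_rate)
print(f"precision above 0.8: {precision:.4f}")                    # 0.2115
print(f"upper bound on Lift: {max_lift(assumed_base_rate):.1f}")  # 20.0
```

Under that assumed base rate, roughly 21 of every 100 flagged enterprises would be truly bad, against 5 of every 100 picked at random. The rarer the bad cases, the higher the ceiling on Lift.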
