WOE and IV (Python)
https://blog.csdn.net/kevin7658/article/details/50780391
IV and WOE:
IV (Information Value) measures how much predictive power a variable has for a binary target:
<= 0.02: no predictive power; not usable
0.02 to 0.1: weak predictive power
0.1 to 0.2: moderate predictive power
> 0.2: strong predictive power
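The bands above can be encoded as a small lookup. This is a minimal sketch: the function name `iv_strength` is hypothetical, and assigning boundary values such as exactly 0.1 to the lower band is an assumption, since the list does not say which side a tie belongs to.

```python
def iv_strength(iv: float) -> str:
    """Map an IV value to the predictive-power bands listed above (hypothetical helper)."""
    # Boundary values fall into the lower band (assumption)
    if iv <= 0.02:
        return "none"
    if iv <= 0.1:
        return "weak"
    if iv <= 0.2:
        return "moderate"
    return "strong"
```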
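IV can also be used for variable selection: the larger a variable's IV, the stronger the case for including it in the list of model variables.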
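The notes do not show how WOE and IV are computed. A common formulation, for each bin i of a variable, is WOE_i = ln(%bad_i / %good_i) and IV = sum over i of (%bad_i - %good_i) * WOE_i, where %bad_i is the share of all bads that fall in bin i. Some sources use good/bad instead; that flips the sign of WOE but leaves IV unchanged. The sketch below assumes the variable is already binned, the target is 0/1 with 1 = bad, and empty shares are floored at a small `eps` to keep the log finite. The names `woe_iv`, `select_by_iv` and the `min_iv` cutoff are hypothetical.

```python
import numpy as np


def woe_iv(bins, target, eps=1e-4):
    """Return ({bin: WOE}, IV) for one pre-binned variable and a 0/1 target (1 = bad)."""
    bins = np.asarray(bins)
    target = np.asarray(target)
    total_bad = (target == 1).sum()
    total_good = (target == 0).sum()
    woe, iv = {}, 0.0
    for b in np.unique(bins):
        mask = bins == b
        # Share of all bads / all goods that fall in this bin; eps avoids log(0)
        pct_bad = max((target[mask] == 1).sum() / total_bad, eps)
        pct_good = max((target[mask] == 0).sum() / total_good, eps)
        woe[b] = np.log(pct_bad / pct_good)
        iv += (pct_bad - pct_good) * woe[b]
    return woe, iv


def select_by_iv(features, target, min_iv=0.02):
    """Rank variables by IV (descending) and keep those whose IV exceeds min_iv."""
    ivs = {name: woe_iv(col, target)[1] for name, col in features.items()}
    ranked = sorted(ivs.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, iv) for name, iv in ranked if iv > min_iv]


target = [1, 1, 1, 0, 0, 0, 0, 1]
features = {
    "x": ["a", "a", "a", "a", "b", "b", "b", "b"],      # separates bads from goods
    "noise": ["a", "b", "a", "b", "a", "b", "a", "b"],  # same bad/good mix in every bin
}
print(select_by_iv(features, target))  # only "x" survives the cutoff
```

The default cutoff of 0.02 mirrors the "not usable" band above; in practice it is a tuning choice.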
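PSI (Population Stability Index): measures how much a variable's distribution has shifted between two samples (`expected` and `actual` in the code below).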
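For reference, the quantity computed per bin by `sub_psi` and summed in `psi` is the usual PSI formula. Writing E_i and A_i for the fractions of the expected and actual samples that fall in bin i of B bins (notation introduced here):

```latex
\mathrm{PSI} = \sum_{i=1}^{B} \left(E_i - A_i\right) \ln\frac{E_i}{A_i}
```

Each term is non-negative because E_i - A_i and ln(E_i / A_i) always have the same sign, so PSI is 0 only when the two binned distributions match, and swapping E and A leaves the value unchanged.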
import numpy as np


def calculate_psi(expected, actual, buckets=10):  # expected = test sample, actual = base sample
    """Return the PSI of 1-D inputs, or one PSI per column for 2-D inputs."""

    def psi(expected_array, actual_array, buckets):
        def scale_range(values, lo, hi):
            # Linearly rescale values so that they span [lo, hi]
            values = values - np.min(values)
            values = values / (np.max(values) / (hi - lo))
            return values + lo

        # Split the range of expected_array into `buckets` equal-width bins
        breakpoints = np.arange(0, buckets + 1) / buckets * 100
        breakpoints = scale_range(breakpoints, np.min(expected_array), np.max(expected_array))

        # Clip actual values into the expected range; otherwise np.histogram drops
        # out-of-range values and actual_percents no longer sums to 1
        actual_array = np.clip(actual_array, breakpoints[0], breakpoints[-1])

        expected_percents = np.histogram(expected_array, breakpoints)[0] / len(expected_array)
        actual_percents = np.histogram(actual_array, breakpoints)[0] / len(actual_array)

        def sub_psi(test, base):
            # Floor empty bins at 0.0001 to avoid log(0) and division by zero
            if base == 0:
                base = 0.0001
            if test == 0:
                test = 0.0001
            return (test - base) * np.log(test / base)

        # Built-in sum: np.sum over a generator is deprecated
        return sum(sub_psi(e, a) for e, a in zip(expected_percents, actual_percents))

    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if expected.ndim == 1:
        return psi(expected, actual, buckets)

    # 2-D input: columns are variables, so size the result by shape[1], not shape[0]
    psi_values = np.empty(expected.shape[1])
    for i in range(len(psi_values)):
        psi_values[i] = psi(expected[:, i], actual[:, i], buckets)
    return psi_values
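The per-bin loop can also be written with NumPy array operations. This is a sketch for a single 1-D variable, not the original author's code: the name `psi_1d` and the `floor` parameter are hypothetical, bins are equal-width over the range of the expected sample (`np.linspace` yields the same edges as rescaling `np.arange`), actual values are clipped into that range so every observation lands in a bin, and empty bins are floored at 0.0001 as in `sub_psi`.

```python
import numpy as np


def psi_1d(expected, actual, buckets=10, floor=1e-4):
    """Vectorized PSI for one variable, using equal-width bins over the expected range."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    edges = np.linspace(expected.min(), expected.max(), buckets + 1)
    # Clip so actual values outside the expected range fall into the edge bins
    actual = np.clip(actual, edges[0], edges[-1])
    e = np.histogram(expected, edges)[0] / expected.size
    a = np.histogram(actual, edges)[0] / actual.size
    # Floor empty bins so the log stays finite
    e = np.where(e == 0, floor, e)
    a = np.where(a == 0, floor, a)
    return float(np.sum((e - a) * np.log(e / a)))


rng = np.random.default_rng(0)
base = rng.normal(0, 1, 10_000)
same = rng.normal(0, 1, 10_000)       # same distribution, new draw
shifted = rng.normal(0.5, 1, 10_000)  # mean shifted by half a standard deviation
print(psi_1d(base, base))  # 0.0 for identical samples
print(psi_1d(base, same), psi_1d(base, shifted))  # the shifted sample scores higher
```

Equal-width bins are sensitive to outliers in the expected sample; binning on quantiles of the expected sample is a common alternative.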