Model Ensembling

2018-09-16  SimonLiu000

Code: https://github.com/SimonLliu/DGB_AlphaTeam

At the start of this competition, we looked at the model-ensembling utilities built into sklearn and mlxtend.classifier:

https://blog.csdn.net/LAW_130625/article/details/78573736

mlxtend:

from mlxtend.classifier import StackingCVClassifier
sclf = StackingCVClassifier(classifiers=[xgb, rfc, etc], meta_classifier=lr, use_probas=True, n_folds=3, verbose=3)  # recent mlxtend releases name n_folds "cv"

sklearn:

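from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
# Soft voting averages the predicted class probabilities of the three models.
eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='soft')
# Parameters of a sub-estimator are addressed as <name>__<parameter>.
params = {'lr__C': [1.0, 100.0], 'rf__n_estimators': [20, 200]}
grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)
grid = grid.fit(iris.data, iris.target)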

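As the code above shows, sklearn's built-in ensembling tools require the combined model to be trained again. This pushed us to write two ensembling methods of our own that reuse already-trained models: probability summation and class voting.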
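To illustrate combining already-fitted models without another training pass, here is a minimal sketch on the iris data used above. The three classifiers, their settings, and the train/test split are illustrative assumptions, not the competition setup.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1, stratify=iris.target
)

# Each model is trained once, on its own (illustrative choice of models).
models = [
    LogisticRegression(max_iter=1000, random_state=1),
    RandomForestClassifier(n_estimators=20, random_state=1),
    GaussianNB(),
]
for model in models:
    model.fit(x_train, y_train)

# Ensembling step: sum the per-class probabilities; nothing is refitted.
summed_proba = sum(model.predict_proba(x_test) for model in models)

# All three models saw the same labels, so their classes_ arrays agree.
y_pred = models[0].classes_[np.argmax(summed_proba, axis=1)]
accuracy = float(np.mean(y_pred == y_test))
```

Because the models are only queried with `predict_proba`, the same step works on models loaded from disk.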

1. Probability summation

1) Load a model and predict probabilities

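import joblib

# x_test (feature matrix) and df_test (DataFrame with an 'id' column) are prepared beforehand.
clf2 = joblib.load("lr(c40).pkl")
y_test = clf2.predict_proba(x_test)  # one row of class probabilities per sample
df_test['proba'] = y_test.tolist()   # store each row as a list in a single column
df_result = df_test.loc[:, ['id', 'proba']]
df_result.to_csv('result_proba_lg.csv', index=False)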

2) Read the saved probabilities and add them up

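import ast

import numpy as np
import pandas as pd

lg_df = pd.read_csv('result_proba_lg.csv')

def series2arr(series):
    # Each cell holds a stringified list such as "[0.1, 0.9]"; parse it back into numbers.
    res = []
    for row in series:
        res.append(np.array(ast.literal_eval(row)))  # literal_eval is a safer stand-in for eval
    return np.array(res)

lg_prob_arr = series2arr(lg_df['proba'])
# svm_prob_arr, kn_prob_arr and nb_prob_arr are built the same way from the other models' CSV files.
final_prob = svm_prob_arr + lg_prob_arr + kn_prob_arr + nb_prob_arr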
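The save-then-reload round trip can be checked end to end with a small self-contained sketch. The probability values, the ids, and the helper names `save_proba` and `load_proba` are made up for illustration; an in-memory string stands in for the CSV files.

```python
import ast
import io

import numpy as np
import pandas as pd

# Two made-up probability matrices (2 samples, 3 classes) standing in for
# the outputs of two different models.
lg_proba = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
svm_proba = np.array([[0.5, 0.4, 0.1], [0.2, 0.5, 0.3]])

def save_proba(ids, proba):
    """Return CSV text with one list of probabilities per row."""
    df = pd.DataFrame({"id": ids, "proba": proba.tolist()})
    return df.to_csv(index=False)

def load_proba(csv_text):
    """Read the CSV back and parse each stringified list into a float row."""
    df = pd.read_csv(io.StringIO(csv_text))
    return np.array([ast.literal_eval(row) for row in df["proba"]])

ids = [1, 2]
lg_prob_arr = load_proba(save_proba(ids, lg_proba))
svm_prob_arr = load_proba(save_proba(ids, svm_proba))

# Element-wise sum: rows are samples, columns are classes.
final_prob = lg_prob_arr + svm_prob_arr
```

The sum is only meaningful if every file lists the samples in the same order and every model uses the same class order for its columns.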

3) Derive the final predictions

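y_class = [np.argmax(row) for row in final_prob]  # 0-based index of the highest summed probability
df_test['proba'] = y_class
df_test['proba'] = df_test['proba'] + 1  # shift the 0-based index to a class label starting at 1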
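A small worked example makes the index-to-label step concrete. The numbers are made up, and the sketch assumes that the class labels run from 1 to K in column order, which is what the `+ 1` above implies.

```python
import numpy as np

# Made-up summed probabilities for 3 samples and 4 classes.
final_prob = np.array([
    [0.2, 1.5, 0.9, 0.4],
    [1.1, 0.3, 0.2, 1.4],
    [0.8, 0.7, 1.3, 0.2],
])

# argmax along axis 1 gives the 0-based column of the largest sum per row.
col_idx = np.argmax(final_prob, axis=1)

# Assumption: column k corresponds to label k + 1 (labels run 1..K).
y_class = col_idx + 1
```

With a fitted sklearn classifier `clf`, `clf.classes_[col_idx]` performs the same mapping without hard-coding the offset, since the columns of `predict_proba` follow the order of `classes_`.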

2. Class voting

1) Combine the prediction arrays column-wise

# res_l holds one list of predicted classes per model (10 models here).
a_l = []
for i in range(len(res_l[0])):
    a_l.append([res_l[j][i] for j in range(10)])
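The same column-wise regrouping can be written with `zip`, which avoids hard-coding the number of models. This is a sketch with made-up predictions from three models, not the competition data.

```python
# Made-up predictions from 3 models for 4 samples (one inner list per model).
res_l = [
    [1, 2, 3, 1],  # model 0
    [1, 2, 2, 1],  # model 1
    [2, 2, 3, 1],  # model 2
]

# zip(*res_l) walks the lists in parallel: one tuple of votes per sample.
a_l = [list(votes) for votes in zip(*res_l)]
```

Each inner list of `a_l` now holds every model's vote for one sample, in model order.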

2) Vote

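from collections import Counter

def voting(class_l):
    final_class = []
    c_l = []
    for row in class_l:
        c = Counter(row)
        c_v_set = set(c.values())
        # Vote counts differ: take the class with the most votes.
        if len(c_v_set) > 1:
            res = max(c, key=c.get)
        else:
            # All counts are equal: take the prediction of the best single model.
            # max_idx is that model's position in the row, set beforehand.
            res = row[max_idx]
        final_class.append(res)
        c_l.append(c)
    return final_class, c_l

final_class, c = voting(a_l)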
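To make the tie-breaking rule concrete, here is a self-contained sketch of the same idea on made-up votes. It is a variant, not the function above: it falls back to the best model whenever several classes share the top count, which also covers a split such as 4-4-2. The names `vote` and `best_idx` are hypothetical.

```python
from collections import Counter

def vote(rows, best_idx):
    """Majority vote per row; ties for first place go to the model at best_idx."""
    final = []
    for row in rows:
        counts = Counter(row)
        top = max(counts.values())
        winners = [label for label, n in counts.items() if n == top]
        # A single winner is a clear majority; otherwise trust the best model.
        final.append(winners[0] if len(winners) == 1 else row[best_idx])
    return final

rows = [
    [1, 1, 2, 1],  # clear majority for class 1
    [3, 2, 3, 2],  # 2-2 tie: follow the model at best_idx
    [4, 4, 4, 4],  # unanimous
    [1, 2, 3, 4],  # every class has one vote: follow the model at best_idx
]
final_class = vote(rows, best_idx=1)
```

If the best model voted for a class outside the tied group, this sketch still follows it; restricting the fallback to the tied classes is another reasonable choice.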
