Decision Tree 01

2020-06-08  文子轩
The core of the decision tree algorithm is finding the best node and the best split.

1. Decision Trees in sklearn

tree.DecisionTreeClassifier classification tree
tree.DecisionTreeRegressor regression tree
tree.export_graphviz exports a fitted tree in DOT format, for plotting
tree.ExtraTreeClassifier highly randomized classification tree
tree.ExtraTreeRegressor highly randomized regression tree

The tree-building process

from sklearn import tree                            #import the module we need
clf = tree.DecisionTreeClassifier()                 #instantiate
clf = clf.fit(x_train,y_train)                      #fit the model on the training data
result = clf.score(x_test,y_test)                   #score the model on the test data
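The snippet above assumes x_train and the other variables already exist. A minimal self-contained sketch of the same three-step pattern, using the built-in wine dataset purely for illustration:

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Small built-in dataset so the example runs on its own
wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

clf = tree.DecisionTreeClassifier(random_state=0)   # instantiate
clf = clf.fit(x_train, y_train)                     # train on the training set
result = clf.score(x_test, y_test)                  # mean accuracy on the test set
print(result)
```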

2. Important Parameters

2.1 criterion

The criterion parameter determines how impurity is computed. sklearn offers two options: "gini" (Gini impurity, the default) and "entropy" (information entropy).
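For intuition, both impurity measures can be computed by hand from the class proportions at a node (a sketch of the formulas, not sklearn's internal code):

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum(p_k^2) over the class proportions p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Shannon entropy: -sum(p_k * log2(p_k))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = [0, 0, 1, 1]          # a perfectly mixed two-class node
print(gini(labels))            # 0.5
print(entropy(labels))         # 1.0
```

A pure node (all labels identical) scores 0 under both measures; splits are chosen to drive these values down.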

Import the required libraries and modules

from sklearn  import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

Explore the data

wine = load_wine()
wine.data.shape
wine.target
##if wine were a single table, it would look like this
import pandas as pd
pd.concat([pd.DataFrame(wine.data),pd.DataFrame(wine.target)],axis=1)
wine.feature_names
wine.target_names

Split into training and test sets

Xtrain,Xtest,Ytrain,Ytest = train_test_split(wine.data,wine.target,test_size=0.3)
Xtrain.shape
Xtest.shape

Draw a tree

feature_name = ['酒精','苹果酸','灰','灰的碱性','镁','总酚','类黄酮','非黄烷类酚类','花青素','颜色强度','色调','od280/od315稀释葡萄酒','脯氨酸']
import graphviz
dot_data = tree.export_graphviz(clf,out_file=None
                               ,feature_names=feature_name
                               ,class_names=["琴酒","雪莉","贝尔摩德"]
                               ,filled=True
                               ,rounded=True)
graph = graphviz.Source(dot_data)
graph

Inspect the decision tree

clf.feature_importances_
[*zip(feature_name,clf.feature_importances_)]

Evaluate the tree

clf = tree.DecisionTreeClassifier(criterion='entropy',random_state=30)
clf = clf.fit(Xtrain,Ytrain)
score = clf.score(Xtest,Ytest)

2.2 random_state & splitter

random_state sets the seed for the randomness used when branching.
splitter also controls the randomness of the tree; it takes two values: "best" and "random".

clf = tree.DecisionTreeClassifier(criterion='entropy'
                                  ,random_state=30
                                  ,splitter='random')
clf = clf.fit(Xtrain,Ytrain)
score = clf.score(Xtest,Ytest)
score

import graphviz
dot_data = tree.export_graphviz(clf,out_file=None
                                 ,feature_names=feature_name
                                 ,class_names=["琴酒","雪莉","贝尔摩德"]
                                ,filled=True
                                ,rounded=True)

Pruning parameters

To make the decision tree generalize better, we prune it. The pruning strategy has a huge impact on the tree, and getting it right is at the heart of optimizing the algorithm. sklearn provides us with several pruning parameters, including max_depth, min_samples_leaf, and min_samples_split:

clf = tree.DecisionTreeClassifier(criterion="entropy"
                                 ,random_state=30
                                 ,splitter="random"
                                 ,max_depth=3
                                 ,min_samples_leaf=10
                                 ,min_samples_split=10
                                 )
clf = clf.fit(Xtrain, Ytrain)
dot_data = tree.export_graphviz(clf
                               ,feature_names= feature_name
                               ,class_names=["琴酒","雪莉","贝尔摩德"]
                               ,filled=True
                               ,rounded=True
                               )  
graph = graphviz.Source(dot_data)
graph
clf.score(Xtrain,Ytrain)
clf.score(Xtest,Ytest)

max_features & min_impurity_decrease
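These two parameters are only named here; as a sketch, they are passed like the other pruning parameters (the values below are illustrative, not recommended settings):

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

clf = tree.DecisionTreeClassifier(
    criterion="entropy",
    random_state=30,
    max_features=8,              # consider at most 8 of the 13 features per split
    min_impurity_decrease=0.01,  # split only if impurity drops by at least 0.01
)
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)
print(score)
```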

Finding the optimal pruning parameters

import matplotlib.pyplot as plt
test = []
for i in range(10):                                   #try max_depth from 1 to 10
    clf = tree.DecisionTreeClassifier(max_depth=i+1
                                     ,criterion="entropy"
                                     ,random_state=30
                                     ,splitter="random"
                                     )
    clf = clf.fit(Xtrain, Ytrain)
    score = clf.score(Xtest, Ytest)                   #test-set accuracy at this depth
    test.append(score)
plt.plot(range(1,11),test,color="red",label="max_depth")
plt.legend()
plt.show()

class_weight & min_weight_fraction_leaf
Parameters for weighting the target classes: class_weight assigns weights to the classes (useful for imbalanced data), and min_weight_fraction_leaf is the weighted counterpart of min_samples_leaf.
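A minimal sketch of class_weight in use; "balanced" reweights classes inversely to their frequency, and an explicit dict such as {0: 1, 1: 2, 2: 1} also works (the wine dataset is used here only for illustration):

```python
from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0)

# "balanced" gives under-represented classes proportionally larger weight
clf = tree.DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf = clf.fit(Xtrain, Ytrain)
acc = clf.score(Xtest, Ytest)
print(acc)
```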
