Machine Learning

Machine Learning: Decision Trees

2018-04-25  老生住长亭
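  1. Decision trees
    Concept: a decision tree learns a sequence of if/else questions, splitting the data until every leaf contains samples with a single target value
    Pure leaf: a leaf whose samples all share the same target value
    Controlling complexity
    Pre-pruning: stop growing the tree early, mainly by limiting the number of nodes or the depth of the tree
    Post-pruning: build the full tree, then remove or collapse nodes that contain little information
    Tools
    DecisionTreeRegressor
    DecisionTreeClassifier
    Plotting tool: graphviz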
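
The two pruning styles in the outline can be compared side by side. The block below is a minimal sketch, not part of the original post: the limits `max_leaf_nodes=8` and `min_samples_leaf=5` and the value `ccp_alpha=0.01` are arbitrary illustrative choices, and cost-complexity pruning via `ccp_alpha` is used here as one form of post-pruning that scikit-learn offers (version 0.22 or later).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Baseline: no limits, so the tree keeps splitting until its leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: stop early by capping depth, leaf count, and minimum leaf size.
# These particular limits are illustrative assumptions, not tuned values.
pre = DecisionTreeClassifier(max_depth=4, max_leaf_nodes=8, min_samples_leaf=5,
                             random_state=0).fit(X_train, y_train)

# Post-pruning: grow the full tree, then collapse weak subtrees
# (cost-complexity pruning). ccp_alpha=0.01 is an illustrative assumption.
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print("{:<12} depth={} leaves={} test accuracy={:.3f}".format(
        name, model.get_depth(), model.get_n_leaves(), model.score(X_test, y_test)))
```

Both pruned trees should come out smaller than the baseline; whether they also generalize better depends on the chosen limits.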

  2. Decision tree example

import os  # needed for the os.system calls that render the .dot files

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
import graphviz

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

print("Accuracy train set : {:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy test set : {:.3f}".format(tree.score(X_test, y_test)))

tree4 = DecisionTreeClassifier(max_depth=4, random_state=0)
tree4.fit(X_train, y_train)

print("Accuracy max_depth= 4 train set : {:.3f}".format(tree4.score(X_train, y_train)))
print("Accuracy max_depth= 4 test set : {:.3f}".format(tree4.score(X_test, y_test)))

# Tree without a depth limit, left to grow freely
export_graphviz(tree, out_file="tree.dot", class_names=["malignant", "benign"], feature_names=cancer.feature_names,
                impurity=False, filled=True)

# Tree with a maximum depth of 4
export_graphviz(tree4, out_file="tree4.dot", class_names=["malignant", "benign"], feature_names=cancer.feature_names,
                impurity=False, filled=True)

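# Load each .dot file; in a Jupyter notebook, graphviz.Source(...) displays
# the tree when it is the last expression in a cell
with open("tree.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)

with open("tree4.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)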

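# Render tree.dot to tree.png to show the structure of the decision tree
# (requires the Graphviz `dot` command to be installed and on PATH)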
os.system("dot -Tpng tree.dot -o tree.png")
os.system("dot -Tpng tree4.dot -o tree4.png")

Results with and without a depth limit:

Accuracy train set : 1.000
Accuracy test set : 0.937
Accuracy max_depth= 4 train set : 0.988
Accuracy max_depth= 4 test set : 0.951
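
The gap between training and test accuracy above suggests the unlimited tree overfits. To see how that gap changes with depth, a small sweep can help. This is a sketch under assumptions: the list of depths is an arbitrary choice, and the exact numbers it prints may differ between scikit-learn versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# None means no depth limit; the other depths are illustrative assumptions.
depths = [1, 2, 3, 4, 6, None]
scores = []
for depth in depths:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    scores.append((depth, model.score(X_train, y_train), model.score(X_test, y_test)))

for depth, train_acc, test_acc in scores:
    print("max_depth={!s:>4}  train={:.3f}  test={:.3f}".format(depth, train_acc, test_acc))
```

A depth where test accuracy peaks while training accuracy is still below 1.000 is usually a reasonable starting point for `max_depth`.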

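Without a depth limit, every leaf in the final tree contains samples of only one class and zero samples of the other class (such a leaf is called a pure leaf):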


tree.png
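
The purity of the leaves can also be checked without reading the picture. The following is a minimal sketch that inspects the fitted tree's `tree_` attribute; it assumes scikit-learn's convention that a leaf has child index -1 and that a pure leaf has impurity 0.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# In the fitted tree structure, a leaf has no children (child index -1).
is_leaf = tree.tree_.children_left == -1

# Gini impurity of each leaf; 0 means all samples in the leaf share one class.
leaf_impurity = tree.tree_.impurity[is_leaf]

print("number of leaves:", int(is_leaf.sum()))
print("all leaves pure:", bool(np.allclose(leaf_impurity, 0.0)))
```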

Tree with a maximum depth of 4:

tree4.png
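
If the Graphviz `dot` command is not available, the same depth-4 tree can be inspected as plain text. This is a sketch of an alternative, not what the post itself uses: it relies on `sklearn.tree.export_text`, which is available in scikit-learn 0.21 and later.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

tree4 = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# export_text returns the if/else rules as an indented string;
# feature_names is passed as a plain list of strings.
report = export_text(tree4, feature_names=list(cancer.feature_names))
print(report)
```

Each indentation level in the printed report corresponds to one level of the tree, and every branch ends in a `class:` line for its leaf.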