Machine Learning: Decision Trees
-
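Decision trees
Concept: a decision tree learns a hierarchy of if/else questions about the features and keeps splitting until every leaf contains samples with a single target value.
Pure leaf: a leaf in which all samples share the same target value.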
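To make "pure leaf" concrete, here is a minimal sketch that scores a leaf with Gini impurity. Gini is an assumption on my part: it is the default criterion of scikit-learn's DecisionTreeClassifier, but these notes do not say which measure they have in mind. The helper name `gini` and the two sample leaves are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Hypothetical leaves: one holds a single class, the other mixes two classes.
pure_leaf = ["benign"] * 5
mixed_leaf = ["benign"] * 3 + ["malignant"] * 2

print(gini(pure_leaf))   # 0.0, the leaf is pure
print(gini(mixed_leaf))  # approximately 0.48, the leaf is still mixed
```

A leaf is pure exactly when this value is 0, which is the stopping condition described above.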
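Controlling complexity
Pre-pruning: stop the tree from growing early, mainly by limiting the number of nodes (leaves) or the depth of the tree.
Post-pruning: build the full tree first, then remove or collapse nodes that contain little information.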
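The two strategies can be sketched with scikit-learn parameters. The pre-pruning knobs shown are `max_depth`, `max_leaf_nodes` and `min_samples_leaf`; for post-pruning this sketch uses cost-complexity pruning via `ccp_alpha`, which assumes scikit-learn 0.22 or newer. All parameter values are illustrative choices, not tuned settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

# Illustrative settings only: the first is unrestricted, the next three
# pre-prune, and the last one post-prunes the fully grown tree.
candidates = {
    "unrestricted": DecisionTreeClassifier(random_state=0),
    "max_depth=4": DecisionTreeClassifier(max_depth=4, random_state=0),
    "max_leaf_nodes=8": DecisionTreeClassifier(max_leaf_nodes=8, random_state=0),
    "min_samples_leaf=10": DecisionTreeClassifier(min_samples_leaf=10, random_state=0),
    "ccp_alpha=0.01": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
}

fitted = {}
for name, model in candidates.items():
    fitted[name] = model.fit(X_train, y_train)
    print(f"{name:20s} nodes={model.tree_.node_count:3d} "
          f"depth={model.get_depth()} "
          f"test acc={model.score(X_test, y_test):.3f}")
```

Comparing the node counts shows how much each setting shrinks the tree relative to the unrestricted one.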
Tools
DecisionTreeRegressor
DecisionTreeClassifier
Plotting tool: graphviz
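The example in these notes only uses the classifier, so here is a small sketch of DecisionTreeRegressor for comparison. The noisy sine data is synthetic and invented for illustration. Because each leaf predicts the mean of its training targets, the model cannot predict outside the range of targets it saw during training.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve on [0, 5].
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(X, y)

# 10.0 lies outside the training range; the tree falls back to the value
# of the leaf that covers the largest training inputs.
X_new = np.array([[2.5], [10.0]])
pred = reg.predict(X_new)
print(pred)
```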
Decision tree example
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
import os
import graphviz
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state=42)
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
print("Accuracy train set : {:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy test set : {:.3f}".format(tree.score(X_test, y_test)))
tree4 = DecisionTreeClassifier(max_depth=4, random_state=0)
tree4.fit(X_train, y_train)
print("Accuracy max_depth= 4 train set : {:.3f}".format(tree4.score(X_train, y_train)))
print("Accuracy max_depth= 4 test set : {:.3f}".format(tree4.score(X_test, y_test)))
# Tree with no depth limit, left to grow freely
export_graphviz(tree, out_file="tree.dot", class_names=["malignant", "benign"], feature_names=cancer.feature_names,
                impurity=False, filled=True)
# Tree with max_depth=4
export_graphviz(tree4, out_file="tree4.dot", class_names=["malignant", "benign"], feature_names=cancer.feature_names,
                impurity=False, filled=True)
with open("tree.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)
with open("tree4.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph)
# Convert tree.dot to tree.png, which shows the structure of the decision tree
os.system("dot -Tpng tree.dot -o tree.png")
os.system("dot -Tpng tree4.dot -o tree4.png")
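The `dot` commands above only work when the Graphviz binaries are installed. As a fallback sketch, `sklearn.tree.export_text` (assuming scikit-learn 0.21 or newer) prints the same splits as indented text with no external tools. The variable name `text_view` is my own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

tree4 = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# feature_names is passed as a plain list so it also works on older versions.
# Leaves are printed with the numeric class label (0 = malignant, 1 = benign).
text_view = export_text(tree4, feature_names=list(cancer.feature_names))
print(text_view)
```

The text view is easier to paste into notes or diffs, while the PNG is easier to read for deep trees.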
Results without and with the depth limit:
Accuracy train set : 1.000
Accuracy test set : 0.937
Accuracy max_depth= 4 train set : 0.988
Accuracy max_depth= 4 test set : 0.951
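Tree with no depth limit. Note that in every leaf only one class has a nonzero sample count and the other class is 0 (that is, every leaf is pure):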
tree.png
Tree with max_depth=4:
tree4.png
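The purity observation can also be checked without reading the images. This sketch inspects the fitted `tree_` structure, where scikit-learn marks leaves with a child id of -1. The helper `leaf_impurities` and the names `full` and `shallow` are my own, and the impurity reported is Gini because that is the classifier's default criterion.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)


def leaf_impurities(model):
    """Return the impurity of every leaf node of a fitted tree."""
    t = model.tree_
    is_leaf = t.children_left == -1  # leaves have no left child
    return t.impurity[is_leaf]


print("unrestricted: impure leaves =", int((leaf_impurities(full) > 0).sum()))
print("max_depth=4 : impure leaves =", int((leaf_impurities(shallow) > 0).sum()))
```

An unrestricted tree that reaches 1.000 training accuracy should report no impure leaves, while the depth-limited tree usually keeps a few mixed ones.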