Classification: Decision Tree (1)
1. Decision Tree:
A decision tree is a flowchart-like tree structure. Each internal node is a test on one attribute, each branch is an outcome of that test, and each leaf node holds a class label. The topmost node is the root node.
2. Entropy
The more uncertain a question is, the more information we need to answer it. Information is measured in bits: more uncertainty means higher entropy. For a random variable X, the entropy is H(X) = -Σ_x P(x) log2 P(x).
![](https://img.haomeiwen.com/i5843687/9c7a00e810f443e2.png)
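To make the definition concrete, here is a minimal sketch of computing Shannon entropy in bits over a list of class labels (the function name `entropy` is my own choice):

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A fair coin flip carries 1 bit of uncertainty.
print(entropy(["heads", "tails"]))
# A certain outcome carries 0 bits: no information is needed.
print(abs(entropy(["heads", "heads"])))
```

The fully uncertain case (50/50) gives the maximum entropy of 1 bit, while a certain outcome gives 0, matching "more uncertainty, more entropy".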
3. ID3 (Decision Tree Induction Algorithm)
Information gain: Gain(A) = Info(D) - Info_A(D)
Info(D) is the entropy of the original data set D, and Info_A(D) is the expected entropy after partitioning D by attribute A.
![](https://img.haomeiwen.com/i5843687/ed865f8fa5ac0ffa.png)
![](https://img.haomeiwen.com/i5843687/5516f6c16bdb971d.png)
![](https://img.haomeiwen.com/i5843687/231ee4fc36a41694.png)
![](https://img.haomeiwen.com/i5843687/b6c23b6ac0451f75.png)
Pick the attribute X with the maximum Gain(X) as the next node to split the data on. Repeat this process until all samples in a group have the same class label.
![](https://img.haomeiwen.com/i5843687/5116cc38d5b03463.png)
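The gain formula above can be sketched directly in code. This is a minimal self-contained example (the helper names `entropy` and `info_gain` are my own, not from the original text):

```python
from collections import Counter, defaultdict
import math

def entropy(labels):
    """Shannon entropy in bits: Info(D)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(A) = Info(D) - Info_A(D): entropy reduction from splitting on attribute attr."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    # Info_A(D): weighted average entropy of the partitions
    info_a = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - info_a

# Toy data: attribute 0 perfectly predicts the label, attribute 1 is noise.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
print(info_gain(rows, labels, 0))  # 1.0 (perfect split)
print(info_gain(rows, labels, 1))  # 0.0 (no information)
```

ID3 would therefore choose attribute 0 as the root, since it has the highest information gain.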
4. How to handle continuous values?
Discretize them: convert continuous values into discrete intervals, for example by choosing split thresholds.
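One common way to pick thresholds (used, for instance, by C4.5) is to take midpoints between consecutive sorted distinct values. A minimal sketch, with a function name of my own choosing:

```python
def candidate_thresholds(values):
    """Candidate split points: midpoints between consecutive sorted distinct values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

# Example: temperatures 65, 70, 80 yield two candidate cut points.
print(candidate_thresholds([70, 65, 80, 70]))  # [67.5, 75.0]
```

Each threshold t turns the continuous attribute into a binary one (value <= t vs. value > t), so the usual information-gain machinery applies.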
5. How to avoid overfitting (the tree grows too deep)?
Tree pruning:
1) pre-pruning (prune while building the tree)
2) post-pruning (prune after the tree is fully built)
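Pre-pruning can be as simple as capping the tree depth during induction. Below is a minimal ID3-style sketch with a `max_depth` stopping rule (all function names here are my own; this is an illustration, not the canonical algorithm):

```python
from collections import Counter, defaultdict
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build(rows, labels, attrs, depth=0, max_depth=2):
    """ID3 with pre-pruning: stop splitting once max_depth is reached."""
    if len(set(labels)) == 1 or not attrs or depth >= max_depth:
        # Leaf: predict the majority class of the remaining samples.
        return Counter(labels).most_common(1)[0][0]

    def gain(a):  # Gain(A) = Info(D) - Info_A(D)
        groups = defaultdict(list)
        for r, y in zip(rows, labels):
            groups[r[a]].append(y)
        return entropy(labels) - sum(
            len(g) / len(labels) * entropy(g) for g in groups.values())

    best = max(attrs, key=gain)
    node = {}
    for v in set(r[best] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == v]
        srows, slabels = zip(*sub)
        node[(best, v)] = build(list(srows), list(slabels),
                                [a for a in attrs if a != best],
                                depth + 1, max_depth)
    return node

rows = [("a",), ("a",), ("b",)]
labels = ["yes", "yes", "no"]
print(build(rows, labels, [0], max_depth=0))  # "yes": pruned to a majority leaf
print(build(rows, labels, [0], max_depth=1))  # one split on attribute 0
```

Post-pruning instead grows the full tree first and then collapses subtrees whose removal does not hurt (or even improves) validation accuracy; it is usually more reliable but more expensive.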