Tree-Based Models

2016-09-14 · abrocod

Random Forest

Note 1:

A Gentle Introduction to Random Forests, Ensembles, and Performance Metrics in a Commercial System

Here is how such a system is trained; for some number of trees T:

1. Sample N cases at random with replacement from the training data (a bootstrap sample).
2. At each node, select m predictor variables at random out of all the predictors, and split on the variable that provides the best split.
3. Grow each tree to the largest extent possible, without pruning.
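As an illustration, here is a minimal sketch of that training loop in Python, under two simplifying assumptions: the base learner is a one-level tree (a decision stump) rather than a full tree, and the random feature subset has size m = 1. The helper names `fit_stump` and `train_forest` are invented for this sketch.

```python
import random

def fit_stump(X, y, feature):
    # One-level tree: pick the threshold on `feature` with the fewest
    # misclassifications, predicting the majority class on each side.
    values = sorted(set(row[feature] for row in X))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        l_pred = max(set(left), key=left.count)
        r_pred = max(set(right), key=right.count)
        errors = (sum(lab != l_pred for lab in left)
                  + sum(lab != r_pred for lab in right))
        if best is None or errors < best[0]:
            best = (errors, t, l_pred, r_pred)
    if best is None:  # degenerate bootstrap sample: constant feature
        majority = max(set(y), key=y.count)
        return lambda row: majority
    _, t, l_pred, r_pred = best
    return lambda row: l_pred if row[feature] <= t else r_pred

def train_forest(X, y, n_trees, seed=0):
    # Bagging: each tree sees a bootstrap sample and a random feature.
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feature = rng.randrange(d)  # random feature "subset" of size m = 1
        forest.append(fit_stump(Xb, yb, feature))
    return forest

# Toy data: two tight clusters, separable on either feature.
X = [[-1 - 0.01 * i, -1 - 0.02 * i] for i in range(10)] + \
    [[1 + 0.01 * i, 1 + 0.02 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
forest = train_forest(X, y, n_trees=15)
```

Each stump here is an independent function; prediction is done by combining their outputs, as described in the next note.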

Running a Random Forest. When a new input is entered into the system, it is run down all of the trees. The result may either be an average or weighted average of all of the terminal nodes that are reached, or, in the case of categorical variables, a voting majority.
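The aggregation rule just described can be sketched as follows (function names are invented for illustration):

```python
from collections import Counter

def aggregate_regression(leaf_values, weights=None):
    # Average (or weighted average) of the terminal-node values
    # reached in each tree.
    if weights is None:
        return sum(leaf_values) / len(leaf_values)
    return sum(w * v for w, v in zip(weights, leaf_values)) / sum(weights)

def aggregate_classification(leaf_labels):
    # Majority vote over the trees' predicted classes.
    return Counter(leaf_labels).most_common(1)[0][0]

print(aggregate_regression([2.0, 3.0, 4.0]))              # -> 3.0
print(aggregate_classification(["spam", "ham", "spam"]))  # -> spam
```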

Note that:

Strengths and weaknesses. Random forest runtimes are quite fast, and they are able to deal with unbalanced and missing data. Random Forest weaknesses are that when used for regression they cannot predict beyond the range in the training data, and that they may over-fit data sets that are particularly noisy. Of course, the best test of any algorithm is how well it works upon your own data set.
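The "cannot predict beyond the range in the training data" point follows from how tree predictions work: every leaf predicts a mean of training targets, so no prediction can ever exceed the largest target seen in training. A minimal single-stump illustration (the stump stands in for a full tree; `fit_regression_stump` is an invented name):

```python
def fit_regression_stump(xs, ys):
    # One-split regression tree: choose the threshold minimising squared
    # error; each leaf predicts the mean of its training targets.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    best = None
    for k in range(1, len(sx)):
        t = (sx[k - 1] + sx[k]) / 2
        left, right = sy[:k], sy[k:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

xs = list(range(11))       # x = 0..10
ys = [2 * x for x in xs]   # y = 2x, so the largest training target is 20
stump = fit_regression_stump(xs, ys)
print(stump(1000))         # a leaf mean: never above max(ys), however large x is
```

However far outside the training range the query is, the prediction is a mean of training targets, so it stays inside [min(ys), max(ys)].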

Note 2:

http://blog.echen.me/2011/03/14/laymans-introduction-to-random-forests/

Note 3:

https://www.analyticsvidhya.com/blog/2016/04/complete-tutorial-tree-based-modeling-scratch-in-python/

Bias and variance tradeoff

If you build a small tree, you get a model with low variance and high bias. So how do you balance the trade-off between bias and variance?
Normally, as you increase the complexity of your model, you see a reduction in prediction error due to lower bias. But as you continue to make the model more complex, you end up over-fitting, and the model starts to suffer from high variance.

A good model maintains a balance between these two types of error; this is known as the bias-variance trade-off. Ensemble learning is one way to manage it.
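This trade-off can be observed numerically with any model family whose complexity can be dialled up. The sketch below uses a piecewise-constant fit on noisy sine data, where the number of bins plays the role of model complexity: training error keeps falling as bins are added, while test error stops improving once the model starts chasing noise. All names and the data set are invented for this illustration.

```python
import math
import random

def fit_bins(xs, ys, n_bins, lo=0.0, hi=2 * math.pi):
    # Piecewise-constant model: predict the mean target in each bin.
    # More bins = a more complex model.
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    overall = sum(ys) / len(ys)
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    means = [sums[b] / counts[b] if counts[b] else overall
             for b in range(n_bins)]
    def predict(x):
        b = min(int((x - lo) / width), n_bins - 1)
        return means[b]
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
def sample(n):
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

train_x, train_y = sample(200)
test_x, test_y = sample(200)
for n_bins in (1, 8, 64):
    m = fit_bins(train_x, train_y, n_bins)
    print(n_bins,
          round(mse(m, train_x, train_y), 3),   # training error: keeps falling
          round(mse(m, test_x, test_y), 3))     # test error: levels off or rises
```

Training error is guaranteed to be non-increasing here because each finer binning refines the coarser one; whether test error actually rises at the high end depends on the sample size and noise level, which is exactly the trade-off being managed.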


Gradient Boosting

Note 1:

https://www.quora.com/What-is-an-intuitive-explanation-of-Gradient-Boosting

Note 2:

XGBoost official page: http://xgboost.readthedocs.io/en/latest/model.html
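The core intuition from these references: gradient boosting builds an additive model in which each new learner is fit to the negative gradient of the loss for the current ensemble; for squared error the negative gradient is simply the residual. A minimal sketch with stump base learners (`fit_stump` and `gradient_boost` are names invented for this sketch):

```python
def fit_stump(xs, ys):
    # Least-squares one-split regression tree (stump).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    best = None
    for k in range(1, len(sx)):
        t = (sx[k - 1] + sx[k]) / 2
        left, right = sy[:k], sy[k:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.3):
    # For squared loss the negative gradient is the residual, so each
    # round fits a stump to the current residuals and adds it, scaled
    # by the learning rate, to the ensemble.
    base = sum(ys) / len(ys)           # start from the mean prediction
    pred = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        pred = [p + learning_rate * h(x) for p, x in zip(pred, xs)]
    def model(x):
        return base + learning_rate * sum(h(x) for h in stumps)
    return model

xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]               # deterministic target y = x^2
model = gradient_boost(xs, ys)
mse_init = sum((sum(ys) / len(ys) - y) ** 2 for y in ys) / len(ys)
mse_final = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(mse_init, mse_final)             # training error drops sharply
```

Each round provably reduces the training error as long as the fitted stump is non-zero, since the stump is a least-squares projection of the residuals; the learning rate controls how greedily that reduction is taken, trading training speed for generalisation.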
