Ensemble Methods
Why Would We Want to Ensemble Learners Together?
There are two competing sources of error to balance when looking for a well-fitting machine learning model: bias and variance.
Bias: A model with high bias does not bend enough to fit the data, so it underfits.
Variance: A model with high variance changes drastically to fit every point in the dataset, so it overfits.
Combining several models into an ensemble can often strike a better balance between the two than any single model achieves.
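To make the two failure modes concrete, here is a small sketch using scikit-learn decision trees. The synthetic dataset from `make_classification` is an assumption standing in for real data: a depth-1 tree (a "stump") is the high-bias model, and an unlimited-depth tree is the high-variance one.

```python
# Contrast a high-bias and a high-variance model on the same (synthetic) data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 500 points; flip_y=0.1 randomly reassigns 10% of the labels to add noise
X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# High bias: a depth-1 stump can make only one split, so it underfits
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)

# High variance: with no depth limit the tree keeps splitting until every
# training point is classified correctly, memorizing the label noise too
deep_tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

for name, model in [("stump", stump), ("deep tree", deep_tree)]:
    print(f"{name}: train={model.score(X_train, y_train):.2f}, "
          f"test={model.score(X_test, y_test):.2f}")
```

The deep tree scores perfectly on the training data but that accuracy does not carry over to the test split; the stump is limited on both.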
Introducing Randomness Into Ensembles
Another way to improve ensemble methods is to introduce randomness into high-variance algorithms before they are combined. The randomness counteracts these algorithms' tendency to overfit (that is, to fit the training data too closely). There are two main ways to introduce randomness:
1. Bootstrap the data: sample the data with replacement and fit your algorithm to the sampled data.
2. Subset the features: at each split of a decision tree, or for each algorithm used in an ensemble, only a subset of all available features is used.
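Both ideas can be sketched in a few lines of NumPy. The helper names `bootstrap_sample` and `random_feature_subset` are hypothetical, chosen for illustration. This sketch picks one feature subset per learner; a random forest instead redraws the subset at every split (in scikit-learn, `RandomForestClassifier` exposes both ideas through its `bootstrap` and `max_features` parameters).

```python
# Sketch of the two randomization tricks using NumPy only.
# The helper names below are hypothetical, chosen for illustration.
import numpy as np

rng = np.random.default_rng(seed=0)

def bootstrap_sample(X, y, rng):
    """Draw len(X) rows WITH replacement: some rows repeat, others are left out."""
    n = X.shape[0]
    idx = rng.integers(0, n, size=n)
    return X[idx], y[idx]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature (column) indices WITHOUT replacement."""
    return rng.choice(n_features, size=k, replace=False)

# Toy data: 10 rows, 6 features; y holds each row's original index
X = np.arange(60).reshape(10, 6)
y = np.arange(10)

X_boot, y_boot = bootstrap_sample(X, y, rng)
cols = random_feature_subset(n_features=X.shape[1], k=2, rng=rng)
X_sub = X_boot[:, cols]  # the data one randomized learner would be trained on

print("rows drawn:", sorted(y_boot.tolist()))
print("features used:", sorted(cols.tolist()))
```

Each learner in the ensemble would repeat both draws, so every learner sees a slightly different view of the data.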
5. AdaBoost in sklearn
>>> from sklearn.ensemble import AdaBoostClassifier
>>> model = AdaBoostClassifier()
>>> model.fit(x_train, y_train)
>>> model.predict(x_test)
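The snippet above assumes `x_train`, `x_test`, and `y_train` already exist. A self-contained version, assuming a synthetic dataset from `make_classification` in place of real data, might look like this:

```python
# Runnable version of the snippet above; the synthetic dataset is an assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: the notes do not say where x_train etc. come from
X, y = make_classification(n_samples=300, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# With no arguments, the weak learners are depth-1 decision trees
# and at most 50 of them are fit
model = AdaBoostClassifier(random_state=42)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

print("test accuracy:", model.score(x_test, y_test))
```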
Key hyperparameters:
estimator: The model used for the weak learners (called base_estimator in scikit-learn releases before 1.2). Don't forget to import the model you choose for the weak learner.
n_estimators: The maximum number of weak learners used.
>>> from sklearn.tree import DecisionTreeClassifier
>>> model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=2), n_estimators=4)