Loading graph…

Bagging

Definition

Bagging means bootstrap aggregating.

It is an ensemble method where many models are trained on different bootstrap samples of the training data and then averaged.

Bootstrap Samples

A bootstrap sample is created by sampling from the training data with replacement.

If the original dataset has $n$ observations, each bootstrap sample also usually has $n$ observations, but some rows appear multiple times and some are left out.

Regression Prediction

For regression, if we train $B$ models, bagging predicts:

\[\hat f_{bag}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\]

where $\hat f_b(x)$ is the prediction from model $b$.

Main Idea

Bagging is especially useful for high-variance models.

Decision trees have high variance, so bagging is a natural way to improve them.

Relation to Random Forests

Random forests are based on bagging, but add one more idea.

At each tree split, a random forest considers only a random subset of features.

This makes the trees less correlated and improves the ensemble.

Retail Example

For basket-size prediction, bagging could train many regression trees on different bootstrap samples of invoices.

The final prediction is the average of the tree predictions.

Strengths

  • Reduces variance.
  • Improves unstable models.
  • Simple idea.
  • Works especially well with trees.

Weaknesses

  • Less interpretable than a single tree.
  • Does not strongly reduce bias.
  • Requires training many models.

Exercises

  1. What does sampling with replacement mean?
  2. Why is bagging useful for decision trees?
  3. How is random forest different from ordinary bagging?

See

Ensemble Learning

Random Forests

Decision Trees

25

25
Ready to start
Bagging
Session: 1 | Break: Short
Today: 0 sessions
Total: 0 sessions