Random Forests

Definition

A random forest is an ensemble of decision trees.

For regression, it averages the predictions of many regression trees:

\[\hat f(x) = \frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\]

where $B$ is the number of trees.

Main Idea

Random forests improve on single decision trees by using two kinds of randomness:

  • bootstrap samples of the data
  • random subsets of features at each split

This creates many different trees.

The final prediction is the average of their predictions.
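The two sources of randomness can be sketched in plain Python. This is a minimal illustration, not a production implementation: each "tree" is only a depth-1 stump, the helper names (`fit_stump`, `fit_forest`, `predict_forest`) and the toy data are hypothetical, and because a stump has a single split, drawing the feature subset once per tree coincides with drawing it at each split.

```python
import random
from statistics import mean

def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree, searching only the given features."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must put data on both sides
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: predict the sample mean
        m = mean(y)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm

def fit_forest(X, y, n_trees=50, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample rows with replacement
        feats = rng.sample(range(p), n_features)     # random subset of features
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict_forest(trees, row):
    """Forest prediction: the average of the individual tree predictions."""
    return mean(tree(row) for tree in trees)

# Toy data (made up for illustration): two features, one target.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [10, 12, 14, 30, 32, 34]
trees = fit_forest(X, y, n_trees=25, n_features=1, seed=0)
print(predict_forest(trees, [6, 1]))
```

Because every stump predicts a mean of training targets, the forest prediction always lies between the smallest and largest target seen in training.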

Why Random Features Matter

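If every tree can choose among all features at every split, most trees will split on the same few strong predictors and end up very similar.

Restricting each split to a random subset of features forces the trees to use different predictors, which makes them less correlated.

Averaging reduces variance more when the trees are less correlated, so the forest's prediction is more stable.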
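A standard variance identity makes this precise. As a simplifying assumption (not stated in these notes), suppose the $B$ tree predictions at a point $x$ are identically distributed with variance $\sigma^2$ and pairwise correlation $\rho$. Then:

\[\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2\]

Adding trees shrinks only the second term. The first term, $\rho\sigma^2$, remains however large $B$ is, so lowering the correlation between trees is the only way to reduce it.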

Regression Forest Prediction

Each tree gives a prediction:

\[\hat f_b(x)\]

The forest prediction is:

\[\hat f(x) = \frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\]
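As a small worked instance of the formula, the sketch below averages hypothetical predictions from $B = 4$ trees for a single input. The numbers and the function name `forest_predict` are made up for illustration.

```python
def forest_predict(tree_predictions):
    """Average the per-tree predictions f_b(x) into the forest prediction."""
    return sum(tree_predictions) / len(tree_predictions)

# Hypothetical predictions from B = 4 trees for one input x.
tree_predictions = [12.0, 15.0, 9.0, 14.0]
print(forest_predict(tree_predictions))  # 12.5
```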

Retail Example

A random forest could predict final basket size using:

  • current basket size
  • number of known items
  • customer frequency
  • customer recency
  • country
  • month
  • previous average spend

This is useful when basket size depends on nonlinear interactions between customer behavior and basket contents.

Strengths

  • Usually more accurate than one tree.
  • Handles nonlinear relationships.
  • Handles interactions automatically.
  • More stable (lower variance) than a single tree.
  • Can estimate feature importance.

Weaknesses

  • Less interpretable than a single tree.
  • Can still underpredict rare extreme values, because predictions are averages and extremes are damped.
  • Has hyperparameters to tune, such as the number of trees and the number of features tried at each split.

Relation to Basket-Size Prediction

If very large baskets are rare, a random forest may still underestimate them.

This happens because each tree predicts the mean of the training baskets that fall in a leaf, and averaging across trees pulls the prediction further toward more common outcomes.

This is one reason predicted and actual values may diverge for extreme baskets.
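One contributing factor can be quantified. A bootstrap sample of size $n$ misses any given training row with probability $(1 - 1/n)^n$, which approaches $e^{-1} \approx 0.368$ as $n$ grows. So if a very large basket appears only once in the training data, roughly a third of the trees never see it and can only predict from more typical baskets. The sketch below computes this probability; the function name `prob_absent` is hypothetical.

```python
import math

def prob_absent(n):
    """Probability that one specific row is missing from a bootstrap sample of size n."""
    return (1 - 1 / n) ** n

for n in (10, 100, 1000):
    print(n, round(prob_absent(n), 3))

# Limit as n grows: exp(-1)
print(round(math.exp(-1), 3))
```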

Exercises

  1. Explain why random forests average many trees.
  2. Why does using random subsets of features help?
  3. In retail data, why might a random forest underpredict very large baskets?

See Also

Decision Trees

Regression Trees

Ensemble Learning

Bagging
