Random Forests

Definition

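A random forest is an ensemble of decision trees, each trained on a bootstrap sample of the data with a random subset of features considered at each split.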

For regression, it averages the predictions of many regression trees:

\[\hat f(x) = \frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\]

where $B$ is the number of trees and $\hat f_b(x)$ is the prediction of the $b$-th tree.

Main Idea

Random forests improve single decision trees by using two kinds of randomness:

bootstrap samples of the data
random subsets of features at each split

This creates many different trees.

For regression, the final prediction is the average of the trees' predictions.
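
The two sources of randomness can be sketched in plain Python. This is a minimal illustration, not a full tree learner: the function names `bootstrap_sample` and `random_feature_subset`, the parameter `m` (features tried per split), and the toy rows are all assumptions made for the example.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices to consider at one split."""
    return sorted(rng.sample(range(n_features), m))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy rows with three features each (made-up values, for illustration only).
rows = [(5.0, 1.0, 0.2), (3.0, 0.0, 0.9), (8.0, 1.0, 0.4), (1.0, 0.0, 0.7)]

sample = bootstrap_sample(rows, rng)         # same size as rows, duplicates likely
features = random_feature_subset(3, 2, rng)  # 2 of the 3 feature indices
```

Each tree would be grown on its own `sample`, drawing a fresh `features` subset at every split.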

Why Random Features Matter

If every tree can choose from all features at every split, most trees will split on the same few strong features and become too similar.

Randomly limiting the available features makes the trees less correlated.

Averaging reduces variance more when the trees are less correlated, so the forest average becomes more accurate.
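
A standard variance identity makes this concrete. Assume, as a simplification, that every tree's prediction at $x$ has the same variance $\sigma^2$ and every pair of trees has the same correlation $\rho$ (both symbols are introduced here for illustration). Then the variance of the forest average is:

\[\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2\]

Adding trees shrinks only the second term. The first term falls only when $\rho$ falls, which is what random feature subsets aim for.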

Regression Forest Prediction

Each tree gives a prediction:

\[\hat f_b(x)\]

The forest prediction is:

\[\hat f(x) = \frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\]
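
The formula can be sketched directly in Python. Here each tree is represented as a plain callable; the function name `forest_predict` and the three one-split trees are hypothetical and exist only to illustrate the averaging step.

```python
def forest_predict(trees, x):
    """Average the predictions of all trees at input x."""
    if not trees:
        raise ValueError("forest must contain at least one tree")
    return sum(tree(x) for tree in trees) / len(trees)

# Three hypothetical one-split trees on a single feature x[0].
trees = [
    lambda x: 10.0 if x[0] < 5 else 20.0,
    lambda x: 12.0 if x[0] < 4 else 22.0,
    lambda x: 11.0 if x[0] < 6 else 24.0,
]

prediction = forest_predict(trees, [3.0])  # (10 + 12 + 11) / 3 = 11.0
```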

Retail Example

A random forest could predict final basket size using:

current basket size
number of known items
customer frequency
customer recency
country
month
previous average spend

This is useful when basket size depends on nonlinear interactions between customer behavior and basket contents.

Strengths

Usually more accurate than one tree.
Handles nonlinear relationships.
Handles interactions automatically.
More stable (lower variance) than a single tree.
Can estimate feature importance.

Weaknesses

Less interpretable than a single tree.
Requires tuning, for example the number of trees, the number of features tried at each split, and tree depth.
Predictions are averages, so rare extreme values are often damped and underpredicted.

Relation to Basket-Size Prediction

If very large baskets are rare, a random forest may still underestimate them.

This happens because each tree predicts the mean of the training values in a leaf, and the forest then averages those means, which pulls predictions toward more common outcomes.

This is one reason predicted and actual values may diverge for extreme baskets.
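
A small numeric sketch shows the damping. The basket sizes below are made up for illustration, and `leaf_prediction` is a hypothetical helper; the only assumption about the trees is that a regression tree predicts the mean of the training targets in the leaf a point falls into.

```python
from statistics import mean

def leaf_prediction(targets):
    """Predict the mean of the training targets that share a leaf."""
    return mean(targets)

# Hypothetical leaves reached by one very large basket in three trees.
leaves = [
    [12, 15, 14, 90],  # rare 90-item basket grouped with ordinary ones
    [13, 16, 90],      # same large basket, different neighbours
    [12, 14, 15, 13],  # this tree's bootstrap sample missed the large basket
]

per_tree = [leaf_prediction(leaf) for leaf in leaves]
forest = mean(per_tree)  # average of the three leaf means
```

Every per-tree prediction, and therefore the forest average, stays well below the true size of 90, because the large basket is always averaged with ordinary ones.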

Exercises

Explain why random forests average many trees.
Why does using random subsets of features help?
In retail data, why might a random forest underpredict very large baskets?

See

Decision Trees

Regression Trees

Ensemble Learning

Bagging
