Tree-Based Prediction

Definition

Tree-based methods are predictive models that partition the feature space into regions using decision rules.

A rule may look like:

\[X_j \leq c\]

where $X_j$ is a feature and $c$ is a split value. Observations that satisfy the rule follow one branch of the tree, and the rest follow the other.
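
As a minimal sketch of how one such rule partitions data, the snippet below applies $X_j \leq c$ to a handful of rows. The function name `split_rows` and the sample rows are hypothetical, chosen only for illustration.

```python
def split_rows(rows, j, c):
    """Partition rows by the rule X_j <= c (j is the feature index, c the split value)."""
    left = [row for row in rows if row[j] <= c]
    right = [row for row in rows if row[j] > c]
    return left, right

# Hypothetical rows: (current basket size, number of previous purchases)
rows = [(3, 1), (12, 7), (25, 2), (40, 9)]

# Apply the rule "current basket size <= 20"
left, right = split_rows(rows, j=0, c=20)
print(left)   # [(3, 1), (12, 7)]
print(right)  # [(25, 2), (40, 9)]
```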

Main Idea

Instead of fitting one global equation, tree-based methods divide the feature space into smaller regions and make a prediction inside each region.

For regression, the prediction in a region is usually the average of the target values in that region:

\[\hat y_R = \frac{1}{|R|}\sum_{i \in R} y_i\]
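
The averaging formula can be checked by hand on a small example. The sketch below assumes two regions defined by the rule "current basket size <= 20". The data values and the names `region_mean` and `predict` are made up for illustration.

```python
def region_mean(targets):
    """Prediction for a region: the average of its target values."""
    return sum(targets) / len(targets)

# Hypothetical data: (current basket size, final basket size)
data = [(5, 6.0), (8, 9.0), (12, 12.0), (22, 30.0), (25, 34.0), (30, 38.0)]

# Two regions defined by the rule "current basket size <= 20"
small = [y for x, y in data if x <= 20]
large = [y for x, y in data if x > 20]

pred_small = region_mean(small)  # (6 + 9 + 12) / 3 = 9.0
pred_large = region_mean(large)  # (30 + 34 + 38) / 3 = 34.0

def predict(x):
    """Return the mean of whichever region x falls into."""
    return pred_small if x <= 20 else pred_large

print(predict(15), predict(28))  # 9.0 34.0
```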

Common Methods

Why They Are Useful

Tree-based methods are useful when relationships are nonlinear or involve interactions.

For example, in retail data, final basket size may depend on combinations of variables such as:

  • customer type
  • country
  • number of previous purchases
  • day of week
  • current basket size

A tree can learn rules such as:

\[\text{if current basket size} > 20 \text{ and customer is frequent, predict a large basket}\]
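
Written as code, a learned rule of this kind is just nested conditionals. The sketch below is hand-written rather than fitted, and its thresholds and predicted values are invented for illustration. It shows an interaction: customer frequency only changes the prediction once the current basket exceeds 20 items.

```python
def predict_final_basket(current_basket_size, is_frequent):
    """Hand-written stand-in for a small fitted tree (all numbers are hypothetical)."""
    if current_basket_size > 20:
        if is_frequent:
            return 35.0  # large basket: big current basket and a frequent customer
        return 26.0      # big current basket, infrequent customer
    return 12.0          # small current basket: frequency is never consulted

print(predict_final_basket(25, True))   # 35.0
print(predict_final_basket(25, False))  # 26.0
print(predict_final_basket(10, True))   # 12.0
```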

Relation to Regression

Tree-based methods can be used for regression when the target variable is numerical.

They estimate the conditional mean:

\[E[Y \mid X=x]\]

by averaging outcomes in similar regions of the feature space.
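
The note does not say how the regions are chosen. One common criterion, assumed here, is to pick the split that minimizes the total squared error around the two region means, because the mean is the constant that minimizes squared error within a region. The sketch below searches a single feature for such a split. The names `sse` and `best_split` and the data are hypothetical.

```python
def region_mean(targets):
    return sum(targets) / len(targets)

def sse(targets):
    """Sum of squared errors around the region mean."""
    m = region_mean(targets)
    return sum((y - m) ** 2 for y in targets)

def best_split(xs, ys):
    """Try midpoints between sorted distinct x values.
    Returns (c, left_mean, right_mean) for the split with the lowest total SSE."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best = None
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= c]
        right = [y for x, y in zip(xs, ys) if x > c]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, c, region_mean(left), region_mean(right))
    return best[1:]

# Hypothetical data: current basket size and final basket size
xs = [5, 8, 12, 22, 25, 30]
ys = [6.0, 9.0, 12.0, 30.0, 34.0, 38.0]

c, left_mean, right_mean = best_split(xs, ys)
print(c, left_mean, right_mean)  # 17.0 9.0 34.0
```

A full tree repeats this search inside each resulting region until a stopping rule, such as a depth limit, is reached.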

Strengths

  • Captures nonlinear relationships.
  • Handles interactions naturally.
  • Works with mixed numerical and categorical features (some implementations require categorical features to be encoded as numbers first).
  • Easy to visualize for small trees.

Weaknesses

  • A single tree can be unstable: small changes in the training data can produce a very different tree.
  • Large trees can overfit.
  • Predictions are piecewise constant, so a tree cannot extrapolate beyond the target values seen in training.
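
Two of these weaknesses can be seen in a small experiment. The sketch below is a toy one-feature regression tree written for this note, not a library implementation. It assumes squared-error splitting and uses synthetic data (a linear trend plus Gaussian noise). All names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit(points, max_depth):
    """Build a 1-D regression tree from (x, y) pairs.
    A leaf is a float; an internal node is (c, left, right) for the rule x <= c."""
    ys = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    if max_depth == 0 or len(xs) < 2:
        return mean(ys)
    best = None
    for lo, hi in zip(xs, xs[1:]):
        c = (lo + hi) / 2
        left = [p for p in points if p[0] <= c]
        right = [p for p in points if p[0] > c]
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, c, left, right)
    _, c, left, right = best
    return (c, fit(left, max_depth - 1), fit(right, max_depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        c, left, right = tree
        tree = left if x <= c else right
    return tree

def mse(tree, points):
    return mean([(predict(tree, x) - y) ** 2 for x, y in points])

rng = random.Random(0)
# Synthetic training data: y = 2x plus noise, for x = 0..19
train = [(x, 2 * x + rng.gauss(0, 3)) for x in range(20)]

shallow = fit(train, max_depth=2)   # at most 4 leaves
deep = fit(train, max_depth=20)     # deep enough to isolate every training point

print(mse(shallow, train))  # positive: 4 constant levels cannot match 20 noisy points
print(mse(deep, train))     # 0.0: the deep tree reproduces the training targets, noise included

# Piecewise constant: the shallow tree outputs at most 4 distinct values
print(len({predict(shallow, x) for x in range(20)}))

# No extrapolation: far outside the training range, the prediction stays
# at the value of the last leaf
print(predict(deep, 1000) == predict(deep, 19))  # True
```

Zero training error on noisy data is a sign that the tree has fitted the noise. Comparing errors on held-out data is the usual way to check whether that has hurt generalization.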

Retail Example

A tree-based method could predict final basket size using:

  • current basket size
  • customer recency
  • customer frequency
  • country
  • month
  • average past basket value

Exercises

  1. Explain why a tree-based model is not a single global equation.
  2. Give one retail feature that might create a useful split.
  3. Explain why a single tree can overfit.
