Tree-Based Prediction
Definition
Tree-based methods are predictive models that partition the feature space into regions using a sequence of decision rules.
A rule may look like:
\[X_j \leq c\]
where $X_j$ is a feature and $c$ is a split value.
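As a minimal sketch of how such a rule acts on data, the snippet below sends each row to one of two groups depending on whether $X_j \leq c$ holds. The function name `split_rows` and the toy numbers are illustrative assumptions, not part of any library.

```python
# Toy rows: each row is a list of feature values (made-up numbers).
rows = [
    [3.0, 10.0],
    [7.5, 2.0],
    [5.0, 8.0],
    [9.0, 1.0],
]

def split_rows(rows, j, c):
    """Partition rows by the rule X_j <= c and return (left, right)."""
    left = [r for r in rows if r[j] <= c]   # rule is satisfied
    right = [r for r in rows if r[j] > c]   # rule is not satisfied
    return left, right

# Apply the rule X_0 <= 5.0 (feature index 0, split value 5.0).
left, right = split_rows(rows, j=0, c=5.0)
print(left)
print(right)
```

Every row lands in exactly one of the two groups, which is what lets repeated splits carve the feature space into non-overlapping regions.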
Main Idea
Instead of fitting one global equation, tree-based methods divide the feature space into smaller regions and make a prediction inside each region.
For regression, the prediction in a region is usually the average of the target values in that region:
\[\hat y_R = \frac{1}{|R|}\sum_{i \in R} y_i\]
where $|R|$ is the number of training observations that fall in region $R$.
Common Methods
Why They Are Useful
Tree-based methods are useful when relationships are nonlinear or involve interactions.
For example, in retail data, the final basket size may depend on combinations of variables such as:
- customer type
- country
- number of previous purchases
- day of week
- current basket size
A tree can learn rules such as:
\[\text{if current basket size} > 20 \text{ and customer is frequent, predict a large basket}\]
Relation to Regression
Tree-based methods can be used for regression when the target variable is numerical.
They estimate the conditional mean:
\[E[Y \mid X=x]\]
by averaging outcomes in similar regions of the feature space.
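The averaging idea can be sketched with the smallest possible regression tree: a single split on one feature, often called a stump. The sketch below assumes the split value is chosen to minimize the total squared error of the two region means, which is a common criterion but is not specified in these notes. The helper names (`sse`, `fit_stump`, `predict_stump`) and the data are made up for illustration.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_stump(xs, ys):
    """Find the split value c on one feature that minimizes total squared error.

    Returns (c, left_mean, right_mean).
    """
    best = None
    values = sorted(set(xs))
    # Candidate split values: midpoints between consecutive distinct x values.
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= c]
        right = [y for x, y in zip(xs, ys) if x > c]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, c, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    _, c, left_mean, right_mean = best
    return c, left_mean, right_mean

def predict_stump(stump, x):
    """Predict the mean of the region that x falls into."""
    c, left_mean, right_mean = stump
    return left_mean if x <= c else right_mean

# Made-up data: small x has targets near 2, large x has targets near 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [2.0, 3.0, 1.0, 10.0, 11.0, 9.0]

stump = fit_stump(xs, ys)
print(stump)                      # (6.5, 2.0, 10.0)
print(predict_stump(stump, 2.5))  # 2.0
print(predict_stump(stump, 20))   # 10.0
```

Each prediction is the average target of the training points in the same region, which is the tree's estimate of the conditional mean for any $x$ in that region.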
Strengths
- Captures nonlinear relationships.
- Handles interactions naturally.
- Works with mixed numerical and categorical features.
- Easy to visualize for small trees.
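To illustrate how a tree handles interactions, the retail rule from earlier can be written as a hand-coded two-level tree. This is only a sketch: the function name `predict_basket` and the "medium" and "small" labels are hypothetical additions, and only the threshold of 20 and the "large" outcome come from the rule above.

```python
def predict_basket(current_basket_size, is_frequent):
    """Hand-written two-level tree; labels other than 'large' are illustrative."""
    if current_basket_size > 20:
        # Customer frequency only matters on this branch.
        if is_frequent:
            return "large"
        return "medium"
    return "small"

print(predict_basket(25, True))
print(predict_basket(25, False))
print(predict_basket(10, True))
```

The effect of `is_frequent` depends on which side of the basket-size split a customer falls on. Nesting one rule inside another is how a tree represents an interaction without an explicit product term.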
Weaknesses
- A single tree can be unstable: small changes in the data can produce a very different tree.
- Large trees can overfit.
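- Predictions are piecewise constant, so a tree cannot produce smooth trends or predict values outside the range of targets seen in training.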
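The overfitting and piecewise-constant points can be seen in a small sketch. It grows a one-feature regression tree by greedy squared-error splits, an assumed criterion, and compares a depth-1 tree with a fully grown one on made-up noisy data. All names (`build_tree`, `predict`, `count_leaves`, `train_sse`) are hypothetical helpers written for this illustration.

```python
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(xs, ys, max_depth):
    """Grow a one-feature regression tree by greedy squared-error splits.

    A leaf is a float (the region mean); an internal node is (c, left, right).
    """
    values = sorted(set(xs))
    if max_depth == 0 or len(values) < 2:
        return mean(ys)
    best = None
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2
        left = [(x, y) for x, y in zip(xs, ys) if x <= c]
        right = [(x, y) for x, y in zip(xs, ys) if x > c]
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, c, left, right)
    _, c, left, right = best
    lx, ly = zip(*left)
    rx, ry = zip(*right)
    return (c,
            build_tree(lx, ly, max_depth - 1),
            build_tree(rx, ry, max_depth - 1))

def predict(tree, x):
    """Follow the split rules down to a leaf and return its mean."""
    while isinstance(tree, tuple):
        c, left, right = tree
        tree = left if x <= c else right
    return tree

def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])

def train_sse(tree, xs, ys):
    """Squared error of the tree on the data it was fitted to."""
    return sum((y - predict(tree, x)) ** 2 for x, y in zip(xs, ys))

# Made-up targets that fluctuate around 5.0 with no real trend in x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5.2, 4.7, 5.1, 4.9, 5.3, 4.8, 5.0, 5.4]

shallow = build_tree(xs, ys, max_depth=1)
deep = build_tree(xs, ys, max_depth=8)

print(count_leaves(shallow), train_sse(shallow, xs, ys))
print(count_leaves(deep), train_sse(deep, xs, ys))
```

The deep tree ends up with one leaf per training point and zero training error, which means it has memorized the noise rather than learned a pattern. The shallow tree outputs only two distinct values, one per region, showing the step-like shape of tree predictions.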
Retail Example
A tree-based method could predict final basket size using:
- current basket size
- customer recency
- customer frequency
- country
- month
- average past basket value
Exercises
- Explain why a tree-based model is not a single global equation.
- Give one retail feature that might create a useful split.
- Explain why a single tree can overfit.