Regression Trees
Definition
A regression tree is a decision tree used to predict a numerical target variable.
It estimates the regression function
\[\hat m(x) \approx E[Y \mid X=x]\]
by splitting the feature space into regions and predicting the average target value inside each region.
Main Idea
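A regression tree partitions the feature space into disjoint regions (the leaves):
\[R_1, R_2, \ldots, R_M\]
For a new observation $x$, the tree finds the region that contains $x$ and predicts the mean training target in that region:
\[\hat m(x) = \frac{1}{n_m}\sum_{i:\, X_i \in R_m} Y_i\]
where $R_m$ is the leaf region containing $x$ and $n_m$ is the number of training observations in $R_m$.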
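As a minimal sketch, assume a single numeric feature and a partition that has already been learned. The split points in `SPLITS` are hypothetical, chosen only for illustration. Prediction then reduces to locating the leaf interval that contains $x$ and returning the average of the training targets in that interval:

```python
from bisect import bisect_left

# Hypothetical split points, assumed already learned. They divide the real
# line into three leaf regions: (-inf, 3], (3, 7], (7, inf).
SPLITS = [3.0, 7.0]


def region_index(x, splits=SPLITS):
    """Return the index m of the leaf region containing x.

    bisect_left sends a value equal to a split point to the left region,
    matching the rule X <= c.
    """
    return bisect_left(splits, x)


def fit_leaf_means(xs, ys, splits=SPLITS):
    """Average the training targets Y_i whose X_i falls in each region."""
    sums = [0.0] * (len(splits) + 1)
    counts = [0] * (len(splits) + 1)
    for x, y in zip(xs, ys):
        m = region_index(x, splits)
        sums[m] += y
        counts[m] += 1
    # Empty regions get NaN because they have no training targets to average.
    return [s / n if n else float("nan") for s, n in zip(sums, counts)]


def predict(x, leaf_means, splits=SPLITS):
    """Prediction = mean training target of the leaf containing x."""
    return leaf_means[region_index(x, splits)]


xs = [1, 2, 3, 4, 6, 8, 9]
ys = [2.0, 4.0, 3.0, 10.0, 12.0, 20.0, 22.0]
means = fit_leaf_means(xs, ys)
print(means)                # [3.0, 11.0, 21.0]
print(predict(5.0, means))  # 11.0
```

Every new $x$ in $(3, 7]$ receives the same prediction, 11.0, which is the average of the two training targets in that leaf.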
How It Splits
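At each node, the tree searches over features $X_j$ and split points $c$ for a rule of the form
\[X_j \leq c\]
Observations that satisfy the rule go to the left child and the rest go to the right child. The chosen split is the one that most reduces the prediction error on the training data.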
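The search can be sketched as a brute-force loop. This assumes numeric features stored as a list of rows, candidate thresholds at the midpoints between consecutive distinct values (a common convention, not the only one), and the sum-of-squared-errors criterion, where each child predicts the mean of its own targets. The function names are illustrative, not from any particular library:

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Try every feature j and threshold c for the rule X[i][j] <= c.

    Returns (j, c, score) with the smallest SSE(left) + SSE(right),
    or None if no feature has two distinct values.
    """
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: midpoints between consecutive distinct values.
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= c]
            right = [yi for row, yi in zip(X, y) if row[j] > c]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (j, c, score)
    return best


# Toy data: feature 0 separates low and high targets, feature 1 does not.
X = [[1, 0], [2, 1], [3, 0], [10, 1], [11, 0], [12, 1]]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
print(best_split(X, y))  # (0, 6.5, 4.0)
```

Real implementations sort each feature once and update running sums instead of rebuilding the child lists, but the quantity being minimized is the same.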
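A common objective is to minimize the sum of squared errors:
\[SSE = \sum_{i=1}^{n}(Y_i - \hat Y_i)^2\]
When a node is split into a left child $L$ and a right child $R$, each child predicts the mean of its own targets, $\bar Y_L$ or $\bar Y_R$, so a candidate split is scored by
\[SSE(j, c) = \sum_{i \in L}(Y_i - \bar Y_L)^2 + \sum_{i \in R}(Y_i - \bar Y_R)^2\]
and the tree picks the pair $(j, c)$ with the smallest value.
Prediction Shape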
Regression trees produce piecewise constant predictions.
That means the prediction is constant inside each leaf region and jumps at the split boundaries.
This is different from linear regression, where the prediction changes continuously (linearly) with $x$.
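A small sketch of the contrast. It assumes a depth-1 tree (a "stump") whose split point $c = 3.5$ is fixed by hand rather than learned, and an ordinary least-squares line fitted to the same toy data. Evaluated on a grid of 11 inputs, the stump returns only two distinct values while the line returns a different value at every input:

```python
def fit_stump(xs, ys, c):
    """Depth-1 regression tree with a fixed split X <= c.

    Each side predicts the mean of its own training targets.
    """
    left = [y for x, y in zip(xs, ys) if x <= c]
    right = [y for x, y in zip(xs, ys) if x > c]
    left_mean = sum(left) / len(left)
    right_mean = sum(right) / len(right)
    return lambda x: left_mean if x <= c else right_mean


def fit_line(xs, ys):
    """Ordinary least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stump = fit_stump(xs, ys, c=3.5)
line = fit_line(xs, ys)

grid = [1 + 0.5 * k for k in range(11)]  # 1.0, 1.5, ..., 6.0
print(sorted(set(stump(x) for x in grid)))  # [2.0, 5.0]
print(len(set(line(x) for x in grid)))      # 11
```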
Retail Example
Suppose:
- $X$ = current basket size, together with other customer features such as purchase frequency
- $Y$ = final basket size
A regression tree may learn rules such as:
- if current basket size is small, predict a small final basket
- if current basket size is large and the customer is frequent, predict a much larger final basket
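Rules like these amount to a small tree. Written out by hand, it might look like the sketch below. The split point of 5 items and the leaf values are purely hypothetical. A fitted tree would learn the split from data and set each leaf to the average final basket size of the training customers that reach it:

```python
def predict_final_basket(current_size: int, is_frequent: bool) -> float:
    """Hand-written tree mirroring the retail rules.

    The threshold (5 items) and leaf values are hypothetical placeholders,
    not learned from any data.
    """
    if current_size <= 5:
        # Leaf: small current basket -> small final basket.
        return 7.0
    if is_frequent:
        # Leaf: large current basket AND frequent customer.
        return 30.0
    # Leaf: large current basket, infrequent customer.
    return 15.0


print(predict_final_basket(3, True))    # 7.0
print(predict_final_basket(12, True))   # 30.0
print(predict_final_basket(12, False))  # 15.0
```

Purchase frequency only affects the prediction on the large-basket branch. This is how a tree represents an interaction between variables without an explicit product term.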
Strengths
- Captures nonlinear relationships.
- Captures interactions between variables.
- Easy to explain as rules.
- Works well with mixed feature types.
Weaknesses
- A single deep tree can overfit the training data.
- Predictions are not smooth.
- Trees can be unstable: small changes in the training data can produce a very different tree.
- Rare extreme baskets may be poorly predicted.
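To illustrate the overfitting risk, here is a minimal recursive tree grower for one numeric feature. It is a sketch, not any library's implementation, and it runs on made-up data where $Y$ is roughly $X$ plus noise. With a depth limit of 1 the tree has two leaves. With an effectively unlimited depth it keeps splitting until every leaf holds a single training point, so its training error drops to zero because it has memorized the noise:

```python
def sse(v):
    """Sum of squared errors around the mean of a non-empty list."""
    m = sum(v) / len(v)
    return sum((y - m) ** 2 for y in v)


def grow(xs, ys, max_depth):
    """Recursively grow a 1-D regression tree.

    A leaf is a float (mean target); an internal node is (c, left, right).
    Growth stops at max_depth or when all x values in the node are equal.
    """
    distinct = sorted(set(xs))
    if max_depth == 0 or len(distinct) < 2:
        return sum(ys) / len(ys)

    def split_cost(c):
        left = [y for x, y in zip(xs, ys) if x <= c]
        right = [y for x, y in zip(xs, ys) if x > c]
        return sse(left) + sse(right)

    # Midpoints between distinct values guarantee two non-empty children.
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    c = min(candidates, key=split_cost)
    left = [(x, y) for x, y in zip(xs, ys) if x <= c]
    right = [(x, y) for x, y in zip(xs, ys) if x > c]
    return (c,
            grow([x for x, _ in left], [y for _, y in left], max_depth - 1),
            grow([x for x, _ in right], [y for _, y in right], max_depth - 1))


def predict(tree, x):
    """Walk from the root to the leaf containing x."""
    while isinstance(tree, tuple):
        c, left, right = tree
        tree = left if x <= c else right
    return tree


def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    _, left, right = tree
    return count_leaves(left) + count_leaves(right)


def train_sse(tree, xs, ys):
    return sum((predict(tree, x) - y) ** 2 for x, y in zip(xs, ys))


# Made-up noisy data: y is roughly x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.3, 1.8, 3.4, 3.7, 5.2, 6.1, 6.8, 8.3]

stump = grow(xs, ys, max_depth=1)
deep = grow(xs, ys, max_depth=20)
print(count_leaves(stump), train_sse(stump, xs, ys) > 0)  # 2 True
print(count_leaves(deep), train_sse(deep, xs, ys))        # 8 0.0
```

Zero training error here is a warning sign rather than a success: each leaf simply reproduces one noisy observation. Limiting depth or the minimum number of points per leaf is the usual remedy.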
Relation to Nonparametric Regression
Regression trees are nonparametric because they do not assume a fixed global equation such as:
\[Y = \beta_0 + \beta_1X + \varepsilon\]
Instead, the shape of the model is learned from the data through recursive splits.
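A small comparison on step-shaped toy data, which no straight line can follow. The linear model's form is fixed in advance, while the tree's split location is chosen from the data. Both are fitted by least squares, and the tree is limited to a single split for brevity. The helper names are illustrative only:

```python
def sse(v):
    """Sum of squared errors around the mean of a non-empty list."""
    m = sum(v) / len(v)
    return sum((y - m) ** 2 for y in v)


def fit_line(xs, ys):
    """Parametric model: OLS line y = b0 + b1 * x, shape fixed in advance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


def fit_stump(xs, ys):
    """Nonparametric model: one split X <= c, with c learned from the data."""
    distinct = sorted(set(xs))

    def cost(c):
        return (sse([y for x, y in zip(xs, ys) if x <= c])
                + sse([y for x, y in zip(xs, ys) if x > c]))

    c = min(((lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])), key=cost)
    left = [y for x, y in zip(xs, ys) if x <= c]
    right = [y for x, y in zip(xs, ys) if x > c]
    lm, rm = sum(left) / len(left), sum(right) / len(right)
    return lambda x: lm if x <= c else rm


def residual_sse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]  # a jump between x = 4 and x = 5
print(residual_sse(fit_line(xs, ys), xs, ys))   # about 7.62: the line cannot bend
print(residual_sse(fit_stump(xs, ys), xs, ys))  # 0.0: the split lands at c = 4.5
```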
Exercises
- Explain why a regression tree predicts an average inside each leaf.
- Why are regression tree predictions piecewise constant?
- In basket-size prediction, why might a regression tree underestimate very large baskets?