Decision Trees

Definition

A decision tree is a predictive model that uses a sequence of if-then rules to make a prediction.

Each internal node contains a split rule, such as:

\[X_j \leq c\]

Each terminal node, or leaf, contains a prediction.

Main Idea

A decision tree repeatedly divides the data into smaller groups.

At each step, it chooses a split that makes the resulting groups more homogeneous.

For regression, homogeneous means the target values inside each group are close together.

For classification, homogeneous means the class labels inside each group are mostly the same.
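Homogeneity can be given a number. The sketch below uses two measures that are assumptions here, not choices stated in these notes: variance of the targets for regression, and Gini impurity for classification. The function names are illustrative. In both cases a lower value means a more homogeneous group.

```python
from collections import Counter
from statistics import pvariance


def variance_impurity(targets):
    """Regression: population variance of the target values in one group."""
    return pvariance(targets)


def gini_impurity(labels):
    """Classification: 1 minus the sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Identical targets give zero variance; spread-out targets do not.
print(variance_impurity([8.0, 8.0, 8.0]))
print(variance_impurity([1.0, 3.0]))

# A pure group has Gini impurity 0; an even two-class mix has 0.5.
print(gini_impurity(["a", "a", "a", "a"]))
print(gini_impurity(["a", "a", "b", "b"]))
```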

Tree Structure

A decision tree contains:

root node
internal decision nodes
branches
terminal leaves

A prediction is made by starting at the root and following rules until a leaf is reached.
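The traversal described above can be sketched with a small node class. This is a minimal illustration, not a library implementation: the `Node` class, the `predict` function, and every threshold and leaf value in the hand-built tree are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure)."""
    feature: Optional[int] = None       # index j of the feature tested here
    threshold: Optional[float] = None   # cut point c in the rule X_j <= c
    left: Optional["Node"] = None       # branch taken when X_j <= c
    right: Optional["Node"] = None      # branch taken when X_j > c
    prediction: Optional[float] = None  # set only on leaves

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: Sequence[float]) -> float:
    """Start at the root and follow split rules until a leaf is reached."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# Hand-built tree: a root, one internal decision node, and three leaves.
root = Node(
    feature=0, threshold=5.0,
    left=Node(prediction=8.0),
    right=Node(
        feature=1, threshold=2.0,
        left=Node(prediction=15.0),
        right=Node(prediction=23.0),
    ),
)

print(predict(root, [3.0, 0.0]))  # goes left at the root
print(predict(root, [7.0, 1.0]))  # right, then left
print(predict(root, [7.0, 4.0]))  # right, then right
```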

Regression and Classification

Decision trees can be used for both:

regression, when $Y$ is numerical
classification, when $Y$ is categorical

A regression tree predicts a number.

A classification tree predicts a class or class probability.
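How a leaf turns its training examples into a prediction can be sketched as follows. The convention assumed here is a common one but is not stated in these notes: a regression leaf returns the mean target, and a classification leaf returns the majority class together with the class proportions. The function names are illustrative.

```python
from collections import Counter
from statistics import mean


def regression_leaf(targets):
    """Predict a number: the mean of the targets that fell into the leaf."""
    return mean(targets)


def classification_leaf(labels):
    """Predict a class and the class probabilities observed in the leaf."""
    n = len(labels)
    probabilities = {label: count / n for label, count in Counter(labels).items()}
    majority_class = max(probabilities, key=probabilities.get)
    return majority_class, probabilities


print(regression_leaf([6.0, 8.0, 10.0]))
print(classification_leaf(["yes", "yes", "no", "yes"]))
```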

Example Rule

For basket-size prediction, a tree might learn:

\[\text{if current basket size} \leq 5 \Rightarrow \hat y = 8\]

and:

\[\text{if current basket size} > 5 \Rightarrow \hat y = 23\]
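Written as code, the two rules above amount to a single if-then branch. The function name is hypothetical; the threshold and the two predicted values are taken from the rules.

```python
def predict_basket_size(current_basket_size: float) -> int:
    """Apply the two example rules: one split at 5, two leaves."""
    if current_basket_size <= 5:
        return 8
    return 23


print(predict_basket_size(4))
print(predict_basket_size(5))
print(predict_basket_size(6))
```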

Splitting Criterion

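For regression, a common criterion is reduction in squared error.

A split is good if it reduces the total of:

\[\sum_{i \in G}(y_i - \hat y_G)^2\]

summed over the resulting groups $G$, where $\hat y_G$ is the prediction for group $G$, typically the mean of its target values.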
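A search for the best threshold on one feature can be sketched as below. It assumes each group is predicted by its own mean and that candidate thresholds are midpoints between consecutive distinct feature values. Both are common conventions rather than details given in these notes. The function names and the toy data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the group mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    group_mean = sum(values) / len(values)
    return sum((v - group_mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, reduction) with the largest drop in squared error.

    Returns None when x has fewer than two distinct values.
    """
    parent_error = sse(y)
    best = None
    distinct = sorted(set(x))
    for low, high in zip(distinct, distinct[1:]):
        c = (low + high) / 2  # candidate cut point for the rule x <= c
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        reduction = parent_error - (sse(left) + sse(right))
        if best is None or reduction > best[1]:
            best = (c, reduction)
    return best


# Toy data: small feature values go with targets near 8, large ones near 23.
x = [1, 2, 3, 8, 9, 10]
y = [8, 8, 8, 23, 23, 23]
threshold, reduction = best_split(x, y)
print(threshold, reduction)
```

On this toy data the chosen threshold separates the two clusters exactly, so both child groups have zero error and the reduction equals the error of the unsplit data.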

Strengths

Easy to understand.
Handles nonlinear patterns.
Handles feature interactions.
Requires little preprocessing; for example, features do not need to be scaled.

Weaknesses

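Can overfit easily, especially when the tree is grown deep.
Can be unstable: a small change in the training data can produce a very different tree.
Predictions are not smooth: they are piecewise constant and jump at split thresholds.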
A single tree is often less accurate than an ensemble.

Exercises

What is a leaf in a decision tree?
Explain how a decision tree makes a prediction.
Why can a decision tree overfit the training data?

See

Regression Trees

Random Forests
