Bias-Variance Tradeoff

Definition

The bias-variance tradeoff describes two sources of prediction error that tend to pull in opposite directions as a model becomes more flexible.

Bias is systematic error from a model being too simple: it is wrong in the same way no matter which training sample it sees.

Variance is error from a model being too sensitive to the training data: refit it on a different sample and its predictions change noticeably.
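
For squared-error loss the two sources can be written down exactly. The notation here is an assumption added for illustration: the data follow \(y = f(x) + \varepsilon\) with noise of mean zero and variance \(\sigma^2\), and \(\hat f\) is the model fitted to a random training set. The expected error at a point \(x\), averaged over training sets and noise, is then

\[\mathbb{E}\big[(y - \hat f(x))^2\big] = \big(\mathbb{E}[\hat f(x)] - f(x)\big)^2 + \mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big] + \sigma^2\]

The first term is the squared bias, the second is the variance, and the third is noise that no model can remove.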

Bias

A high-bias model underfits.

It misses real structure in the data.

Example:

\[\hat y = \beta_0 + \beta_1x\]

may be too simple if the true relationship is curved.
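
A minimal sketch of this underfitting, assuming a made-up curved truth \(y = x^2\) with a little noise (the data and the helper names `fit_line` and `mse` are illustrative, not from any real dataset or library). It fits the straight line by ordinary least squares and compares its error with the error of the true curve.

```python
import random

random.seed(0)

# Hypothetical curved ground truth: y = x^2 plus Gaussian noise.
xs = [i / 10 for i in range(-30, 31)]
ys = [x * x + random.gauss(0, 0.5) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


b0, b1 = fit_line(xs, ys)
line_mse = mse([b0 + b1 * x for x in xs], ys)
# Error of the true curve itself: only the noise remains.
curve_mse = mse([x * x for x in xs], ys)

print(f"straight line MSE: {line_mse:.2f}")
print(f"true curve MSE:    {curve_mse:.2f}")
```

The line's error stays large even on its own training data. More data would not help, because the shape of the model is wrong.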

Variance

A high-variance model overfits.

It follows noise in the training data too closely and performs badly on new data.

Very deep decision trees often have high variance.
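
Variance can be seen directly by refitting a model on many fresh training samples and watching how much its prediction at one fixed point moves. The sketch below uses a 1-nearest-neighbour predictor as a simple stand-in for a fully grown tree (both essentially memorise individual training targets), and a 10-neighbour average as a smoother comparison. The signal `true_f`, the noise level, and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(1)


def true_f(x):
    # Hypothetical smooth signal, chosen only for illustration.
    return math.sin(x)


def sample_training_set(n=30, noise=0.5):
    """Draw a fresh noisy training sample from the same process."""
    xs = [random.uniform(0, 6) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(xs, ys, x0, k):
    """Average the targets of the k training points nearest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


x0 = 3.0  # fixed query point
preds_flexible, preds_smooth = [], []
for _ in range(500):
    xs, ys = sample_training_set()
    preds_flexible.append(knn_predict(xs, ys, x0, k=1))
    preds_smooth.append(knn_predict(xs, ys, x0, k=10))

# Spread of the prediction across refits = variance of the model at x0.
var_flexible = statistics.pvariance(preds_flexible)
var_smooth = statistics.pvariance(preds_smooth)

print(f"variance of prediction, k=1:  {var_flexible:.3f}")
print(f"variance of prediction, k=10: {var_smooth:.3f}")
```

The k=1 prediction jumps around with each new sample because it copies one noisy target. The k=10 prediction averages noise away and is steadier.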

The Tradeoff

More flexible models usually reduce bias but increase variance.

Less flexible models usually reduce variance but increase bias.

The goal is not maximum flexibility.

The goal is good prediction on unseen data.
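
A rough sketch of the tradeoff, assuming a synthetic sine-shaped signal and a piecewise-constant model whose flexibility is the number of bins (the helper `fit_bins`, the fallback for empty bins, and every number here are illustrative choices). It compares error on the training data with error on fresh data at three flexibility levels.

```python
import math
import random

random.seed(2)


def true_f(x):
    return math.sin(x)  # hypothetical signal


def make_data(n, noise=0.5):
    xs = [random.uniform(0, 6) for _ in range(n)]
    ys = [true_f(x) + random.gauss(0, noise) for x in xs]
    return xs, ys


def fit_bins(xs, ys, n_bins, lo=0.0, hi=6.0):
    """Piecewise-constant fit: predict the mean target in each equal-width bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = (hi - lo) / n_bins

    def bin_index(x):
        return min(int((x - lo) / width), n_bins - 1)

    for x, y in zip(xs, ys):
        sums[bin_index(x)] += y
        counts[bin_index(x)] += 1
    overall = sum(ys) / len(ys)
    # Assumption: bins with no training points fall back to the overall mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_index(x)]


def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(100)
test_x, test_y = make_data(2000)  # stands in for unseen data

results = {}
for n_bins in (1, 6, 120):  # rigid, moderate, very flexible
    model = fit_bins(train_x, train_y, n_bins)
    results[n_bins] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    train_err, test_err = results[n_bins]
    print(f"bins={n_bins:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Training error keeps falling as bins are added, but test error is lowest at the moderate setting. That gap is why flexibility is chosen by error on held-out data, not by training fit.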

Regression Example

For basket-size prediction:

  • a straight line may underfit large baskets
  • a very flexible tree may overfit rare strange baskets
  • a random forest or spline may give a better compromise

Relation to Model Choice

Different models sit at different points in the tradeoff:

Model                   Bias        Variance
Linear regression       higher      lower
Polynomial regression   medium      medium
Deep regression tree    lower       higher
Random forest           lower       lower than one tree
Spline                  adjustable  adjustable
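
The random forest row can be motivated by a standard identity for averages. As an assumption for illustration, let a forest average \(B\) tree predictions \(T_1, \dots, T_B\), each with variance \(\tau^2\) and pairwise correlation \(\rho\) (symbols introduced here, not used elsewhere in these notes). Then

\[\operatorname{Var}\Big(\frac{1}{B}\sum_{b=1}^{B} T_b\Big) = \rho\,\tau^2 + \frac{1-\rho}{B}\,\tau^2\]

The second term shrinks as \(B\) grows, so the average is less variable than one tree whenever the trees are not perfectly correlated. The first term does not shrink with \(B\), so the gain is larger when the trees are less correlated with each other.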

Exercises

  1. What does it mean for a model to underfit?
  2. What does it mean for a model to overfit?
  3. Why can a random forest reduce variance compared with one tree?

See

Regression

Nonparametric Regression

Splines

Random Forests
