Polynomial Regression

Definition

Polynomial regression is a parametric regression method that models the target variable as a polynomial function of the input variable.

For degree $d$:

\[Y = \beta_0 + \beta_1X + \beta_2X^2 + \cdots + \beta_dX^d + \varepsilon\]

Core Idea

Polynomial regression extends linear regression by adding powers of the input variable.

For example, a quadratic model is:

\[Y = \beta_0 + \beta_1X + \beta_2X^2 + \varepsilon\]

This allows the fitted curve to bend.
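As a minimal sketch of what "adding powers of the input variable" means in practice, the helper below expands each input value into the row $[1, x, x^2, \ldots, x^d]$. The function name `polynomial_features` and the sample values are illustrative assumptions, not part of these notes.

```python
def polynomial_features(xs, degree):
    """Expand each x into [1, x, x**2, ..., x**degree] (hypothetical helper)."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    return [[x ** p for p in range(degree + 1)] for x in xs]


# Illustrative inputs: each row is one observation, each column one power of x.
rows = polynomial_features([2.0, 3.0], degree=2)
for row in rows:
    print(row)
```

Each added column is a new predictor, so a straight-line model fitted on these columns becomes a curve when viewed as a function of the original input.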

Why It Is Still Parametric

Although the curve can be nonlinear, the model still has a fixed number of parameters, $d + 1$ in total:

\[\beta_0, \beta_1, \beta_2, \ldots, \beta_d\]

Once the degree $d$ is chosen, the model structure is fixed.

Prediction Function

The fitted model is:

\[\hat Y = \hat\beta_0 + \hat\beta_1X + \hat\beta_2X^2 + \cdots + \hat\beta_dX^d\]
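One way to evaluate the fitted polynomial at a new input is Horner's rule, sketched below. The function name `predict`, the coefficient ordering (lowest power first), and the example coefficients are assumptions made for illustration.

```python
def predict(coefs, x):
    """Evaluate b0 + b1*x + ... + bd*x**d using Horner's rule.

    `coefs` is ordered [b0, b1, ..., bd] (an assumed convention).
    """
    result = 0.0
    for b in reversed(coefs):
        result = result * x + b
    return result


# Made-up coefficients for illustration only: 1 + 2x + 3x^2.
example_coefs = [1.0, 2.0, 3.0]
print(predict(example_coefs, 2.0))  # 1 + 2*2 + 3*4 = 17.0
```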

Example

A quadratic model for final basket size could be:

\[\hat Y = \hat\beta_0 + \hat\beta_1k + \hat\beta_2k^2\]

where:

  • $k$ is the current observed basket size.
  • $\hat Y$ is the predicted final basket size.

Degree of the Polynomial

The degree controls model flexibility.

| Degree | Model | Shape |
|---|---|---|
| 1 | Linear | Straight line |
| 2 | Quadratic | One bend |
| 3 | Cubic | More flexible curve |
| Higher | High-degree polynomial | Very flexible, but risky |

Overfitting Risk

High-degree polynomial regression can fit the training data very closely yet generalize poorly to new data, because the extra terms end up fitting noise.

This is overfitting.

Signs of overfitting:

  • Very wavy fitted curve.
  • Low training error but high test error.
  • Extreme predictions outside the observed range.
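The second sign can be checked directly by comparing training and test error across degrees. The sketch below does this on synthetic data; the quadratic ground truth, noise level, sample sizes, and the name `train_test_mse` are all assumptions chosen for illustration, and inputs are kept in $[-1, 1]$ so the fit stays numerically well conditioned.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def train_test_mse(degrees, seed=0):
    """Fit one polynomial per degree and return {degree: (train_mse, test_mse)}."""
    rng = np.random.default_rng(seed)

    def sample(n):
        # Assumed ground truth: a quadratic curve plus Gaussian noise.
        x = rng.uniform(-1.0, 1.0, size=n)
        y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=n)
        return x, y

    x_train, y_train = sample(20)
    x_test, y_test = sample(200)

    results = {}
    for d in degrees:
        coefs = P.polyfit(x_train, y_train, d)  # coefficients, lowest power first
        train_mse = np.mean((P.polyval(x_train, coefs) - y_train) ** 2)
        test_mse = np.mean((P.polyval(x_test, coefs) - y_test) ** 2)
        results[d] = (float(train_mse), float(test_mse))
    return results


for degree, (train_mse, test_mse) in train_test_mse([1, 2, 8]).items():
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Under least squares, training error cannot increase as the degree grows, since every lower-degree model is a special case of the higher-degree one. The test error is therefore the number to watch.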

Relation to Linear Regression

Polynomial regression is linear in the parameters even though it is nonlinear in $X$.

For example:

\[Y = \beta_0 + \beta_1X + \beta_2X^2 + \varepsilon\]

is linear in:

\[\beta_0, \beta_1, \beta_2\]

Therefore, it can be fitted using ordinary least squares, treating $X, X^2, \ldots, X^d$ as separate predictors.
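A minimal sketch of this, assuming NumPy is available: build a design matrix whose columns are $1, X, X^2$ and solve the least-squares problem with `np.linalg.lstsq`. The data are noise-free values generated from an assumed quadratic, so the recovered coefficients can be compared against the ones used to generate them.

```python
import numpy as np

# Noise-free illustrative data generated from y = 1 + 2x + 0.5x^2 (assumed values).
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 + 2.0 * x + 0.5 * x**2

# Design matrix with columns [1, x, x^2]: nonlinear in x, linear in the betas.
X = np.vander(x, N=3, increasing=True)

# Ordinary least squares: minimise ||X @ beta - y||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))
```

The solver never sees that the columns are powers of one variable. It only sees a linear model with three predictors.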

Strengths

  • More flexible than simple linear regression.
  • Still easy to fit.
  • Useful when the relationship has smooth curvature.

Weaknesses

  • Can overfit.
  • Can oscillate or produce extreme values near the edges of the observed range and beyond it.
  • Sensitive to outliers.
  • Degree choice is subjective unless validated.
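One common way to validate the degree is k-fold cross-validation: fit each candidate degree on part of the data and score it on the held-out part. The sketch below is one possible implementation under stated assumptions; the function name `cv_mse`, the fold count, and the synthetic noisy quadratic are illustrative and not taken from these notes.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def cv_mse(x, y, degree, k=5, seed=0):
    """Mean squared validation error of a degree-`degree` fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    squared_errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coefs = P.polyfit(x[train_idx], y[train_idx], degree)
        residuals = P.polyval(x[val_idx], coefs) - y[val_idx]
        squared_errors.append(residuals**2)
    return float(np.mean(np.concatenate(squared_errors)))


# Illustrative data: a noisy quadratic (assumed, not from the notes).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

Picking the degree with the lowest validation error replaces a subjective choice with a data-driven one, although the result still depends on the fold split and the amount of noise.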

Example: Prediction Error Curve

If a basket-size predictor performs well for small and medium baskets but underestimates large baskets, a polynomial model may capture some curvature better than a straight line.

However, if the large baskets belong to a different population, polynomial regression alone may not solve the problem.

Exercises

  1. Fit polynomial regressions of degree 1, 2, and 3.
  2. Compare train error and test error.
  3. Explain why a high-degree polynomial may not be reliable for extreme basket sizes.

See

Parametric Regression

Linear Regression

Local Smoothing
