Nonparametric Regression

Definition

Nonparametric regression estimates the relationship between input variables and a target variable without assuming a fixed global mathematical form such as a line or polynomial.

The main target is often the conditional mean:

\[m(x) = E[Y \mid X=x]\]

A nonparametric regression method estimates this function directly from data:

\[\hat m(x) \approx E[Y \mid X=x]\]

Main Idea

Instead of assuming a model like:

\[Y = \beta_0 + \beta_1X + \varepsilon\]

nonparametric regression asks:

What did similar observations do in the past?

The prediction is usually based on nearby or similar data points.

Why It Is Called Nonparametric

It does not mean there are no parameters at all.

It means there is no fixed finite-dimensional parameter vector that fully determines the whole shape of the regression function.

Instead, model complexity can grow with the amount of data.

Common Examples

Example: Basket Size Prediction

Let:

$X = k$ be the current observed basket size.
$Y$ be the final basket size.

A nonparametric estimator may predict final basket size by averaging historical baskets that had similar current basket size:

\[\hat m(k) = \frac{1}{|S_k|}\sum_{i \in S_k}Y_i\]

where $S_k$ is the set of similar historical baskets.

Strengths

Flexible.
Can capture nonlinear patterns.
Useful when the true relationship is unknown.
Good exploratory modeling tool.

Weaknesses

Needs more data.
Less interpretable than simple parametric models.
Can be unstable in sparse regions.
Sensitive to the definition of similarity.
May perform badly for rare extreme cases.

Bias-Variance Tradeoff

Nonparametric methods are controlled by smoothing choices.

Too little smoothing:

Low bias.
High variance.
Noisy predictions.

Too much smoothing:

High bias.
Low variance.
Overly flat predictions.

Relation to Data Mining

Nonparametric regression is common in data mining because it can discover patterns without imposing a strict formula beforehand.

For retail data, it can be used for:

Basket size prediction.
Customer return probability.
Demand smoothing.
Recommendation systems.

Exercises

Explain why averaging historical outcomes for similar baskets is nonparametric.
Give one reason nonparametric regression may fail for very large baskets.
Compare parametric and nonparametric regression for basket-size prediction.

Nonparametric Regression

Definition

Main Idea

Why It Is Called Nonparametric

Common Examples

Example: Basket Size Prediction

Strengths

Weaknesses

Bias-Variance Tradeoff

Relation to Data Mining

Exercises

See

Conditional Mean Estimation

Kernel Regression

k-NN Regression

Local Smoothing

Splines

Tree-Based Prediction

Nonparametric Regression

Definition

Main Idea

Why It Is Called Nonparametric

Common Examples

Example: Basket Size Prediction

Strengths

Weaknesses

Bias-Variance Tradeoff

Relation to Data Mining

Exercises

See

Conditional Mean Estimation

Kernel Regression

k-NN Regression

Local Smoothing

Splines

Tree-Based Prediction

Sessions by Day

Productivity by Hour

Session Completion Rate

Time Spent by Task

Sessions by Day of Week

Session Duration Distribution