Kernel Regression

Definition

Kernel regression is a nonparametric regression method that predicts a target value at an input $x$ by taking a weighted average of observed outputs. Observations whose inputs lie closer to $x$ receive larger weights.

The most common form is the Nadaraya-Watson estimator:

\[\hat m(x) = \frac{\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right)y_i}{\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right)}\]

Components

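$x$ is the input value at which we want a prediction.
$x_i$ is the input value of historical observation $i$.
$y_i$ is the observed output for observation $i$.
$n$ is the number of historical observations.
$K$ is the kernel function.
$h$ is the bandwidth, a positive smoothing parameter.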
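The estimator above can be sketched directly in code. This is a minimal illustration assuming NumPy and the Gaussian kernel described in the Kernel Function section; the function names `gaussian_kernel` and `nadaraya_watson` are hypothetical, not taken from any library.

```python
import numpy as np

def gaussian_kernel(u):
    # Unnormalized Gaussian kernel K(u) = exp(-u^2 / 2)
    return np.exp(-0.5 * u**2)

def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate m_hat(x) from inputs xs and outputs ys."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    weights = kernel((x - xs) / h)  # K((x - x_i) / h) for every observation i
    # Note: if x is far from all data, every weight can underflow to 0 and this returns nan
    return np.sum(weights * ys) / np.sum(weights)

xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0, 30.0]
print(nadaraya_watson(2.0, xs, ys, h=1.0))  # approximately 20.0, by symmetry of the neighbors
```

Passing the kernel as a parameter keeps the estimator separate from the choice of $K$, which mirrors how the formula treats $K$ as a component.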

Main Idea

Observations closer to $x$ receive more weight.

Observations far from $x$ receive less weight.

The prediction is a local weighted average.
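Dividing each kernel value by the sum of all kernel values makes the weighted-average form of the Nadaraya-Watson estimator explicit:

\[\hat m(x) = \sum_{i=1}^{n} w_i(x)\, y_i, \qquad w_i(x) = \frac{K\left(\frac{x-x_i}{h}\right)}{\sum_{j=1}^{n} K\left(\frac{x-x_j}{h}\right)}\]

Assuming $K$ is nonnegative, as the Gaussian kernel below is, the weights satisfy

\[w_i(x) \ge 0, \qquad \sum_{i=1}^{n} w_i(x) = 1\]

so the prediction always lies between the smallest and largest observed $y_i$.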

Kernel Function

The kernel function controls how quickly an observation's weight decreases as its distance from $x$ grows.

A common example is the Gaussian kernel:

\[K(u) = e^{-\frac{u^2}{2}}\]

where:

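\[u = \frac{x-x_i}{h}\]

The standard normal density also carries the factor $\frac{1}{\sqrt{2\pi}}$. It can be omitted here because any constant factor in $K$ cancels between the numerator and denominator of the Nadaraya-Watson estimator.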
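To see how the Gaussian kernel turns distance into weight, the short sketch below evaluates $K(u)$ for observations at increasing distance from a query point $x = 0$ with $h = 1$. The helper name `gaussian_kernel` is hypothetical; only the Python standard library is assumed.

```python
import math

def gaussian_kernel(u):
    # K(u) = exp(-u^2 / 2), without the 1/sqrt(2*pi) constant, as in the formula above
    return math.exp(-u**2 / 2)

h = 1.0   # bandwidth
x = 0.0   # query point
for x_i in [0.0, 1.0, 2.0, 3.0]:
    u = (x - x_i) / h  # scaled distance
    print(f"x_i={x_i}: K(u)={gaussian_kernel(u):.4f}")
```

This prints weights of 1.0000, 0.6065, 0.1353 and 0.0111: an observation three bandwidths away gets roughly one percent of the weight of an exact match.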

Bandwidth

The bandwidth $h$ controls smoothness.

Small $h$:

Effectively uses only very close observations.
More flexible (lower bias).
Noisier (higher variance).

Large $h$:

Averages over many observations.
Smoother (lower variance).
More biased, because distant observations can smooth away real local structure.
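The two extremes of the bandwidth can be checked numerically. In this sketch (assuming NumPy; the toy data and the helper name `nadaraya_watson` are hypothetical), a very small $h$ makes the prediction at $x = 2$ essentially copy the single observation at $x_i = 2$, while a very large $h$ makes it approach the overall mean of all $y_i$.

```python
import numpy as np

def nadaraya_watson(x, xs, ys, h):
    # Gaussian-kernel Nadaraya-Watson estimate at a single query point x
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return np.sum(w * ys) / np.sum(w)

# Hypothetical toy data with outputs that jump around
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.0, 5.0, 2.0, 6.0, 3.0])

for h in [0.1, 1.0, 100.0]:
    print(h, nadaraya_watson(2.0, xs, ys, h))
# h = 0.1: close to 2.0, the y at x = 2 (very flexible, follows the noise)
# h = 100: close to 3.4, the mean of all ys (very smooth, ignores local structure)
```

Neither extreme is usually desirable; a useful bandwidth sits between them.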

Basket Size Example

Let:

$x$ = current basket size.
$x_i$ = current basket size recorded for historical basket $i$.
$y_i$ = final basket size of historical basket $i$.

Kernel regression predicts final basket size by giving high weight to historical baskets with similar current size.

For example, if $x=10$, baskets with current size 9, 10, or 11 may receive high weight, while baskets with current size 100 receive very low weight.
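A small sketch of this example, assuming NumPy and a Gaussian kernel with an assumed bandwidth of $h = 2$ items. The basket numbers are made up for illustration and are not real data; `kernel_weights` is a hypothetical helper.

```python
import numpy as np

# Hypothetical historical baskets (illustrative numbers only):
# current_sizes[i] = current size recorded for basket i, final_sizes[i] = its final size
current_sizes = np.array([9, 10, 11, 100])
final_sizes = np.array([14, 15, 17, 120])

def kernel_weights(x, xs, h):
    # Gaussian kernel weights K((x - x_i) / h), normalized to sum to 1
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return w / w.sum()

x = 10    # current size of the basket we want a prediction for
h = 2.0   # assumed bandwidth, in items
w = kernel_weights(x, current_sizes, h)
prediction = np.sum(w * final_sizes)
print(np.round(w, 3), round(prediction, 2))
```

The basket with current size 100 receives a weight of essentially zero, so its large final size does not distort the prediction for a basket of size 10.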

Difference From Simple Conditional Mean Estimation

Simple conditional mean estimation may use only exact matches, averaging $y_i$ over the observations with:

\[x_i = x\]

Kernel regression instead uses approximate matches, weighting each observation by how close $x_i$ is to $x$.

This is useful when exact matches are rare.
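The difference shows up as soon as the query point has no exact match. This sketch assumes NumPy, hypothetical data with no observation at $x = 12$, and hypothetical helper names `exact_match_mean` and `kernel_regression`.

```python
import numpy as np

def exact_match_mean(x, xs, ys):
    # Average y_i over observations with x_i == x; undefined (nan) if there are none
    mask = xs == x
    return ys[mask].mean() if mask.any() else float("nan")

def kernel_regression(x, xs, ys, h):
    # Gaussian-kernel Nadaraya-Watson estimate: every x_i contributes, weighted by closeness
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return np.sum(w * ys) / np.sum(w)

# Hypothetical data with no observation exactly at x = 12
xs = np.array([9, 10, 11, 13, 14])
ys = np.array([14.0, 15.0, 17.0, 19.0, 20.0])

print(exact_match_mean(12, xs, ys))           # nan: no exact match exists
print(kernel_regression(12, xs, ys, h=1.5))   # still defined, using nearby observations
```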

Difference From k-NN Regression

Both methods use nearby observations.

| Method | Neighborhood size | Weights |
|---|---|---|
| Kernel regression | Controlled by bandwidth $h$ | Smooth distance-based weights |
| k-NN regression | Controlled by number of neighbors $k$ | Often equal weights |
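The contrast in the table can be sketched side by side. This assumes NumPy; `knn_regression` (equal weights over the $k$ nearest points) and `kernel_regression` (Gaussian weights) are hypothetical helpers, and the data are illustrative.

```python
import numpy as np

def knn_regression(x, xs, ys, k):
    # Average the y_i of the k observations whose x_i is closest to x (equal weights);
    # ties in distance are broken arbitrarily by argsort
    nearest = np.argsort(np.abs(xs - x))[:k]
    return ys[nearest].mean()

def kernel_regression(x, xs, ys, h):
    # Gaussian-kernel weights that decay smoothly with distance
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return np.sum(w * ys) / np.sum(w)

xs = np.array([1.0, 2.0, 4.0, 8.0])
ys = np.array([10.0, 12.0, 20.0, 40.0])

print(knn_regression(3.0, xs, ys, k=2))      # mean of y at x=2 and x=4: 16.0
print(kernel_regression(3.0, xs, ys, h=1.0))
```

k-NN fixes how many points are used and ignores how far away they are; kernel regression fixes the scale of distances through $h$ and lets the number of effectively contributing points vary.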

Strengths

Flexible.
Smooth predictions.
Uses nearby data instead of exact matches only.
Good for one-dimensional or low-dimensional problems.

Weaknesses

Bandwidth choice is important.
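Performs poorly in high dimensions, because few observations lie close to any given point (the curse of dimensionality).
Can be biased near the boundaries of the data, where all neighbors lie on one side.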
Sparse regions produce unstable estimates.
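Boundary bias is easy to reproduce. In this sketch (assuming NumPy; `kernel_regression` is a hypothetical helper), the data follow $y = x$ exactly on $[0, 1]$, so the true value at $x = 0$ is 0. Because every neighbor of $x = 0$ lies to its right, the estimate is pulled upward, while at the interior point $x = 0.5$ the neighbors balance out.

```python
import numpy as np

def kernel_regression(x, xs, ys, h):
    # Gaussian-kernel Nadaraya-Watson estimate at a single query point x
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return np.sum(w * ys) / np.sum(w)

# Noise-free linear data y = x on an evenly spaced grid over [0, 1]
xs = np.linspace(0.0, 1.0, 101)
ys = xs.copy()

h = 0.1
print(kernel_regression(0.0, xs, ys, h))  # left boundary: noticeably above the true value 0
print(kernel_regression(0.5, xs, ys, h))  # interior: close to the true value 0.5
print(kernel_regression(1.0, xs, ys, h))  # right boundary: noticeably below the true value 1
```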

Exercises

Explain the role of the kernel function.
Explain the role of the bandwidth $h$.
In basket-size prediction, why might kernel regression be better than exact matching on basket size?

See

Nonparametric Regression

Conditional Mean Estimation

Local Smoothing

k-NN Regression