k-NN Regression

Definition

k-nearest-neighbors regression, or k-NN regression, is a nonparametric method that predicts a target value by averaging the target values of the $k$ most similar observations.

The prediction is:

\[\hat m(x) = \frac{1}{k}\sum_{i \in N_k(x)} y_i\]

where:

$N_k(x)$ is the set of the $k$ nearest observations to $x$.
$y_i$ is the target value of neighbor $i$.

Main Idea

To predict what happens for a new observation, look at the most similar historical observations and average their outcomes.

Distance

The word nearest depends on a distance function.

For one variable:

\[d(x, x_i) = |x - x_i|\]

For multiple variables:

\[d(x, x_i) = \sqrt{\sum_{j=1}^{p}(x_j - x_{ij})^2}\]

This is Euclidean distance.

Basket Size Example

Suppose:

$x$ = current state of a basket.
$y_i$ = final basket size of historical basket $i$.

A simple one-dimensional version uses current item count:

\[x = k\]

The model finds the $k$ historical baskets with current sizes closest to the current basket size and averages their final basket sizes.

Choosing k

The value of $k$ controls smoothness.

Small $k$:

More local.
More flexible.
Higher variance.

Large $k$:

More stable.
Smoother.
Higher bias.

Weighted k-NN

A weighted version gives closer neighbors more importance:

\[\hat m(x) = \frac{\sum_{i \in N_k(x)} w_i y_i}{\sum_{i \in N_k(x)} w_i}\]

where larger weights are assigned to closer points.

Strengths

Simple to understand.
No training phase beyond storing the data.
Can model nonlinear relationships.
Works well when similarity is meaningful.

Weaknesses

Slow prediction on large datasets.
Sensitive to irrelevant features.
Sensitive to feature scaling.
Performs poorly in high dimensions.
Rare extreme cases may still be poorly predicted.

Feature Scaling

k-NN depends on distance, so features must be on comparable scales.

For example, if one feature is measured in euros and another in item counts, standardization may be needed.

Relation to Conditional Mean Estimation

k-NN regression estimates:

\[E[Y \mid X=x]\]

by averaging the observed $Y$ values near $x$.

Exercises

Explain why k-NN regression is nonparametric.
What happens if $k$ is too small?
What happens if $k$ is too large?
For basket prediction, list three possible features for measuring similarity between baskets.

k-NN Regression

Definition

Main Idea

Distance

Basket Size Example

Choosing k

Weighted k-NN

Strengths

Weaknesses

Feature Scaling

Relation to Conditional Mean Estimation

Exercises

See

Nonparametric Regression

Conditional Mean Estimation

Kernel Regression

k-NN Regression

Definition

Main Idea

Distance

Basket Size Example

Choosing k

Weighted k-NN

Strengths

Weaknesses

Feature Scaling

Relation to Conditional Mean Estimation

Exercises

See

Nonparametric Regression

Conditional Mean Estimation

Kernel Regression

Sessions by Day

Productivity by Hour

Session Completion Rate

Time Spent by Task

Sessions by Day of Week

Session Duration Distribution