Conditional Mean Estimation

Definition

Conditional mean estimation is the task of estimating the expected value of a target variable $Y$ given that an input variable $X$ takes a particular value $x$.

The population quantity is:

\[m(x) = E[Y \mid X=x]\]

The estimate from data is written as:

\[\hat m(x)\]

Interpretation

The expression:

\[E[Y \mid X=x]\]

means:

the average value of $Y$ among cases where $X=x$.

It is the theoretical best prediction of $Y$ from $X=x$ under squared-error loss.
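
To see why, assume $Y$ has finite variance and let $g(x)$ be any candidate prediction at $X=x$ (the symbol $g$ is introduced here only for illustration). Expanding the square around $m(x)$ gives:

\[E[(Y - g(x))^2 \mid X=x] = \operatorname{Var}(Y \mid X=x) + (m(x) - g(x))^2\]

The first term does not depend on $g$, and the second term is zero only when $g(x) = m(x)$, so the conditional mean minimizes the expected squared error.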

Simple Estimator

A direct estimator is:

\[\hat m(k) = \frac{1}{|S_k|}\sum_{i \in S_k}Y_i\]

where:

  • $k$ is the observed input value.
  • $S_k$ is the set of observations similar to $k$.
  • $|S_k|$ is the number of observations in that set.
  • $Y_i$ is the observed target value for observation $i$.
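
The estimator above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `conditional_means`, the choice to treat "similar to $k$" as an exact match on the input value, and the sample observations are all assumptions made for this example.

```python
from collections import defaultdict


def conditional_means(pairs):
    """Return {k: average of Y_i over S_k} for every input value k in `pairs`.

    `pairs` is an iterable of (x_i, y_i) observations. "Similar to k" is
    taken to mean x_i == k exactly, which is an assumption of this sketch.
    """
    groups = defaultdict(list)  # k -> list of Y_i with i in S_k
    for x, y in pairs:
        groups[x].append(y)
    # Divide each group's sum by |S_k|, the number of observations in it.
    return {k: sum(ys) / len(ys) for k, ys in groups.items()}


# Made-up observations (x_i, y_i), for illustration only.
observations = [(1, 2), (1, 4), (2, 5), (2, 7), (2, 6)]
estimates = conditional_means(observations)
print(estimates[1])  # mean of [2, 4]
print(estimates[2])  # mean of [5, 7, 6]
```

Grouping all observations in one pass yields $\hat m(k)$ for every observed $k$ at once. An input value with no matching observations simply has no entry, which reflects that the estimator is undefined when $S_k$ is empty.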

Basket Size Example

Suppose:

  • $k$ = current number of observed items in a basket.
  • $Y_i$ = final item count of historical basket $i$.
  • $S_k$ = historical baskets that were observed at size $k$ before completion.

Then:

\[\hat m(k) = \frac{1}{|S_k|}\sum_{i \in S_k}Y_i\]

predicts the final basket size by averaging the final basket sizes of similar historical baskets.

Numerical Example

Suppose current basket size is:

\[k = 3\]

Historical baskets that were observed at size 3 ended with final sizes:

\[5, 7, 6, 4, 8\]

Then:

\[\hat m(3) = \frac{5+7+6+4+8}{5} = 6\]

So the predicted final basket size is 6.
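
The arithmetic can be checked with a few lines of Python. The variable names are hypothetical; the numbers are the five final basket sizes listed above.

```python
final_sizes = [5, 7, 6, 4, 8]  # final sizes of the historical baskets in S_3

# Sum of the Y_i divided by |S_3|
m_hat_3 = sum(final_sizes) / len(final_sizes)
print(m_hat_3)  # 6.0
```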

Why This Is Regression

Regression is the problem of predicting a target variable from input variables.

The conditional mean:

\[E[Y \mid X=x]\]

is the central object of regression.

Many regression models are different ways of estimating this same object.

Parametric Version

A parametric model assumes a form such as:

\[m(x) = \beta_0 + \beta_1x\]

Then the task is to estimate $\beta_0$ and $\beta_1$.
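
One common way to estimate $\beta_0$ and $\beta_1$ is ordinary least squares. The text above does not name a method, so this choice is an assumption. The sketch below uses the closed-form solution for a single input variable; the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (beta0, beta1) for m(x) = beta0 + beta1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares around the sample means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is not identified")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Made-up data lying exactly on the line y = 2 + 1.5 * x.
xs = [1, 2, 3, 4]
ys = [3.5, 5.0, 6.5, 8.0]
beta0, beta1 = fit_line(xs, ys)
print(beta0, beta1)
```

Because the sample points lie exactly on a line, the fit recovers the intercept and slope that generated them. With noisy data the estimates would only approximate the underlying coefficients.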

Nonparametric Version

A nonparametric model does not assume a fixed global form.

It estimates $m(x)$ using nearby or similar observations.
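
As a rough sketch of the idea, the code below averages the targets of the nearest observations, a k-nearest-neighbour average. This is one possible choice of "nearby" among several and is not prescribed by the text; the function name `knn_mean`, the parameter `n_neighbors`, and the data are assumptions made for illustration.

```python
def knn_mean(xs, ys, x0, n_neighbors=3):
    """Average the y values of the `n_neighbors` observations closest to x0."""
    if not 1 <= n_neighbors <= len(xs):
        raise ValueError("n_neighbors must be between 1 and the number of observations")
    # Order observation indices by distance from x0.
    # sorted() is stable, so ties keep their input order.
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    nearest = order[:n_neighbors]
    return sum(ys[i] for i in nearest) / n_neighbors


# Made-up observations, for illustration only.
xs = [1, 2, 3, 4, 10]
ys = [2, 4, 5, 7, 20]
print(knn_mean(xs, ys, x0=3, n_neighbors=3))  # averages the y values at x = 3, 2, 4
```

Unlike the parametric line, this estimate at $x_0$ depends only on the observations closest to $x_0$, so distant points such as $x = 10$ have no influence on it.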

Examples:

Sparse Region Problem

If there are few observations near $x$, then $|S_x|$ is small.

The estimate becomes unstable.

This matters in basket analysis because very large baskets are rare, so $|S_k|$ is small for large $k$.
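
A short calculation makes the instability concrete. Assume, for illustration, that the $Y_i$ in $S_k$ are independent with a common variance $\sigma_k^2$ (both the symbol $\sigma_k^2$ and this assumption are introduced here and are not part of the text above). Then:

\[\operatorname{Var}\big(\hat m(k)\big) = \frac{\sigma_k^2}{|S_k|}\]

so the standard error of the estimate shrinks only in proportion to $1/\sqrt{|S_k|}$. Halving the number of similar baskets raises the standard error by a factor of about $\sqrt{2} \approx 1.41$.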

Exercises

  1. Explain the meaning of $E[Y \mid X=x]$ in words.
  2. For basket-size prediction, define $X$, $Y$, and $S_k$.
  3. Why does the estimator become unstable when $|S_k|$ is small?

See

Nonparametric Regression

Regression

Expected Value

Conditional Probability

Conditional Expectation
