Expected Value
Definition
The expected value of a random variable is its long-run average: the value that the average of many independent draws settles toward.
For a discrete random variable $X$:
\[E[X] = \sum_x x P(X=x)\]
For a continuous random variable $X$ with density $f(x)$:
\[E[X] = \int_{-\infty}^{\infty} x f(x)\,dx\]
Interpretation
The expected value is the center of mass of a probability distribution.
It is not necessarily the value that occurs most often, and it does not have to be one of the possible observed values.
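As a small illustration of both points, the sketch below computes $E[X]$ for a fair six-sided die directly from the discrete definition. The die, the `expected_value` helper, and the `die_pmf` dictionary are assumptions introduced for this example, not part of the notes.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return the sum of x * P(X = x) for a discrete distribution.

    `pmf` maps each possible value x to its probability P(X = x).
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
mean = expected_value(die_pmf)

print(mean)             # 7/2
print(mean in die_pmf)  # False: 3.5 is not a face of the die
```

Exact fractions are used here only so that the check that the probabilities sum to 1 is not affected by floating-point rounding.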
Sample Mean as an Estimate
Given observations:
\[x_1, x_2, \dots, x_n\]
we estimate the expected value using the sample mean:
\[\bar x = \frac{1}{n}\sum_{i=1}^{n}x_i\]
So:
\[\bar x \approx E[X]\]
Example
Suppose basket sizes are:
\[2, 3, 5, 6\]
Then the sample mean is:
\[\bar x = \frac{2+3+5+6}{4} = 4\]
So the estimated expected basket size is $4$.
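The same calculation can be sketched in Python. The variable names are illustrative; `statistics.mean` is the standard-library function for the arithmetic mean.

```python
from statistics import mean

# Basket sizes from the example above.
basket_sizes = [2, 3, 5, 6]

# Sample mean: the sum of the observations divided by their count.
manual_mean = sum(basket_sizes) / len(basket_sizes)

# The standard library computes the same quantity.
library_mean = mean(basket_sizes)

print(manual_mean)                  # 4.0
print(manual_mean == library_mean)  # True
```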
Relation to Prediction
If we must predict a numerical variable using a single number, the expected value is the best prediction under squared-error loss.
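That means if we predict $Y$ using a constant $a$, the value of $a$ that minimizes the average squared error $E[(Y-a)^2]$ is:
\[a = E[Y]\]
This follows because $E[(Y-a)^2] = \mathrm{Var}(Y) + (E[Y]-a)^2$, which is smallest when $a = E[Y]$.
Relation to Conditional Mean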
For prediction with input information $X=x$, we use a conditional expected value:
\[E[Y \mid X=x]\]
This is the expected value of $Y$ after observing that $X=x$, and it is the main quantity estimated in Conditional Mean Estimation.
Retail Example
If $Y$ is final basket size, then:
\[E[Y]\]
is the average final basket size overall.
But:
\[E[Y \mid X=3]\]
is the average final basket size among baskets where the currently observed basket size is $3$.
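A minimal sketch of how both quantities might be estimated from data, assuming we have recorded pairs of (currently observed size, final size). The `baskets` list is made-up data, and `overall_mean` and `conditional_means` are hypothetical helper names introduced for illustration.

```python
from collections import defaultdict

# Hypothetical (currently observed size x, final size y) pairs; made-up data.
baskets = [(1, 2), (1, 4), (3, 5), (3, 7), (3, 6), (5, 8)]

def overall_mean(pairs):
    """Estimate E[Y] by averaging every final basket size."""
    return sum(y for _, y in pairs) / len(pairs)

def conditional_means(pairs):
    """Estimate E[Y | X = x] by averaging y within each observed x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}

print(overall_mean(baskets))          # about 5.33, using all six baskets
print(conditional_means(baskets)[3])  # 6.0, using only baskets with x = 3
```

Averaging within each group is the simplest estimate of the conditional mean. It only works for values of $x$ that appear in the data, and the estimate is noisy when a group contains few baskets.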
Exercises
- Explain why expected value is a theoretical population quantity.
- Compute the expected value of $X$ if $P(X=1)=0.2$, $P(X=2)=0.5$, and $P(X=5)=0.3$.
- In basket-size prediction, why is $E[Y \mid X=x]$ more useful than $E[Y]$?