Poisson Regression
Definition
Poisson regression is a parametric regression method used when the target variable is a count.
It is used for outcomes such as:
- Number of purchases.
- Number of items.
- Number of visits.
- Number of events in a time period.
The target variable takes non-negative integer values:
\[Y \in \{0, 1, 2, 3, \ldots\}\]
Model Form
Poisson regression models the expected count:
\[E[Y \mid X=x] = \lambda(x)\]
The most common form uses a log link:
\[\log(\lambda(x)) = \beta_0 + \beta_1x_1 + \cdots + \beta_px_p\]
Equivalently:
\[\lambda(x) = e^{\beta_0 + \beta_1x_1 + \cdots + \beta_px_p}\]
Because the exponential function is strictly positive, the predicted expected count is always positive, whatever the sign of the linear predictor.
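The log-link computation can be sketched in a few lines. This is a minimal illustration, not a fitted model: the function name `expected_count` and the coefficient values are assumptions chosen only to show that a negative linear predictor still yields a positive expected count.

```python
import math

def expected_count(x, beta0, betas):
    """Return lambda(x) = exp(beta0 + sum_j beta_j * x_j)."""
    # Linear predictor on the log scale; it may take any real value.
    eta = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return math.exp(eta)

# Hypothetical coefficients, chosen only for illustration.
beta0 = -2.0
betas = [0.5, -0.3]

# Linear predictor: -2.0 + 0.5 * 1.0 - 0.3 * 4.0 = -2.7 (negative).
lam = expected_count([1.0, 4.0], beta0, betas)
print(lam)  # a small positive number
```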
Core Idea
Ordinary linear regression places no constraint on its predictions, so it can predict negative counts, which are not valid.
Poisson regression avoids this by modeling the log of the expected count.
The model predicts:
\[\hat Y = \hat \lambda(x)\]
where $\hat \lambda(x)$ is the estimated expected count.
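The contrast with linear regression can be seen on a small example. The sketch below fits a least-squares line to toy data (invented for illustration) and extrapolates it, then evaluates a log-link prediction with hypothetical coefficients that are not fitted to the data. The point is only the sign of each prediction.

```python
import math

# Toy data, invented for illustration: counts that fall toward zero as x grows.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [9, 5, 3, 1, 0]

# Ordinary least squares for a single predictor (closed form).
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Extrapolating the straight line to x = 6 gives a negative "count".
linear_pred = intercept + slope * 6.0

# A log-link prediction is exp(linear predictor), so it stays positive.
# These coefficients are hypothetical, not estimated from the data above.
log_link_pred = math.exp(2.2 - 0.7 * 6.0)

print(linear_pred, log_link_pred)
```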
Poisson Distribution
Poisson regression assumes that the count variable follows a Poisson distribution conditional on $X$:
\[Y \mid X=x \sim \text{Poisson}(\lambda(x))\]
The probability of observing count $y$ is:
\[P(Y = y \mid X=x) = \frac{e^{-\lambda(x)}\lambda(x)^y}{y!}, \quad y = 0, 1, 2, \ldots\]
Interpretation of Coefficients
A coefficient $\beta_j$ represents the change in the log expected count for a one-unit increase in $X_j$, holding the other predictors fixed.
Exponentiating gives the multiplicative effect:
\[e^{\beta_j}\]
If $e^{\beta_j} = 1.2$, then a one-unit increase in $X_j$ multiplies the expected count by $1.2$.
That corresponds to a 20% increase in the expected count.
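The multiplicative reading can be checked numerically. In this sketch the intercept and coefficient are hypothetical values, with `beta_j` chosen so that $e^{\beta_j} = 1.2$. The ratio of expected counts one unit apart is the same at any baseline.

```python
import math

beta0 = 0.5                 # hypothetical intercept
beta_j = math.log(1.2)      # hypothetical coefficient with exp(beta_j) = 1.2

def lam(x_j):
    """Expected count for a single predictor under the log link."""
    return math.exp(beta0 + beta_j * x_j)

# Ratios of expected counts one unit apart, at two different baselines.
ratio_low = lam(4.0) / lam(3.0)
ratio_high = lam(11.0) / lam(10.0)

# Percentage change implied by the coefficient.
percent_change = (math.exp(beta_j) - 1.0) * 100.0

print(ratio_low, ratio_high, percent_change)
```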
Example: Basket Item Count Prediction
Let:
- $Y$ = final basket item count.
- $X$ = current observed basket size.
A Poisson regression model could estimate:
\[E[\text{final item count} \mid \text{current item count}]\]
Because the final basket size is a count, Poisson regression may be more appropriate than ordinary linear regression.
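A minimal sketch of how such a model could be fitted, assuming a single predictor and toy data invented for illustration. It maximizes the Poisson log-likelihood by Newton-Raphson, using only the standard library. The names `fit_poisson`, `current_items`, and `final_items` are hypothetical.

```python
import math

# Toy data, invented for illustration.
current_items = [1, 2, 3, 4, 5, 6, 7, 8]     # X: current observed basket size
final_items = [2, 3, 3, 5, 6, 8, 9, 13]      # Y: final basket item count

def fit_poisson(xs, ys, max_iter=50, tol=1e-10):
    """Fit log(lambda) = b0 + b1 * x by Newton-Raphson."""
    # Start from the intercept-only fit: lambda = mean(y), slope = 0.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: gradient of the log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mu))
        # Information matrix X^T W X with W = diag(mu), as [[a, b], [b, c]].
        a = sum(mu)
        b = sum(m * x for x, m in zip(xs, mu))
        c = sum(m * x * x for x, m in zip(xs, mu))
        det = a * c - b * b
        # Newton step: solve the 2x2 system in closed form.
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

b0, b1 = fit_poisson(current_items, final_items)
predicted = [math.exp(b0 + b1 * x) for x in current_items]
print(b0, b1)
print(predicted)
```

With an intercept in the model, the fitted expected counts sum to the observed total at the maximum-likelihood estimate, which gives a quick sanity check. In practice a library routine, such as a Poisson GLM in statsmodels or `PoissonRegressor` in scikit-learn, would normally replace the hand-written loop.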
Example: Purchase Frequency
Let:
- $Y$ = number of orders placed by a customer in the next 30 days.
- $X$ = customer history features.
Poisson regression estimates:
\[E[\text{orders in next 30 days} \mid X]\]
This makes it useful for demand, frequency, and retention analysis.
Assumptions
The main assumptions are:
- The target is a count.
- Counts are independent given the predictors.
- The log expected count is linear in the predictors.
- The conditional mean and variance are approximately equal:
\[Var(Y \mid X) \approx E[Y \mid X]\]
Overdispersion
Real data often has variance larger than the mean:
\[Var(Y \mid X) > E[Y \mid X]\]
This is called overdispersion.
If overdispersion is strong, alternatives include:
- Quasi-Poisson regression.
- Negative binomial regression.
- Zero-inflated models.
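A rough first look at overdispersion compares the sample variance of the counts with their sample mean. The sketch below uses toy basket sizes invented for illustration. It checks the marginal variance only, so a ratio above 1 is suggestive rather than conclusive: some of the excess may be explained once predictors are included.

```python
import statistics

# Toy basket sizes, invented for illustration; a few large baskets inflate the variance.
basket_sizes = [1, 1, 2, 2, 3, 3, 4, 5, 9, 20]

mean_count = statistics.mean(basket_sizes)
var_count = statistics.variance(basket_sizes)   # sample variance (n - 1 denominator)

# For Poisson-distributed counts this ratio would be close to 1.
variance_to_mean = var_count / mean_count

print(mean_count, var_count, variance_to_mean)
```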
Strengths
- Natural for count data.
- Predictions are non-negative.
- Interpretable through multiplicative effects.
- Useful for event rates and purchase counts.
Weaknesses
- Can perform badly with overdispersion: standard errors are understated, so effects look more significant than they are.
- Can underfit highly variable retail baskets.
- Assumes a specific count distribution.
- Sensitive to extreme counts.
Diagnostics
Useful checks include:
- Residual deviance.
- Mean vs variance comparison.
- Predicted vs actual counts.
- Overdispersion test.
- Residual plots.
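Two of these checks can be sketched directly from observed counts and fitted means. The helper names `pearson_dispersion` and `poisson_deviance` are hypothetical, and the fitted means below are made-up values, not the output of an actual fit. A Pearson dispersion statistic well above 1 points toward overdispersion.

```python
import math

def pearson_dispersion(ys, mus, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi2 = sum((y - m) ** 2 / m for y, m in zip(ys, mus))
    return chi2 / (len(ys) - n_params)

def poisson_deviance(ys, mus):
    """Residual deviance: 2 * sum(y * log(y / mu) - (y - mu)), taking 0 * log(0) = 0."""
    total = 0.0
    for y, m in zip(ys, mus):
        log_term = y * math.log(y / m) if y > 0 else 0.0
        total += log_term - (y - m)
    return 2.0 * total

# Hypothetical observed counts and fitted means, for illustration only.
observed = [0, 2, 1, 7, 3, 12]
fitted = [1.0, 2.0, 2.5, 4.0, 4.5, 6.0]

# n_params = 2 assumes a model with an intercept and one slope.
dispersion = pearson_dispersion(observed, fitted, n_params=2)
deviance = poisson_deviance(observed, fitted)

print(dispersion, deviance)
```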
Exercises
- Fit a Poisson regression model for final basket item count.
- Check whether the variance of basket sizes is larger than the mean.
- Explain why overdispersion is likely in retail basket data.