Logistic Regression
Definition
Logistic regression is a parametric regression method used when the target variable is binary.
It models the probability that an observation belongs to class $1$:
\[P(Y = 1 \mid X = x)\]Instead of predicting $Y$ directly, it predicts a probability between $0$ and $1$.
Model Form
For one input variable:
\[P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x)}}\]For multiple input variables:
\[P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + \cdots + \beta_px_p)}}\]Logit Form
Logistic regression can also be written as:
\[\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1x_1 + \cdots + \beta_px_p\]where:
- $p$ is the probability of class $1$.
- $\frac{p}{1-p}$ is the odds.
- $\log\left(\frac{p}{1-p}\right)$ is the log-odds.
Core Idea
Linear regression can predict values below $0$ or above $1$, which is not valid for probabilities.
Logistic regression fixes this by passing the linear score through the logistic function:
\[\sigma(z) = \frac{1}{1 + e^{-z}}\]This maps any real number to the interval $(0, 1)$.
Prediction Rule
The model first predicts a probability:
\[\hat p = P(Y = 1 \mid X = x)\]Then it can classify using a threshold, commonly $0.5$:
\[\hat Y = \begin{cases} 1 & \text{if } \hat p \geq 0.5 \\ 0 & \text{if } \hat p < 0.5 \end{cases}\]Interpretation of Coefficients
A coefficient $\beta_j$ represents the change in log-odds for a one-unit increase in $X_j$.
Exponentiating the coefficient gives the odds ratio:
\[e^{\beta_j}\]If $e^{\beta_j} > 1$, increasing $X_j$ increases the odds of class $1$.
If $e^{\beta_j} < 1$, increasing $X_j$ decreases the odds of class $1$.
Example: Customer Return Prediction
Let:
- $Y = 1$ if a customer returns within 30 days.
- $Y = 0$ otherwise.
- $X$ = features such as basket value, item count, recency, and purchase frequency.
A logistic regression model estimates:
\[P(\text{return within 30 days} \mid X)\]This makes it useful for churn, retention, and repeat-purchase analysis.
Relation to Conditional Mean
For binary $Y$, the conditional mean is equal to the probability of class $1$:
\[E[Y \mid X=x] = P(Y = 1 \mid X=x)\]So logistic regression estimates a conditional mean, but forces it to stay between $0$ and $1$.
Strengths
- Interpretable.
- Good baseline for binary classification.
- Produces probabilities.
- Fast to train.
- Works well when effects are approximately linear on the log-odds scale.
Weaknesses
- Assumes a linear relationship in log-odds.
- Can underfit nonlinear patterns.
- Sensitive to highly correlated predictors.
- Classification depends on the chosen threshold.
Diagnostics
Useful checks include:
- Confusion matrix.
- Accuracy.
- Precision and recall.
- ROC curve.
- AUC.
- Calibration plot.
- Log loss.
Exercises
- Fit a logistic regression model to predict whether a customer returns within 30 days.
- Interpret one coefficient as an odds ratio.
- Explain why logistic regression is more appropriate than linear regression for binary outcomes.