Econometrics

Table of Contents

Part 1: Introduction and Foundations

  1. Overview
  2. Mathematical Prerequisites
  3. Notation Guide

Part 2: Causality and Simple Regression

  1. Causality and the Notion of Ceteris Paribus
  2. Models Linear in Parameters

Part 3: OLS Mechanics

  1. OLS Mechanics and Estimation
    • 6.1 Derivation of OLS Estimators
    • 6.2 Properties of OLS Residuals
    • 6.3 Total Variation Decomposition

Part 4: Statistical Inference

  1. OLS Inference and Random Sampling
    • 7.1 Sampling Distribution of OLS Estimators
    • 7.2 Hypothesis Testing
    • 7.3 Confidence Intervals

Part 5: Multiple Regression

  1. Simple vs Multiple Regression
    • 8.1 Omitted Variable Bias
    • 8.2 Frisch-Waugh-Lovell Theorem
    • 8.3 Control Variables

Part 6: Dummy Variables

  1. Qualitative Information with Dummy Variables
    • 9.1 Binary Regressors
    • 9.2 Multiple Categories
    • 9.3 Interaction Terms

Part 7: Functional Forms

  1. Functional Form and Nonlinear Relationships
    • 10.1 Logarithmic Transformations
    • 10.2 Polynomial Models
    • 10.3 Interpretation of Coefficients

Part 8: Time Series Foundations

  1. Stationarity, Persistence, and Serial Correlation
  2. Regression Analysis with Time Series Data

Part 9: Advanced Topics

  1. Time Series Analysis (Advanced)
  2. Causal Effect with Difference-in-Differences

Part 10: Appendices

  1. Mathematical Appendix
  2. Statistical Tables
  3. Practice Problems and Solutions

Overview

Econometrics is the application of statistical methods to economic data for the purpose of testing hypotheses and forecasting future trends.

Objectives

  1. Understand causal relationships in economic data and the concept of ceteris paribus
  2. Apply regression analysis to real-world economic problems
  3. Interpret regression results correctly and understand their limitations
  4. Test economic hypotheses using appropriate statistical methods
  5. Handle various data types including cross-sectional, time series, and panel data
  6. Recognize and address common econometric problems

Key Themes

  • Causality vs. Correlation: Understanding when regression coefficients have causal interpretations
  • Model Specification: Choosing appropriate functional forms and control variables
  • Statistical Inference: Making valid conclusions from sample data
  • Practical Application: Connecting theory to real-world economic problems

Mathematical Prerequisites

Linear Algebra

Prerequisites:

  1. Matrix Operations
    • Matrix multiplication: If \(\mathbf{A}\) is \(m \times n\) and \(\mathbf{B}\) is \(n \times p\), then \(\mathbf{AB}\) is \(m \times p\)
    • Matrix transpose: \((\mathbf{AB})' = \mathbf{B}'\mathbf{A}'\)
    • Matrix inverse: \(\mathbf{AA}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\)
  2. Vector Operations
    • Inner product: \(\mathbf{x}'\mathbf{y} = \sum_{i=1}^n x_i y_i\)
    • Outer product: \(\mathbf{xy}'\) produces an \(n \times n\) matrix

Calculus

Essential calculus concepts include:

  1. Differentiation
    • Partial derivatives: \(\frac{\partial f(x,y)}{\partial x}\)
    • Chain rule: \(\frac{d}{dx}f(g(x)) = f'(g(x))g'(x)\)
  2. Optimization
    • First-order conditions: \(\frac{\partial f}{\partial x} = 0\)
    • Second-order conditions for minima/maxima

Probability and Statistics

Core statistical concepts:

  1. Random Variables
    • Expected value: \(E[X] = \int x f(x) dx\) (continuous) or \(\sum x P(X=x)\) (discrete)
    • Variance: \(\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - [E[X]]^2\)
    • Covariance: \(\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]\)
  2. Distributions
    • Normal distribution: \(X \sim N(\mu, \sigma^2)\)
    • t-distribution with \(n\) degrees of freedom: \(t_n\)
    • Chi-squared distribution: \(\chi^2_n\)
    • F-distribution: \(F_{n_1,n_2}\)
  3. Estimation Theory
    • Unbiasedness: \(E[\hat{\theta}] = \theta\)
    • Consistency: \(\text{plim}_{n \to \infty} \hat{\theta} = \theta\)
    • Efficiency: minimum variance among unbiased estimators

Notation Guide

General Notation

  • \(Y\): Dependent variable (outcome variable)
  • \(X\): Independent variable (explanatory variable, regressor)
  • \(\beta\): Population parameter (true coefficient)
  • \(\hat{\beta}\): Estimated coefficient
  • \(\varepsilon\) or \(u\): Error term (disturbance term)
  • \(\hat{u}\) or \(\hat{\varepsilon}\): Residual (estimated error)
  • \(n\): Sample size
  • \(k\): Number of slope regressors (excluding the intercept), so degrees-of-freedom expressions below take the form \(n - k - 1\)

Subscripts and Superscripts

  • \(i\): Index for individual observations \((i = 1, 2, ..., n)\)
  • \(t\): Index for time periods (in time series)
  • \(j\): Index for variables \((j = 1, 2, ..., k)\)

Matrix Notation

  • \(\mathbf{y}\): \(n \times 1\) vector of dependent variable observations
  • \(\mathbf{X}\): \(n \times k\) matrix of independent variables
  • \(\boldsymbol{\beta}\): \(k \times 1\) vector of parameters
  • \(\mathbf{u}\): \(n \times 1\) vector of errors
  • \(\hat{\mathbf{u}}\): \(n \times 1\) vector of residuals

Common Expressions

  1. Simple Regression Model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)
  2. Multiple Regression Model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + ... + \beta_k X_{ki} + u_i\)
  3. Matrix Form: \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\)
  4. OLS Estimator: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)
  5. Variance of OLS Estimator: \(\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\)

Statistical Operators

  • \(E[\cdot]\): Expected value operator
  • \(\text{Var}(\cdot)\): Variance operator
  • \(\text{Cov}(\cdot, \cdot)\): Covariance operator
  • \(\text{Corr}(\cdot, \cdot)\): Correlation operator
  • \(\text{plim}\): Probability limit
  • \(\stackrel{p}{\rightarrow}\): Converges in probability
  • \(\stackrel{d}{\rightarrow}\): Converges in distribution

Test Statistics

  1. t-statistic: \(t = \frac{\hat{\beta}_j - \beta_{j,0}}{\text{SE}(\hat{\beta}_j)}\)
  2. F-statistic: \(F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\)
  3. R-squared: \(R^2 = 1 - \frac{SSR}{TSS} = \frac{ESS}{TSS}\)

Currency Notation

When referring to monetary values, dollar signs are escaped (\$100, \$50, etc.) to distinguish them from LaTeX math delimiters.

Greek Letters Used

  • \(\alpha\) (alpha): Often used for intercept
  • \(\beta\) (beta): Regression coefficients
  • \(\gamma\) (gamma): Alternative coefficients
  • \(\delta\) (delta): Difference or change
  • \(\varepsilon\) (epsilon): Error term
  • \(\theta\) (theta): General parameter
  • \(\lambda\) (lambda): Eigenvalue or Lagrange multiplier
  • \(\mu\) (mu): Population mean
  • \(\sigma\) (sigma): Standard deviation
  • \(\rho\) (rho): Correlation coefficient
  • \(\tau\) (tau): Treatment effect
  • \(\phi\) (phi): Autoregressive parameter
  • \(\chi\) (chi): Chi-squared distribution
  • \(\omega\) (omega): Alternative error term

Abbreviations

  • OLS: Ordinary Least Squares
  • MLE: Maximum Likelihood Estimation
  • IV: Instrumental Variables
  • 2SLS: Two-Stage Least Squares
  • GMM: Generalized Method of Moments
  • BLUE: Best Linear Unbiased Estimator
  • CLT: Central Limit Theorem
  • LLN: Law of Large Numbers
  • i.i.d.: Independent and Identically Distributed
  • MSE: Mean Squared Error
  • SSR: Sum of Squared Residuals
  • ESS: Explained Sum of Squares
  • TSS: Total Sum of Squares
  • DF: Degrees of Freedom
  • SE: Standard Error
  • CI: Confidence Interval
  • DGP: Data Generating Process

Part 2: Causality and Simple Regression

Chapter 1: Causality and the Notion of Ceteris Paribus

1.1 Introduction to Causal Analysis

Econometrics is fundamentally about understanding causal relationships. When we ask questions like “What is the effect of education on wages?” or “How does class size affect student performance?”, we are seeking causal answers, not mere correlations.

The Fundamental Problem of Causal Inference

The challenge in causal inference is that we can never observe the same unit under both treatment and control conditions simultaneously. This is known as the fundamental problem of causal inference.

For individual \(i\):

  • \(Y_i(1)\): Potential outcome if treated
  • \(Y_i(0)\): Potential outcome if not treated
  • Causal effect: \(\tau_i = Y_i(1) - Y_i(0)\)

We only observe one of these potential outcomes, never both.

1.2 The Ceteris Paribus Condition

Ceteris paribus is a Latin phrase meaning “all other things being equal” or “holding other things constant.” This concept is central to causal inference in econometrics.

Definition

The ceteris paribus effect of \(X\) on \(Y\) is the change in \(Y\) resulting from a one-unit change in \(X\), holding all other relevant factors constant.

Mathematically, if \(Y = f(X, Z)\) where \(Z\) represents all other factors: \(\frac{\partial Y}{\partial X} \bigg\|_{Z=\bar{Z}}\)

Example: Returns to Education

Consider the wage equation: \(\text{wage} = f(\text{education}, \text{ability}, \text{experience}, \text{family background}, ...)\)

The ceteris paribus effect of education on wages is: \(\frac{\partial \text{wage}}{\partial \text{education}} \bigg\|_{\text{other factors constant}}\)

1.3 Experimental vs. Observational Data

Randomized Controlled Experiments

In an ideal randomized experiment:

  1. Subjects are randomly assigned to treatment and control groups
  2. Random assignment ensures treatment is independent of potential outcomes
  3. The only systematic difference between groups is the treatment

Under random assignment: \(E[Y_i(1)\|D_i=1] - E[Y_i(0)\|D_i=0] = E[Y_i(1) - Y_i(0)] = \text{Average Treatment Effect}\)

where \(D_i\) is the treatment indicator.

Observational Data Challenges

With observational data:

  1. Treatment is not randomly assigned
  2. Treatment may be correlated with other factors affecting the outcome
  3. Simple comparisons may not yield causal effects

1.4 The Role of Regression in Causal Analysis

Regression analysis can help approximate ceteris paribus effects when:

  1. We can measure and control for confounding factors
  2. The relationship is correctly specified
  3. Key assumptions are satisfied

The population regression function: \(E[Y\|X] = \beta_0 + \beta_1 X\)

Under certain conditions, \(\beta_1\) represents the ceteris paribus effect of \(X\) on \(Y\).

1.5 Conditions for Causal Interpretation

For regression coefficients to have causal interpretations, we need:

  1. Zero Conditional Mean Assumption: \(E[u\|X] = 0\)
    • The error term is uncorrelated with the regressor
    • No omitted variables correlated with \(X\)
  2. No Perfect Multicollinearity
    • Regressors are not perfectly correlated
  3. Variation in the Treatment Variable
    • \[\text{Var}(X) > 0\]
  4. Correct Functional Form
    • The relationship is correctly specified

Chapter 2: Models Linear in Parameters

2.1 The Simple Linear Regression Model

The simple linear regression model is: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

where:

  • \(Y_i\): Dependent variable for observation \(i\)
  • \(X_i\): Independent variable for observation \(i\)
  • \(\beta_0\): Intercept parameter
  • \(\beta_1\): Slope parameter
  • \(u_i\): Error term for observation \(i\)

Interpretation of Parameters

  1. Intercept (\(\beta_0\)): Expected value of \(Y\) when \(X = 0\) \(E[Y\|X=0] = \beta_0\)
  2. Slope (\(\beta_1\)): Change in expected value of \(Y\) for a one-unit change in \(X\) \(\beta_1 = \frac{\partial E[Y\|X]}{\partial X}\)

2.2 Classical Linear Model Assumptions

The classical linear model requires several assumptions:

Assumption 1: Linearity in Parameters

The model is linear in parameters (but not necessarily in variables): \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

Assumption 2: Random Sampling

We have a random sample of \(n\) observations \({(X_i, Y_i): i = 1, ..., n}\) from the population.

Assumption 3: Sample Variation in \(X\)

The sample variance of \(X\) is positive: \(\sum_{i=1}^n (X_i - \bar{X})^2 > 0\)

Assumption 4: Zero Conditional Mean

\[E[u_i\|X_i] = 0\]

This implies:

  1. \[E[u_i] = 0\]
  2. \[\text{Cov}(X_i, u_i) = 0\]
  3. \[E[Y_i\|X_i] = \beta_0 + \beta_1 X_i\]

Assumption 5: Homoskedasticity (for inference)

\[\text{Var}(u_i\|X_i) = \sigma^2\]

The error variance is constant across all values of \(X\).

2.3 The Population Regression Function

The population regression function (PRF) is: \(E[Y\|X] = \beta_0 + \beta_1 X\)

This represents the expected value of \(Y\) given \(X\) in the population.

Properties of the PRF

  1. It’s the best linear predictor of \(Y\) given \(X\)
  2. It minimizes the expected squared prediction error
  3. The error term \(u\) has zero mean conditional on \(X\)

2.4 The Sample Regression Function

The sample regression function (SRF) is: \(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\)

where \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are estimates of the population parameters.

Residuals

The residual for observation \(i\) is: \(\hat{u}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\)

2.5 Examples of Linear Models

Example 1: Wage-Education Relationship

\[\text{wage}_i = \beta_0 + \beta_1 \text{educ}_i + u_i\]

Interpretation: \(\beta_1\) is the expected change in wages for an additional year of education.

Example 2: Consumption Function

\[\text{consumption}_i = \beta_0 + \beta_1 \text{income}_i + u_i\]

Interpretation: \(\beta_1\) is the marginal propensity to consume.

Example 3: Production Function (log-linear)

\[\log(Q_i) = \beta_0 + \beta_1 \log(L_i) + u_i\]

Interpretation: \(\beta_1\) is the elasticity of output with respect to labor.

2.6 Sources of Error Terms

The error term \(u_i\) captures:

  1. Omitted variables: Factors affecting \(Y\) but not included in the model
  2. Measurement error: Inaccuracies in measuring \(Y\) or \(X\)
  3. Functional form misspecification: True relationship is not linear
  4. Random variation: Inherent unpredictability in human behavior

2.7 When Linearity Fails

The linearity assumption may be violated when:

  1. The true relationship is nonlinear
  2. There are interaction effects
  3. The effect of \(X\) on \(Y\) depends on the level of \(X\)

Solutions include:

  • Logarithmic transformations
  • Polynomial terms
  • Interaction terms
  • Piecewise linear models

2.8 Sample Problems

Problem 1

Consider the model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

Given: \(E[u_i\|X_i] = 0\) and \(\text{Var}(u_i\|X_i) = \sigma^2\)

a) Show that \(E[Y_i\|X_i] = \beta_0 + \beta_1 X_i\) b) Derive \(\text{Var}(Y_i\|X_i)\)

Solution: a) \(E[Y_i\|X_i] = E[\beta_0 + \beta_1 X_i + u_i\|X_i] = \beta_0 + \beta_1 X_i + E[u_i\|X_i] = \beta_0 + \beta_1 X_i\)

b) \(\text{Var}(Y_i\|X_i) = \text{Var}(\beta_0 + \beta_1 X_i + u_i\|X_i) = \text{Var}(u_i\|X_i) = \sigma^2\)

Problem 2

In an estimated wage regression: \(\widehat{\text{wage}}_i = 600 + 80 \times \text{educ}_i\)

Interpret the coefficients.

Solution:

  • \(\hat{\beta}_0 = 600\): Expected wage for someone with zero years of education
  • \(\hat{\beta}_1 = 80\): Each additional year of education is associated with an \$80 increase in wages

2.9 Key Takeaways

  1. Ceteris paribus is essential for causal interpretation
  2. Linear models are linear in parameters, not necessarily in variables
  3. The zero conditional mean assumption is crucial for unbiased estimation
  4. The error term captures all factors affecting \(Y\) not included in the model
  5. Proper interpretation of coefficients depends on the functional form

2.10 Practice Questions

  1. What is the difference between correlation and causation in the context of regression analysis?
  2. Consider the model \(Y_i = \beta_0 + \beta_1 X_i + u_i\). Under what conditions does \(\beta_1\) have a causal interpretation?
  3. Explain why \(E[u_i\|X_i] = 0\) is a stronger assumption than \(E[u_i] = 0\).
  4. If the true relationship is \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i\) but we estimate \(Y_i = \alpha_0 + \alpha_1 X_i + v_i\), will \(\hat{\alpha}_1\) be an unbiased estimator of \(\beta_1\)? Explain.

Part 3: OLS Mechanics

Chapter 3: OLS Mechanics and Estimation

3.1 Introduction to OLS

Ordinary Least Squares (OLS) is the most widely used estimation method in econometrics. It provides a way to estimate the parameters of a linear regression model by minimizing the sum of squared residuals.

The OLS Criterion

For the model \(Y_i = \beta_0 + \beta_1 X_i + u_i\), OLS chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize:

\[S(\beta_0, \beta_1) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2\]

3.2 Derivation of OLS Estimators

Simple Regression Case

To find the OLS estimators, we take the first-order conditions:

\[\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i) = 0\] \[\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)X_i = 0\]

From the first equation: \(\sum_{i=1}^n Y_i = n\beta_0 + \beta_1\sum_{i=1}^n X_i\)

This gives us: \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\)

From the second equation and substituting \(\hat{\beta}_0\): \(\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}\)

Alternative Forms of the Slope Estimator

  1. Covariance form: \(\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}\)
  2. Deviation form: \(\hat{\beta}_1 = \frac{\sum_{i=1}^n (Y_i - \bar{Y})X_i}{\sum_{i=1}^n (X_i - \bar{X})X_i}\)
  3. Original form: \(\hat{\beta}_1 = \frac{n\sum_{i=1}^n X_iY_i - \sum_{i=1}^n X_i\sum_{i=1}^n Y_i}{n\sum_{i=1}^n X_i^2 - (\sum_{i=1}^n X_i)^2}\)
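
All three forms are algebraically identical, which is easy to confirm numerically. The following sketch (illustrative only; it assumes NumPy and uses simulated data) computes the slope with the covariance form and with the raw-sum form and checks that they agree:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
n = len(x)

# covariance form
b1_cov = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# "original" form written with raw sums
b1_raw = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x ** 2) - x.sum() ** 2)

print(b1_cov, b1_raw)   # identical up to floating-point error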

3.3 Properties of OLS Residuals

The OLS residuals \(\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\) have several important properties:

Property 1: Sum of Residuals Equals Zero

\[\sum_{i=1}^n \hat{u}_i = 0\]

Proof: This follows directly from the first-order condition: \(\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^n \hat{u}_i = 0\)

Property 2: Sample Covariance Between Regressors and Residuals is Zero

\[\sum_{i=1}^n X_i\hat{u}_i = 0\]

Proof: This follows from the second first-order condition: \(\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^n X_i\hat{u}_i = 0\)

Property 3: Sample Mean of Residuals is Zero

\[\bar{\hat{u}} = \frac{1}{n}\sum_{i=1}^n \hat{u}_i = 0\]

Property 4: Predicted Values and Residuals are Uncorrelated

\[\sum_{i=1}^n \hat{Y}_i\hat{u}_i = 0\]

Proof: \(\sum_{i=1}^n \hat{Y}_i\hat{u}_i = \sum_{i=1}^n (\hat{\beta}_0 + \hat{\beta}_1 X_i)\hat{u}_i = \hat{\beta}_0\sum_{i=1}^n \hat{u}_i + \hat{\beta}_1\sum_{i=1}^n X_i\hat{u}_i = 0\)

Property 5: The Regression Line Passes Through the Point of Averages

\[\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1\bar{X}\]

3.4 Total Variation Decomposition

One of the most important results in regression analysis is the decomposition of total variation:

\[\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n \hat{u}_i^2\]

Or in terms of sum of squares: \(\text{TSS} = \text{ESS} + \text{SSR}\)

where:

  • TSS (Total Sum of Squares): \(\sum_{i=1}^n (Y_i - \bar{Y})^2\)
  • ESS (Explained Sum of Squares): \(\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2\)
  • SSR (Sum of Squared Residuals): \(\sum_{i=1}^n \hat{u}_i^2\)

Proof of Decomposition

Start with: \(Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + \hat{u}_i\)

Square both sides and sum: \(\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n [(\hat{Y}_i - \bar{Y}) + \hat{u}_i]^2\)

Expanding: \(= \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + 2\sum_{i=1}^n (\hat{Y}_i - \bar{Y})\hat{u}_i + \sum_{i=1}^n \hat{u}_i^2\)

The middle term equals zero: \(\sum_{i=1}^n (\hat{Y}_i - \bar{Y})\hat{u}_i = \sum_{i=1}^n \hat{Y}_i\hat{u}_i - \bar{Y}\sum_{i=1}^n \hat{u}_i = 0 - 0 = 0\)
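
The residual properties from Section 3.3 and the decomposition above are easy to verify numerically. A minimal sketch (assumes NumPy; data are simulated):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((yhat - y.mean()) ** 2)
ssr = np.sum(uhat ** 2)

print(np.isclose(uhat.sum(), 0))         # residuals sum to zero
print(np.isclose(np.sum(x * uhat), 0))   # residuals orthogonal to the regressor
print(np.isclose(tss, ess + ssr))        # TSS = ESS + SSR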

3.5 Goodness of Fit: R-squared

The coefficient of determination (R-squared) measures the proportion of variation in \(Y\) explained by \(X\):

\[R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{SSR}}{\text{TSS}}\]

Properties of \(R^2\):

  1. \[0 \leq R^2 \leq 1\]
  2. \(R^2 = 1\) implies perfect fit (all residuals are zero)
  3. \(R^2 = 0\) implies no linear relationship
  4. In simple regression, \(R^2 = r_{XY}^2\) (squared correlation coefficient)

3.6 Matrix Notation for OLS

For the multiple regression case with \(k\) regressors: \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\)

The OLS estimator in matrix form: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)

Derivation

Minimize: \(S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\)

First-order condition: \(\frac{\partial S}{\partial \boldsymbol{\beta}} = -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = 0\)

Solving: \(\mathbf{X}'\mathbf{y} = \mathbf{X}'\mathbf{X}\boldsymbol{\beta}\) \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)

3.7 Special Cases and Applications

Case 1: Regression Through the Origin

If we force \(\beta_0 = 0\): \(Y_i = \beta_1 X_i + u_i\)

The OLS estimator becomes: \(\hat{\beta}_1 = \frac{\sum_{i=1}^n X_iY_i}{\sum_{i=1}^n X_i^2}\)

Case 2: Regression on a Constant

If \(X_i = 1\) for all \(i\): \(Y_i = \beta_0 + u_i\)

The OLS estimator: \(\hat{\beta}_0 = \bar{Y}\)

Case 3: Regression with a Dummy Variable

Consider: \(Y_i = \alpha + \beta D_i + u_i\) where \(D_i \in {0,1}\)

The OLS estimates:

  • \(\hat{\alpha}\): mean of \(Y\) for \(D_i = 0\)
  • \(\hat{\alpha} + \hat{\beta}\): mean of \(Y\) for \(D_i = 1\)
  • \(\hat{\beta}\): difference in means between groups

3.8 Numerical Example

Given data:

X: 1, 2, 3, 4, 5
Y: 2, 4, 5, 4, 5

Calculate OLS estimates:

  1. \(\bar{X} = 3\), \(\bar{Y} = 4\)
  2. \[\sum_{i=1}^5 (X_i - \bar{X})(Y_i - \bar{Y}) = (-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1) = 6\]
  3. \[\sum_{i=1}^5 (X_i - \bar{X})^2 = 4 + 1 + 0 + 1 + 4 = 10\]
  4. \[\hat{\beta}_1 = \frac{6}{10} = 0.6\]
  5. \[\hat{\beta}_0 = 4 - 0.6(3) = 2.2\]

Regression equation: \(\hat{Y} = 2.2 + 0.6X\)
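
The same numbers can be reproduced with the matrix formula of Section 3.6; a short illustrative sketch (assumes NumPy):

import numpy as np

X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 5.0]])   # constant plus regressor
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # [2.2, 0.6]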

3.9 Frisch-Waugh-Lovell Theorem

The FWL theorem provides insight into multiple regression by showing how to obtain coefficients through a series of simple regressions.

For the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

To find \(\hat{\beta}_1\):

  1. Regress \(Y\) on \(X_2\) and obtain residuals \(\tilde{Y}\)
  2. Regress \(X_1\) on \(X_2\) and obtain residuals \(\tilde{X}_1\)
  3. Regress \(\tilde{Y}\) on \(\tilde{X}_1\)

The coefficient from step 3 equals \(\hat{\beta}_1\) from the multiple regression.
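
A short simulation sketch (assumes NumPy; the data-generating values are arbitrary) that runs the three steps and compares the result with the coefficient from the full regression:

import numpy as np

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)                 # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    # OLS coefficients with a constant prepended
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.solve(X.T @ X, X.T @ y)

b_full = ols(np.column_stack([x1, x2]), y)          # [const, beta1, beta2]

# Steps 1-3: partial x2 out of y and x1, then regress residual on residual
y_til = y - np.column_stack([np.ones(n), x2]) @ ols(x2, y)
x1_til = x1 - np.column_stack([np.ones(n), x2]) @ ols(x2, x1)
b_fwl = np.sum(x1_til * y_til) / np.sum(x1_til ** 2)

print(b_full[1], b_fwl)   # equal up to floating-point error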

3.10 Mean-Centered Regression

Consider the mean-centered model: \((Y_i - \bar{Y}) = \beta_1(X_i - \bar{X}) + v_i\)

Properties:

  1. The intercept is zero
  2. The slope coefficient is identical to the original model
  3. \(R^2\) is the same as the original model
  4. Useful for focusing on the relationship between deviations

3.11 Partitioned Regression

For the partitioned model: \(\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \mathbf{u}\)

The OLS estimator for \(\boldsymbol{\beta}_1\): \(\hat{\boldsymbol{\beta}}_1 = [\mathbf{X}_1'(\mathbf{I} - \mathbf{P}_2)\mathbf{X}_1]^{-1}\mathbf{X}_1'(\mathbf{I} - \mathbf{P}_2)\mathbf{y}\)

where \(\mathbf{P}_2 = \mathbf{X}_2(\mathbf{X}_2'\mathbf{X}_2)^{-1}\mathbf{X}_2'\) is the projection matrix for \(\mathbf{X}_2\).

3.12 Practice Problems

Problem 1

Show that the OLS residuals are orthogonal to the fitted values.

Solution: We need to show \(\sum_{i=1}^n \hat{Y}_i\hat{u}_i = 0\)

\(\sum_{i=1}^n \hat{Y}_i\hat{u}_i = \sum_{i=1}^n (\hat{\beta}_0 + \hat{\beta}_1 X_i)\hat{u}_i\) \(= \hat{\beta}_0\sum_{i=1}^n \hat{u}_i + \hat{\beta}_1\sum_{i=1}^n X_i\hat{u}_i\) \(= \hat{\beta}_0 \cdot 0 + \hat{\beta}_1 \cdot 0 = 0\)

Problem 2

Prove that \(R^2\) equals the squared correlation between \(Y\) and \(\hat{Y}\).

Solution: The correlation between \(Y\) and \(\hat{Y}\) is: \(r_{Y,\hat{Y}} = \frac{\text{Cov}(Y,\hat{Y})}{\sqrt{\text{Var}(Y)\text{Var}(\hat{Y})}}\)

After some algebra (using the fact that \(\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X\)): \(r_{Y,\hat{Y}}^2 = \frac{\text{ESS}}{\text{TSS}} = R^2\)

3.13 Key Takeaways

  1. OLS minimizes the sum of squared residuals
  2. OLS residuals have specific algebraic properties that always hold
  3. Total variation can be decomposed into explained and unexplained parts
  4. R-squared measures goodness of fit but not causal validity
  5. The FWL theorem shows the relationship between simple and multiple regression
  6. Matrix notation provides a compact way to express OLS results

Part 4: Statistical Inference

Chapter 4: OLS Inference and Random Sampling

4.1 Statistical Properties of OLS Estimators

Under the classical assumptions, OLS estimators have several desirable properties.

Finite Sample Properties

  1. Unbiasedness: \(E[\hat{\beta}_j] = \beta_j\)
  2. Efficiency: OLS has the smallest variance among linear unbiased estimators (Gauss-Markov theorem)
  3. Consistency (a large-sample property): \(\text{plim}_{n \to \infty} \hat{\beta}_j = \beta_j\)

4.2 The Gauss-Markov Theorem

Theorem: Under assumptions MLR.1-MLR.5, the OLS estimator \(\hat{\boldsymbol{\beta}}\) is the Best Linear Unbiased Estimator (BLUE).

Assumptions for Gauss-Markov:

  1. Linear in parameters: \(Y_i = \beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki} + u_i\)
  2. Random sampling
  3. No perfect collinearity
  4. Zero conditional mean: \(E[u_i\|\mathbf{X}] = 0\)
  5. Homoskedasticity: \(\text{Var}(u_i\|\mathbf{X}) = \sigma^2\)

4.3 Sampling Distribution of OLS Estimators

For Simple Regression

The OLS slope estimator can be written as: \(\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (X_i - \bar{X})u_i}{\sum_{i=1}^n (X_i - \bar{X})^2}\)

This shows that \(\hat{\beta}_1\) is a linear combination of the error terms.

Variance of OLS Estimators

For the simple regression slope: \(\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\sigma^2}{n \cdot \text{Var}(X)}\)

For the intercept: \(\text{Var}(\hat{\beta}_0) = \sigma^2 \left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right]\)

4.4 Standard Errors

Since \(\sigma^2\) is unknown, we estimate it using: \(\hat{\sigma}^2 = \frac{1}{n-k-1}\sum_{i=1}^n \hat{u}_i^2 = \frac{\text{SSR}}{n-k-1}\)

The standard error of \(\hat{\beta}_j\) is: \(\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\text{Var}}(\hat{\beta}_j)}\)

For simple regression: \(\text{SE}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2}}\)
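
A minimal sketch (assumes NumPy; simulated homoskedastic errors) that turns these formulas into numbers, including the t-statistic used in the next section:

import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

sigma2_hat = np.sum(uhat ** 2) / (n - 2)   # k = 1, so df = n - k - 1 = n - 2
se_b1 = np.sqrt(sigma2_hat / sxx)
print(b1, se_b1, b1 / se_b1)               # estimate, SE, and t for H0: beta1 = 0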

4.5 Hypothesis Testing

The t-statistic

To test \(H_0: \beta_j = \beta_{j,0}\) vs \(H_1: \beta_j \neq \beta_{j,0}\):

\[t = \frac{\hat{\beta}_j - \beta_{j,0}}{\text{SE}(\hat{\beta}_j)}\]

Under \(H_0\) and the classical assumptions, \(t \sim t_{n-k-1}\).

Common Hypothesis Tests

  1. Testing Significance: \(H_0: \beta_j = 0\) \(t = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}\)
  2. One-sided Tests:
    • \(H_0: \beta_j \leq 0\) vs \(H_1: \beta_j > 0\)
    • Reject if \(t > t_{\alpha,n-k-1}\)
  3. Two-sided Tests:
    • \(H_0: \beta_j = 0\) vs \(H_1: \beta_j \neq 0\)
    • Reject if \(\|t\| > t_{\alpha/2,n-k-1}\)

4.6 Confidence Intervals

A \((1-\alpha) \cdot 100\%\) confidence interval for \(\beta_j\): \(\hat{\beta}_j \pm t_{\alpha/2,n-k-1} \cdot \text{SE}(\hat{\beta}_j)\)

Properties:

  1. Contains the true parameter with probability \((1-\alpha)\)
  2. Wider intervals indicate more uncertainty
  3. Interval width depends on sample size and error variance

4.7 The F-Test for Joint Hypotheses

To test multiple restrictions simultaneously: \(H_0: \beta_1 = 0, \beta_2 = 0, ..., \beta_q = 0\)

The F-statistic: \(F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\)

where:

  • \(SSR_r\): Sum of squared residuals from restricted model
  • \(SSR_{ur}\): Sum of squared residuals from unrestricted model
  • \(q\): Number of restrictions

Under \(H_0\): \(F \sim F_{q,n-k-1}\)

4.8 Alternative Forms of the F-Test

Using R-squared

\[F = \frac{(R^2_{ur} - R^2_r)/q}{(1-R^2_{ur})/(n-k-1)}\]

For Overall Significance

Testing \(H_0: \beta_1 = \beta_2 = ... = \beta_k = 0\): \(F = \frac{R^2/k}{(1-R^2)/(n-k-1)}\)
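
The R-squared form is straightforward to evaluate; the sketch below (plain Python) reproduces, up to rounding, the numbers used in Problem 2 of Section 4.13:

def f_stat_from_r2(r2_ur, r2_r, q, n, k):
    # F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k - 1)]
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k - 1))

print(f_stat_from_r2(r2_ur=0.40, r2_r=0.30, q=2, n=50, k=3))   # roughly 3.8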

4.9 Asymptotic Properties

As \(n \to \infty\):

  1. Consistency: \(\hat{\beta}_j \stackrel{p}{\rightarrow} \beta_j\)
  2. Asymptotic Normality: \(\sqrt{n}(\hat{\beta}_j - \beta_j) \stackrel{d}{\rightarrow} N(0, \sigma^2_{\beta_j})\)
  3. Asymptotic Efficiency: with normally distributed errors, OLS coincides with the MLE and attains the Cramér-Rao lower bound

4.10 The Classical Normal Linear Model

Adding the normality assumption: \(u_i \sim N(0, \sigma^2)\)

Under normality:

  1. OLS estimators are normally distributed in finite samples
  2. \(t\)-statistics exactly follow \(t\)-distributions
  3. \(F\)-statistics exactly follow \(F\)-distributions
  4. OLS is the maximum likelihood estimator

4.11 Violations of Classical Assumptions

Heteroskedasticity

If \(\text{Var}(u_i\|X_i) = \sigma_i^2\):

  • OLS remains unbiased and consistent
  • Standard errors are incorrect
  • Use heteroskedasticity-robust standard errors

Non-normality

  • OLS remains unbiased
  • For large samples, inference remains valid (CLT)
  • For small samples, exact distributions may not hold

4.12 Practical Example

Consider the wage equation: \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + u_i\)

Estimated results: \(\widehat{\log(\text{wage})}_i = 0.284 + 0.092 \text{educ}_i + 0.0041 \text{exper}_i\) \(\text{SE}: \quad\quad\quad (0.104) \quad (0.007) \quad\quad\quad (0.0017)\) \(n = 526, \quad R^2 = 0.316\)

Testing if education affects wages: \(t = \frac{0.092}{0.007} = 13.14\)

Since \(\|t\| > 1.96\), reject \(H_0: \beta_1 = 0\) at 5% level.

4.13 Sample Problems

Problem 1

Given OLS results: \(\hat{Y} = 10 + 2X\) \(\text{SE}(\hat{\beta}_0) = 1.5, \quad \text{SE}(\hat{\beta}_1) = 0.5\) \(n = 30\)

a) Test \(H_0: \beta_1 = 0\) at 5% level b) Construct a 95% CI for \(\beta_1\)

Solution: a) \(t = \frac{2-0}{0.5} = 4\). With \(df = 28\), \(t_{0.025,28} \approx 2.048\). Since \(4 > 2.048\), reject \(H_0\).

b) \(CI: 2 \pm 2.048 \times 0.5 = [0.976, 3.024]\)

Problem 2

Test the joint hypothesis \(H_0: \beta_1 = \beta_2 = 0\) given:

  • Unrestricted model: \(R^2 = 0.40\), \(k = 3\), \(n = 50\)
  • Restricted model: \(R^2 = 0.30\)

Solution: \(F = \frac{(0.40 - 0.30)/2}{(1-0.40)/(50-3-1)} = \frac{0.05}{0.013} = 3.85\)

With \(F_{2,46}\) critical value at 5% ≈ 3.20, reject \(H_0\).

4.14 Key Statistical Tables

Critical Values for t-distribution (two-sided)

df    10%      5%       1%
10    1.812    2.228    3.169
20    1.725    2.086    2.845
30    1.697    2.042    2.750
∞     1.645    1.960    2.576

Critical Values for F-distribution (5% level)

df₁\df₂    10      20      30      ∞
1          4.96    4.35    4.17    3.84
2          4.10    3.49    3.32    3.00
3          3.71    3.10    2.92    2.60

4.15 Key Takeaways

  1. OLS estimators are unbiased under the zero conditional mean assumption
  2. Standard errors measure the precision of estimates
  3. t-tests are used for individual coefficients
  4. F-tests are used for joint hypotheses
  5. Confidence intervals provide a range of plausible values
  6. Large sample properties rely on the Central Limit Theorem
  7. Violations of classical assumptions affect inference procedures

Part 5: Multiple Regression

Chapter 5: Simple vs Multiple Regression

5.1 Motivation for Multiple Regression

Multiple regression allows us to:

  1. Control for confounding factors
  2. Include multiple explanatory variables
  3. Reduce omitted variable bias
  4. Improve prediction accuracy

The multiple regression model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + ... + \beta_k X_{ki} + u_i\)

5.2 Interpretation of Coefficients

In multiple regression, \(\beta_j\) represents the partial effect of \(X_j\) on \(Y\), holding all other variables constant: \(\beta_j = \frac{\partial E[Y\|X_1,...,X_k]}{\partial X_j}\)

This is the ceteris paribus effect we seek for causal inference.

5.3 Omitted Variable Bias

The Omitted Variable Bias Formula

If the true model is: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

But we estimate: \(Y_i = \alpha_0 + \alpha_1 X_{1i} + v_i\)

Then: \(\text{plim}(\hat{\alpha}_1) = \beta_1 + \beta_2 \cdot \delta_{21}\)

where \(\delta_{21}\) is the coefficient from regressing \(X_2\) on \(X_1\).

Conditions for Omitted Variable Bias

Bias occurs when:

  1. The omitted variable affects \(Y\) (\(\beta_2 \neq 0\))
  2. The omitted variable is correlated with included variables (\(\delta_{21} \neq 0\))

Direction of bias: \(\text{Bias} = \beta_2 \cdot \delta_{21}\)
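
The formula can be illustrated with a short simulation (assumes NumPy; the parameter values are arbitrary). Omitting \(X_2\), which both affects \(Y\) and is correlated with \(X_1\), shifts the short-regression slope by roughly \(\beta_2 \cdot \delta_{21}\):

import numpy as np

rng = np.random.default_rng(4)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # delta_21 is about 0.8
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# short regression of y on x1 only
b1_short = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)
# auxiliary regression of x2 on x1
delta_21 = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / np.sum((x1 - x1.mean()) ** 2)

print(b1_short)               # about 2.8 rather than the true 2.0
print(2.0 + 1.0 * delta_21)   # the value predicted by the bias formula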

5.4 Example: Wage Equation

Consider: \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{abil}_i + u_i\)

If we omit ability:

  • \(\beta_2 > 0\) (ability increases wages)
  • \(\delta_{21} > 0\) (ability and education are positively correlated)
  • Therefore: \(\hat{\beta}_1\) is upward biased

5.5 The Frisch-Waugh-Lovell Theorem

The FWL theorem shows how multiple regression coefficients can be obtained through a series of simple regressions.

For the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

To obtain \(\hat{\beta}_1\):

  1. Regress \(Y\) on \(X_2\) and get residuals \(\tilde{Y}\)
  2. Regress \(X_1\) on \(X_2\) and get residuals \(\tilde{X}_1\)
  3. Regress \(\tilde{Y}\) on \(\tilde{X}_1\)

The coefficient from step 3 equals \(\hat{\beta}_1\) from the multiple regression.

Proof Outline

Let \(\mathbf{M}_2 = \mathbf{I} - \mathbf{X}_2(\mathbf{X}_2'\mathbf{X}_2)^{-1}\mathbf{X}_2'\)

Then: \(\hat{\beta}_1 = (\mathbf{X}_1'\mathbf{M}_2\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{M}_2\mathbf{y}\)

Since \(\mathbf{M}_2\mathbf{X}_1 = \tilde{\mathbf{X}}_1\) and \(\mathbf{M}_2\mathbf{y} = \tilde{\mathbf{y}}\): \(\hat{\beta}_1 = (\tilde{\mathbf{X}}_1'\tilde{\mathbf{X}}_1)^{-1}\tilde{\mathbf{X}}_1'\tilde{\mathbf{y}}\)

5.6 Relationship Between Simple and Multiple Regression

For the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

Let:

  • \(\hat{\beta}_1^{simple}\): coefficient from regressing \(Y\) on \(X_1\) only
  • \(\hat{\beta}_1^{multiple}\): coefficient from multiple regression
  • \(\hat{\delta}_{21}\): coefficient from regressing \(X_2\) on \(X_1\)

Then: \(\hat{\beta}_1^{simple} = \hat{\beta}_1^{multiple} + \hat{\beta}_2 \cdot \hat{\delta}_{21}\)

5.7 Control Variables

A control variable is included to:

  1. Reduce omitted variable bias
  2. Account for confounding factors
  3. Isolate the effect of interest

Properties of control variables:

  • May not have causal interpretation
  • Help achieve conditional independence
  • Reduce error variance

Example: Returns to Education

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 \text{IQ}_i + u_i\]

Here, experience and IQ are control variables to isolate the effect of education.

5.8 Perfect Multicollinearity

Perfect multicollinearity occurs when one regressor is an exact linear function of others.

Examples:

  1. Including all dummy variables for a categorical variable plus intercept
  2. Including a variable and its perfect linear transformation
  3. The “dummy variable trap”

Consequences:

  • \((\mathbf{X}'\mathbf{X})\) is not invertible
  • OLS estimates cannot be computed
  • One variable must be dropped

5.9 Imperfect Multicollinearity

When regressors are highly (but not perfectly) correlated:

Consequences:

  1. Large standard errors
  2. Sensitive coefficient estimates
  3. Difficulty isolating individual effects
  4. OLS remains unbiased

Detection:

  • Variance Inflation Factor (VIF): \(\text{VIF}_j = \frac{1}{1-R_j^2}\)
  • Correlation matrix
  • Condition number
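
Each \(\text{VIF}_j\) comes from regressing \(X_j\) on the remaining regressors and plugging that \(R_j^2\) into \(1/(1-R_j^2)\). A sketch (assumes NumPy; the data are simulated so that the first two regressors are nearly collinear):

import numpy as np

def r_squared(X, y):
    # R^2 from an OLS regression of y on X (constant added)
    X = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def vif(X):
    # X: n x k matrix of regressors without the constant column
    return [1.0 / (1.0 - r_squared(np.delete(X, j, axis=1), X[:, j]))
            for j in range(X.shape[1])]

rng = np.random.default_rng(5)
z = rng.normal(size=300)
X = np.column_stack([z + 0.1 * rng.normal(size=300),
                     z + 0.1 * rng.normal(size=300),
                     rng.normal(size=300)])
print(vif(X))   # first two VIFs are large, the third is close to 1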

5.10 Specification Issues

Including Irrelevant Variables

Effects of including irrelevant variables (\(\beta_j = 0\)):

  • OLS remains unbiased
  • Variance increases
  • Efficiency loss

Excluding Relevant Variables

Effects of excluding relevant variables (\(\beta_j \neq 0\)):

  • OLS is biased (unless uncorrelated)
  • Inconsistent estimates
  • Invalid inference

5.11 Practical Example

Consider test scores and class size:

Simple regression: \(\widehat{\text{TestScore}} = 698.9 - 2.28 \cdot \text{STR}\) \(\text{SE}: \quad\quad\quad\quad (10.4) \quad (0.52)\)

Multiple regression: \(\widehat{\text{TestScore}} = 686.0 - 1.10 \cdot \text{STR} - 0.65 \cdot \text{PctEL}\) \(\text{SE}: \quad\quad\quad\quad (8.7) \quad\quad (0.43) \quad\quad\quad (0.04)\)

The coefficient on STR changes substantially when controlling for percent English learners.

5.12 Sample Problems

Problem 1

Given:

  • Simple regression: \(\hat{Y} = 10 + 3X_1\)
  • Multiple regression: \(\hat{Y} = 8 + 2X_1 + 4X_2\)
  • Regression of \(X_2\) on \(X_1\): \(\hat{X}_2 = 1 + 0.25X_1\)

Verify the relationship between simple and multiple regression coefficients.

Solution: \(\hat{\beta}_1^{simple} = \hat{\beta}_1^{multiple} + \hat{\beta}_2 \cdot \hat{\delta}_{21}\) \(3 = 2 + 4 \cdot 0.25 = 2 + 1 = 3 \quad \checkmark\)

Problem 2

Consider the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

Given:

  • \[\text{Corr}(X_1, X_2) = 0\]
  • \[\hat{\beta}_1^{simple} = 5\]

What is \(\hat{\beta}_1^{multiple}\)?

Solution: Since \(\text{Corr}(X_1, X_2) = 0\), we have \(\hat{\delta}_{21} = 0\). Therefore: \(\hat{\beta}_1^{multiple} = \hat{\beta}_1^{simple} = 5\)

5.13 Adjusted R-squared

The adjusted R-squared penalizes for additional regressors: \(\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{TSS/(n-1)} = 1 - \frac{n-1}{n-k-1}(1-R^2)\)

Properties:

  1. \(\bar{R}^2 < R^2\) (unless \(R^2 = 1\))
  2. Can be negative
  3. Doesn’t always increase with additional regressors
  4. Better for model comparison

5.14 Model Selection Criteria

Akaike Information Criterion (AIC)

\[\text{AIC} = n \log\left(\frac{SSR}{n}\right) + 2(k+1)\]

Bayesian Information Criterion (BIC)

\[\text{BIC} = n \log\left(\frac{SSR}{n}\right) + (k+1)\log(n)\]

Lower values indicate better model fit.

5.15 Key Takeaways

  1. Multiple regression isolates partial effects
  2. Omitted variable bias has specific direction and magnitude
  3. The FWL theorem explains the mechanics of multiple regression
  4. Control variables reduce bias but may not have causal interpretation
  5. Perfect multicollinearity prevents estimation
  6. Model selection involves tradeoffs between bias and variance
  7. Adjusted R-squared accounts for model complexity

Part 6: Dummy Variables

Chapter 6: Qualitative Information with Dummy Variables

6.1 Introduction to Dummy Variables

Dummy variables (also called binary or indicator variables) allow us to include qualitative information in regression models.

Definition: A dummy variable takes only two values:

  • 1 if a condition is met
  • 0 otherwise

Examples:

  • \(\text{Female}_i = 1\) if individual \(i\) is female, 0 if male
  • \(\text{College}_i = 1\) if individual \(i\) has a college degree, 0 otherwise
  • \(\text{Treatment}_i = 1\) if unit \(i\) received treatment, 0 if control

6.2 Single Dummy Variable

Consider the model: \(Y_i = \beta_0 + \beta_1 D_i + u_i\)

where \(D_i \in {0,1}\).

Interpretation

  • When \(D_i = 0\): \(E[Y_i\|D_i=0] = \beta_0\)
  • When \(D_i = 1\): \(E[Y_i\|D_i=1] = \beta_0 + \beta_1\)
  • Therefore: \(\beta_1 = E[Y_i\|D_i=1] - E[Y_i\|D_i=0]\)

\(\beta_1\) represents the difference in means between the two groups.

Example: Gender Wage Gap

\[\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + u_i\]

If \(\hat{\beta}_1 = -2.50\), female workers earn \$2.50 less per hour on average.

6.3 Dummy Variables with Multiple Categories

For a categorical variable with \(m\) categories, use \(m-1\) dummy variables.

Example: Education Levels

Categories: High School, Bachelor’s, Master’s, PhD

Define:

  • \(\text{Bachelor}_i = 1\) if highest degree is Bachelor’s
  • \(\text{Master}_i = 1\) if highest degree is Master’s
  • \(\text{PhD}_i = 1\) if highest degree is PhD
  • High School is the reference category (omitted)

Model: \(\text{wage}_i = \beta_0 + \beta_1 \text{Bachelor}_i + \beta_2 \text{Master}_i + \beta_3 \text{PhD}_i + u_i\)

Interpretations:

  • \(\beta_0\): average wage for high school graduates
  • \(\beta_1\): wage premium for Bachelor’s vs. high school
  • \(\beta_2\): wage premium for Master’s vs. high school
  • \(\beta_3\): wage premium for PhD vs. high school

6.4 The Dummy Variable Trap

Including all \(m\) dummies plus an intercept creates perfect multicollinearity: \(\sum_{j=1}^m D_{ji} = 1 \text{ for all } i\)

Solutions:

  1. Omit one category (reference category)
  2. Omit the intercept
  3. Use deviation from mean coding

6.5 Interactions with Dummy Variables

Dummy variables can interact with continuous variables to allow different slopes.

Model with Interaction

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i\]

Interpretation:

  • When \(D_i = 0\): \(Y_i = \beta_0 + \beta_1 X_i + u_i\)
  • When \(D_i = 1\): \(Y_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_i + u_i\)

Effects:

  • \(\beta_2\): difference in intercepts
  • \(\beta_3\): difference in slopes

Example: Returns to Education by Gender

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{female}_i + \beta_3 (\text{educ}_i \times \text{female}_i) + u_i\]

Results: \(\widehat{\log(\text{wage})} = 0.389 + 0.082 \cdot \text{educ} - 0.227 \cdot \text{female} + 0.0056 \cdot (\text{educ} \times \text{female})\)

For males: \(\widehat{\log(\text{wage})} = 0.389 + 0.082 \cdot \text{educ}\) For females: \(\widehat{\log(\text{wage})} = 0.162 + 0.0876 \cdot \text{educ}\)

6.6 Testing for Group Differences

Testing for Differences in Intercepts

\(H_0: \beta_2 = 0\) (no difference in intercepts) Use standard t-test on dummy coefficient.

Testing for Differences in Slopes

\(H_0: \beta_3 = 0\) (no difference in slopes) Use standard t-test on interaction coefficient.

Testing for Any Difference

\(H_0: \beta_2 = \beta_3 = 0\) (identical regressions) Use F-test for joint significance.

6.7 Chow Test for Structural Break

The Chow test tests whether regression coefficients differ across groups.

For model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

Test whether coefficients differ between groups A and B:

  1. Run pooled regression: get \(SSR_p\)
  2. Run separate regressions for each group: get \(SSR_A\) and \(SSR_B\)
  3. Compute: \(F = \frac{(SSR_p - SSR_A - SSR_B)/k}{(SSR_A + SSR_B)/(n-2k)}\)

Under \(H_0\): \(F \sim F_{k,n-2k}\)
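
A sketch of that computation (assumes NumPy; the data are simulated so that group B has a different intercept and slope, so the test should reject):

import numpy as np

def ssr(x, y):
    # SSR from a simple regression of y on x with an intercept
    X = np.column_stack([np.ones(len(y)), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return np.sum((y - X @ b) ** 2)

rng = np.random.default_rng(6)
xa, xb = rng.normal(size=120), rng.normal(size=80)
ya = 1.0 + 0.5 * xa + rng.normal(size=120)
yb = 2.0 + 1.5 * xb + rng.normal(size=80)

x, y = np.concatenate([xa, xb]), np.concatenate([ya, yb])
k = 2   # parameters per group regression: intercept and slope
F = ((ssr(x, y) - ssr(xa, ya) - ssr(xb, yb)) / k) / \
    ((ssr(xa, ya) + ssr(xb, yb)) / (len(y) - 2 * k))
print(F)   # compare with the 5% critical value of roughly 3.0 for F(2, 196)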

6.8 Linear Probability Model

When the dependent variable is binary: \(Y_i = \beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki} + u_i\)

where \(Y_i \in {0,1}\).

Properties:

  1. \[E[Y_i\|X_i] = P(Y_i=1\|X_i)\]
  2. Predicted values are probabilities
  3. Coefficients represent changes in probability

Problems:

  1. Predictions can be < 0 or > 1
  2. Heteroskedasticity: \(\text{Var}(u_i\|X_i) = P(Y_i=1\|X_i)[1-P(Y_i=1\|X_i)]\)
  3. Non-normal errors

Solutions:

  • Use robust standard errors
  • Consider logit or probit models

6.9 Difference-in-Differences with Dummies

Basic DiD setup: \(Y_{it} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_i + \beta_3 (\text{Post}_t \times \text{Treat}_i) + u_{it}\)

where:

  • \(\text{Post}_t = 1\) for post-treatment period
  • \(\text{Treat}_i = 1\) for treatment group
  • \(\beta_3\) is the DiD estimate
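
Because the model is saturated in the two dummies, \(\hat{\beta}_3\) equals the double difference of the four group-period means exactly. A simulation sketch (assumes NumPy; the treatment effect of 2.0 is made up) verifying the equivalence:

import numpy as np

rng = np.random.default_rng(7)
n = 1000
treat = (rng.random(n) < 0.5).astype(float)
post = (rng.random(n) < 0.5).astype(float)
y = 1.0 + 0.5 * post + 1.0 * treat + 2.0 * post * treat + rng.normal(size=n)

# regression with the interaction term
X = np.column_stack([np.ones(n), post, treat, post * treat])
b = np.linalg.solve(X.T @ X, X.T @ y)

# double difference of group-period means
did = (y[(treat == 1) & (post == 1)].mean() - y[(treat == 1) & (post == 0)].mean()) \
    - (y[(treat == 0) & (post == 1)].mean() - y[(treat == 0) & (post == 0)].mean())

print(b[3], did)   # identical; both estimate the effect of about 2.0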

6.10 Practical Applications

Application 1: Wage Discrimination

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{female}_i + \beta_2 \text{educ}_i + \beta_3 \text{exper}_i + u_i\]

Testing for discrimination after controlling for human capital.

Application 2: Program Evaluation

\[\text{outcome}_i = \beta_0 + \beta_1 \text{treatment}_i + \beta_2 \mathbf{X}_i + u_i\]

Estimating treatment effects with controls.

Application 3: Seasonal Effects

\[\text{sales}_t = \beta_0 + \beta_1 \text{Q2}_t + \beta_2 \text{Q3}_t + \beta_3 \text{Q4}_t + u_t\]

Capturing seasonal patterns (Q1 is reference).

6.11 Sample Problems

Problem 1

Consider: \(\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + u_i\)

Given data:

  • Males: \(n_m = 100\), \(\bar{\text{wage}}_m = 20\)
  • Females: \(n_f = 80\), \(\bar{\text{wage}}_f = 18\)

Find OLS estimates.

Solution: \(\hat{\beta}_0 = \bar{\text{wage}}_m = 20\) \(\hat{\beta}_1 = \bar{\text{wage}}_f - \bar{\text{wage}}_m = 18 - 20 = -2\)

Problem 2

Test for different returns to education by gender: \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{female}_i + \beta_3 (\text{educ}_i \times \text{female}_i) + u_i\)

Given: \(\hat{\beta}_3 = 0.015\), \(\text{SE}(\hat{\beta}_3) = 0.008\)

Test \(H_0: \beta_3 = 0\) at 5% level.

Solution: \(t = \frac{0.015}{0.008} = 1.875\)

Since \(\|1.875\| < 1.96\), fail to reject \(H_0\). No significant difference in returns to education.

6.12 Advanced Topics

Regression Discontinuity with Dummies

\[Y_i = \beta_0 + \beta_1 D_i + \beta_2 (X_i - c) + \beta_3 D_i(X_i - c) + u_i\]

where \(D_i = 1\) if \(X_i \geq c\).

Triple Interactions

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_{1i} + \beta_3 D_{2i} + \beta_4 (X_i \times D_{1i}) + \beta_5 (X_i \times D_{2i}) + \beta_6 (D_{1i} \times D_{2i}) + \beta_7 (X_i \times D_{1i} \times D_{2i}) + u_i\]

Complex but allows for very flexible specifications.

6.13 STATA Implementation

* Single dummy
reg wage female

* Multiple categories
reg wage i.education

* Interaction
reg wage c.educ##i.female

* Chow test via the SSR formula from Section 6.7
reg wage educ if female==0
scalar ssr_a = e(rss)
scalar n_a = e(N)
reg wage educ if female==1
scalar ssr_b = e(rss)
scalar n_b = e(N)
reg wage educ
scalar ssr_p = e(rss)
scalar chow_F = ((ssr_p - ssr_a - ssr_b)/2) / ((ssr_a + ssr_b)/(n_a + n_b - 4))
display "Chow F = " chow_F

6.14 Key Takeaways

  1. Dummy variables incorporate qualitative information
  2. Always omit one category to avoid the dummy trap
  3. Interactions allow different slopes across groups
  4. Linear probability models have limitations
  5. Chow test formally tests for structural differences
  6. Difference-in-differences uses dummy interactions
  7. Careful interpretation is crucial with multiple dummies

Part 7: Functional Forms

Chapter 7: Functional Form, Nonlinear Relationships, and Interpretation of Coefficients

7.1 Introduction to Functional Forms

Linear regression is “linear in parameters” but can accommodate nonlinear relationships through:

  1. Variable transformations
  2. Polynomial terms
  3. Interaction effects
  4. Piecewise functions

7.2 Logarithmic Transformations

Log-Level Model

\[\log(Y_i) = \beta_0 + \beta_1 X_i + u_i\]

Interpretation: A one-unit change in \(X\) is associated with approximately a \(100 \cdot \beta_1\) percent change in \(Y\).

Exact percentage change: \(\%\Delta Y = 100 \cdot [e^{\beta_1} - 1]\)

For small \(\beta_1\) (roughly \(\|\beta_1\| < 0.10\)): \(\%\Delta Y \approx 100 \cdot \beta_1\)

Level-Log Model

\[Y_i = \beta_0 + \beta_1 \log(X_i) + u_i\]

Interpretation: A 1% increase in \(X\) is associated with a \(\beta_1/100\) unit change in \(Y\).

Log-Log Model (Constant Elasticity)

\[\log(Y_i) = \beta_0 + \beta_1 \log(X_i) + u_i\]

Interpretation: A 1% increase in \(X\) is associated with a \(\beta_1\)% increase in \(Y\). \(\beta_1\) is the elasticity of \(Y\) with respect to \(X\).

7.3 Why Use Logarithms?

  1. Percentage interpretation: Natural for many economic variables
  2. Reduce skewness: Log transformation often normalizes right-skewed data
  3. Reduce heteroskedasticity: Variance often proportional to level
  4. Elasticities: Direct interpretation in log-log models
  5. Diminishing returns: Captures concave relationships

7.4 Examples of Log Models

Example 1: Wage Equation (Log-Level)

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + u_i\]

If \(\hat{\beta}_1 = 0.08\): One more year of education increases wages by approximately 8%.

Example 2: Production Function (Log-Log)

\[\log(Q_i) = \beta_0 + \beta_1 \log(L_i) + \beta_2 \log(K_i) + u_i\]

If \(\hat{\beta}_1 = 0.7\): A 1% increase in labor increases output by 0.7%.

Example 3: Demand Function (Level-Log)

\[Q_i = \beta_0 + \beta_1 \log(P_i) + \beta_2 \log(I_i) + u_i\]

If \(\hat{\beta}_1 = -50\): A 1% increase in price reduces quantity demanded by 0.5 units.

7.5 Polynomial Models

Quadratic Model

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i\]

Marginal effect: \(\frac{\partial Y}{\partial X} = \beta_1 + 2\beta_2 X\)

Properties:

  • Allows for increasing or decreasing marginal effects
  • Has a turning point at \(X^* = -\frac{\beta_1}{2\beta_2}\)
  • U-shaped if \(\beta_2 > 0\), inverse U-shaped if \(\beta_2 < 0\)

Example: Age-Earnings Profile

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{age}_i + \beta_2 \text{age}_i^2 + u_i\]

Estimated: \(\widehat{\log(\text{wage})} = 0.5 + 0.08 \cdot \text{age} - 0.0009 \cdot \text{age}^2\)

Peak earnings age: \(\text{age}^* = -\frac{0.08}{2(-0.0009)} = 44.4\) years

7.6 Higher-Order Polynomials

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + ... + \beta_p X_i^p + u_i\]

Considerations:

  1. Allows very flexible functional forms
  2. Risk of overfitting
  3. Difficult to interpret beyond quadratic
  4. Can create unrealistic predictions outside sample range

7.7 Interaction Terms

Continuous-Continuous Interaction

\[Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i\]

Marginal effects:

  • \[\frac{\partial Y}{\partial X_1} = \beta_1 + \beta_3 X_2\]
  • \[\frac{\partial Y}{\partial X_2} = \beta_2 + \beta_3 X_1\]

The effect of one variable depends on the level of the other.

Example: Education and Experience

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 (\text{educ}_i \times \text{exper}_i) + u_i\]

If \(\hat{\beta}_3 > 0\): Returns to education increase with experience.

7.8 Interpreting Coefficients in Nonlinear Models

General Principle

For any model \(Y = f(X, \beta)\): \(\text{Marginal Effect} = \frac{\partial Y}{\partial X_j} = \frac{\partial f(X, \beta)}{\partial X_j}\)

Common Cases

  1. Linear: \(Y = \beta_0 + \beta_1 X\)
    • ME = \(\beta_1\) (constant)
  2. Log-Linear: \(\log(Y) = \beta_0 + \beta_1 X\)
    • ME = \(\beta_1 \times Y\)
    • Semi-elasticity = \(\beta_1\)
  3. Linear-Log: \(Y = \beta_0 + \beta_1 \log(X)\)
    • ME = \(\beta_1/X\)
    • Decreasing marginal effect
  4. Log-Log: \(\log(Y) = \beta_0 + \beta_1 \log(X)\)
    • ME = \(\beta_1 \times Y/X\)
    • Elasticity = \(\beta_1\)

7.9 Average Partial Effects (APE)

For nonlinear models, marginal effects vary with \(X\). The APE summarizes the average effect:

\[\text{APE}_j = \frac{1}{n}\sum_{i=1}^n \frac{\partial \hat{Y}_i}{\partial X_{ji}}\]

Example for quadratic model: \(\text{APE} = \hat{\beta}_1 + 2\hat{\beta}_2\bar{X}\)
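
In the quadratic case the observation-level marginal effect is \(\hat{\beta}_1 + 2\hat{\beta}_2 X_i\), so its sample average equals \(\hat{\beta}_1 + 2\hat{\beta}_2\bar{X}\). A tiny sketch (assumes NumPy; the ages are a hypothetical sample, the coefficients are those of the age-earnings example in Section 7.5):

import numpy as np

b1, b2 = 0.08, -0.0009
age = np.array([25.0, 32.0, 40.0, 47.0, 55.0])   # hypothetical ages

ape = np.mean(b1 + 2 * b2 * age)
print(ape, b1 + 2 * b2 * age.mean())   # identical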

7.10 Testing Functional Form

RESET Test (Regression Specification Error Test)

  1. Estimate original model, obtain \(\hat{Y}\)
  2. Add powers of \(\hat{Y}\) to original model
  3. Test joint significance of added terms

\(H_0\): Model correctly specified

Testing Linearity vs. Log

To test whether \(Y\) or \(\log(Y)\) is appropriate:

  1. Standardize both to have mean 0, variance 1
  2. Run both regressions
  3. Compare R-squared values
  4. Consider economic theory and interpretation

7.11 Practical Examples

Example 1: Housing Prices

\[\log(\text{price}_i) = \beta_0 + \beta_1 \log(\text{sqft}_i) + \beta_2 \text{bedrooms}_i + \beta_3 \text{age}_i + \beta_4 \text{age}_i^2 + u_i\]

Results:

  • \(\hat{\beta}_1 = 0.85\): 1% increase in square footage increases price by 0.85%
  • \(\hat{\beta}_2 = 0.05\): Each additional bedroom increases price by 5%
  • Age has nonlinear effect with depreciation slowing over time

Example 2: Returns to Scale

\[\log(Q) = \beta_0 + \beta_1 \log(L) + \beta_2 \log(K) + u\]

Testing returns to scale:

  • Constant returns: \(H_0: \beta_1 + \beta_2 = 1\)
  • Increasing returns: \(H_0: \beta_1 + \beta_2 > 1\)
  • Decreasing returns: \(H_0: \beta_1 + \beta_2 < 1\)

7.12 Sample Problems

Problem 1

Given: \(\log(\text{wage}_i) = 0.417 + 0.297 \times \text{female}_i + 0.080 \times \text{educ}_i + 0.029 \times \text{exper}_i\)

Interpret the coefficient on female.

Solution: Since female is a dummy and wage is in logs: \(\%\Delta\text{wage} = 100 \cdot [e^{0.297} - 1] = 34.6\%\)

Women earn approximately 34.6% more than men, controlling for education and experience.

Problem 2

Consider: \(\text{price} = \beta_0 + \beta_1 \text{sqft} + \beta_2 \text{sqft}^2 + u\)

Given estimates: \(\hat{\beta}_1 = 200\), \(\hat{\beta}_2 = -0.05\)

a) Find the marginal effect of square footage on price b) At what square footage is price maximized?

Solution: a) \(\frac{\partial \text{price}}{\partial \text{sqft}} = 200 - 0.1 \times \text{sqft}\)

b) Set marginal effect to zero: \(200 - 0.1 \times \text{sqft}^* = 0\) \(\text{sqft}^* = 2000\) square feet

7.13 Common Pitfalls

  1. Interpreting log coefficients as percentages: Valid only for small coefficients
  2. Extrapolation: Polynomial models can behave poorly outside sample range
  3. Multicollinearity: Powers and interactions are correlated with base terms
  4. Over-parameterization: Too many polynomial terms
  5. Ignoring economic theory: Functional form should make economic sense

7.14 Choosing Functional Form

Guidelines:

  1. Start with economic theory
  2. Examine data patterns (scatter plots)
  3. Consider variable properties (always positive? percentages?)
  4. Test alternative specifications
  5. Check residual plots
  6. Evaluate out-of-sample predictions

7.15 Key Takeaways

  1. Logarithmic transformations provide percentage interpretations
  2. Different log specifications have different interpretations
  3. Polynomial terms capture nonlinear relationships
  4. Interaction terms allow effects to vary
  5. Marginal effects depend on functional form
  6. Average partial effects summarize varying marginal effects
  7. Functional form choice affects interpretation and inference
  8. Economic theory should guide specification choices

Part 8: Time Series Foundations

Chapter 8: Stationarity, Persistence, and Serial Correlation

8.1 Time Series Data Characteristics

Time series data has unique features:

  1. Temporal ordering matters
  2. Observations are typically dependent
  3. Trends and seasonality are common
  4. Dynamic relationships exist

Notation: \(Y_t\) denotes the value of variable \(Y\) at time \(t\).

8.2 Stationarity

A time series \({Y_t}\) is strictly stationary if the joint distribution of \((Y_t, Y_{t+1}, ..., Y_{t+k})\) is the same as \((Y_{t+h}, Y_{t+h+1}, ..., Y_{t+h+k})\) for all \(t\), \(k\), and \(h\).

A time series is weakly stationary (or covariance stationary) if:

  1. \(E[Y_t] = \mu\) (constant mean)
  2. \(\text{Var}(Y_t) = \sigma^2\) (constant variance)
  3. \(\text{Cov}(Y_t, Y_{t+h}) = \gamma_h\) (covariance depends only on lag \(h\))

8.3 Examples of Stationary and Non-stationary Processes

Stationary Process: White Noise

\(Y_t = \varepsilon_t\) where \(\varepsilon_t \sim \text{i.i.d.}(0, \sigma^2)\)

Properties:

  • \[E[Y_t] = 0\]
  • \[\text{Var}(Y_t) = \sigma^2\]
  • \(\text{Cov}(Y_t, Y_{t+h}) = 0\) for \(h \neq 0\)

Non-stationary Process: Random Walk

\(Y_t = Y_{t-1} + \varepsilon_t\) where \(Y_0 = 0\) and \(\varepsilon_t \sim \text{i.i.d.}(0, \sigma^2)\)

Properties:

  • \[E[Y_t] = 0\]
  • \(\text{Var}(Y_t) = t\sigma^2\) (variance increases with time)
  • Non-stationary due to time-varying variance

8.4 Autocorrelation

The autocorrelation function (ACF) at lag \(h\): \(\rho_h = \frac{\text{Cov}(Y_t, Y_{t-h})}{\text{Var}(Y_t)} = \frac{\gamma_h}{\gamma_0}\)

Sample autocorrelation: \(\hat{\rho}_h = \frac{\sum_{t=h+1}^T (Y_t - \bar{Y})(Y_{t-h} - \bar{Y})}{\sum_{t=1}^T (Y_t - \bar{Y})^2}\)

8.5 Autoregressive Processes

AR(1) Process

\[Y_t = \phi Y_{t-1} + \varepsilon_t\]

Stationarity condition: \(\|\phi\| < 1\)

Properties when stationary:

  • \[E[Y_t] = 0\]
  • \[\text{Var}(Y_t) = \frac{\sigma^2}{1-\phi^2}\]
  • \[\rho_h = \phi^h\]
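
These properties can be checked by simulation. A sketch (assumes NumPy; \(\phi = 0.7\) and \(\sigma^2 = 4\) are chosen to match Problem 1 in Section 8.13):

import numpy as np

rng = np.random.default_rng(8)
phi, T = 0.7, 100_000
eps = rng.normal(scale=2.0, size=T)              # sigma^2 = 4

y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + eps[t]               # AR(1) recursion

print(y.var(), 4.0 / (1 - phi ** 2))             # both near 7.84
print(np.corrcoef(y[1:], y[:-1])[0, 1])          # near phi = 0.7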

AR(p) Process

\[Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p} + \varepsilon_t\]

Stationarity condition: Roots of characteristic equation lie outside unit circle.

8.6 Serial Correlation in Regression Models

Consider the model: \(Y_t = \beta_0 + \beta_1 X_t + u_t\)

Serial correlation occurs when: \(\text{Cov}(u_t, u_{t-h}) \neq 0 \text{ for some } h \neq 0\)

Common form: AR(1) errors \(u_t = \rho u_{t-1} + \varepsilon_t\)

8.7 Consequences of Serial Correlation

With positive serial correlation (\(\rho > 0\)):

  1. OLS remains unbiased and consistent
  2. OLS standard errors are biased (usually downward)
  3. t-statistics are inflated
  4. Confidence intervals are too narrow
  5. OLS is inefficient

8.8 Testing for Serial Correlation

Durbin-Watson Test

Test statistic: \(DW = \frac{\sum_{t=2}^T (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^T \hat{u}_t^2}\)

Approximate relationship: \(DW \approx 2(1 - \hat{\rho})\)

Decision rules:

  • \(DW \approx 2\): No serial correlation
  • \(DW < 2\): Positive serial correlation
  • \(DW > 2\): Negative serial correlation
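
Given the residuals from a time-series regression, the statistic is a one-liner; the sketch below (assumes NumPy; the "residuals" are simulated AR(1) noise) also checks the \(DW \approx 2(1-\hat{\rho})\) approximation:

import numpy as np

def durbin_watson(resid):
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(9)
T, rho = 500, 0.6
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal()   # AR(1) residuals

print(durbin_watson(u))   # well below 2: positive serial correlation
print(2 * (1 - rho))      # the approximation, 0.8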

Breusch-Godfrey Test

  1. Estimate original model, obtain residuals \(\hat{u}_t\)
  2. Regress \(\hat{u}_t\) on original regressors and \(\hat{u}_{t-1}, ..., \hat{u}_{t-p}\)
  3. Test joint significance of lagged residuals

\(H_0\): No serial correlation up to lag \(p\)
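
In R these steps are automated by bgtest() from the lmtest package. A minimal sketch, with an illustrative lag order of 4:

# Breusch-Godfrey test: H0 is no serial correlation up to the chosen lag
library(lmtest)
set.seed(42)
x <- rnorm(200)
u <- as.numeric(arima.sim(list(ar = 0.5), n = 200))  # AR(1) errors
y <- 1 + 2 * x + u
bgtest(lm(y ~ x), order = 4)          # small p-value => reject H0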

8.9 Correcting for Serial Correlation

Method 1: HAC Standard Errors

Heteroskedasticity and Autocorrelation Consistent (HAC) standard errors:

  • Newey-West estimator
  • Corrects standard errors without changing coefficient estimates

Method 2: GLS/Feasible GLS

If errors follow AR(1):

  1. Estimate \(\rho\) from OLS residuals
  2. Transform variables: \(Y_t^* = Y_t - \hat{\rho}Y_{t-1}\)
  3. Run OLS on transformed variables
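
A minimal base-R sketch of these three steps under AR(1) errors (a Cochrane-Orcutt style transformation; the first observation is simply dropped, and all values are illustrative):

# Feasible GLS under AR(1) errors
set.seed(42)
T <- 200
x <- rnorm(T)
u <- as.numeric(arima.sim(list(ar = 0.5), n = T))
y <- 1 + 2 * x + u
uhat <- resid(lm(y ~ x))
rho <- sum(uhat[-1] * uhat[-T]) / sum(uhat[-T]^2)   # step 1: estimate rho from OLS residuals
ys <- y[-1] - rho * y[-T]                           # step 2: quasi-difference the data
xs <- x[-1] - rho * x[-T]
coef(lm(ys ~ xs))                                   # step 3: OLS; intercept estimates beta0*(1 - rho)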

Method 3: Include Lagged Variables

Add dynamics to the model: \(Y_t = \beta_0 + \beta_1 X_t + \gamma Y_{t-1} + u_t\)

8.10 Unit Roots and Non-stationarity

Unit Root Process

\[Y_t = Y_{t-1} + \varepsilon_t\]

Properties:

  • Shocks have permanent effects
  • Variance grows without bound
  • Standard inference procedures invalid

Dickey-Fuller Test

Test \(H_0: \phi = 1\) in: \(\Delta Y_t = \alpha + \beta t + (\phi - 1)Y_{t-1} + \varepsilon_t\)

Variants:

  • No constant, no trend
  • Constant, no trend
  • Constant and trend

Augmented Dickey-Fuller (ADF) Test

Include lagged differences: \(\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{j=1}^p \delta_j \Delta Y_{t-j} + \varepsilon_t\)

Test \(H_0: \gamma = 0\)
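
In R, adf.test() from the tseries package runs this regression with a chosen number of lagged differences. A minimal sketch on simulated series (the lag length of 4 is illustrative):

# ADF test: H0 is a unit root
library(tseries)
set.seed(2)
rw  <- cumsum(rnorm(500))                              # random walk: should fail to reject H0
ar1 <- as.numeric(arima.sim(list(ar = 0.7), n = 500))  # stationary AR(1): should reject H0
adf.test(rw, k = 4)
adf.test(ar1, k = 4)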

8.11 Spurious Regression

When two non-stationary series are regressed: \(Y_t = \beta_0 + \beta_1 X_t + u_t\)

Problems:

  1. High \(R^2\) even if series are independent
  2. Significant t-statistics
  3. Invalid inference

Solution: Check for cointegration or difference the series.

8.12 Practical Example

Consider monthly stock returns: \(R_t = \alpha + \beta R_{m,t} + u_t\)

Testing for serial correlation:

  1. Estimate model, obtain residuals
  2. Compute DW statistic: \(DW = 1.85\)
  3. Since \(DW \approx 2\), little evidence of serial correlation
  4. Confirm with Breusch-Godfrey test

8.13 Sample Problems

Problem 1

Given AR(1) process: \(Y_t = 0.7Y_{t-1} + \varepsilon_t\), where \(\varepsilon_t \sim \text{i.i.d.}(0,4)\)

a) Is the process stationary? b) Find the unconditional variance c) Find the first-order autocorrelation

Solution: a) Since \(|0.7| < 1\), the process is stationary b) \(\text{Var}(Y_t) = \frac{4}{1-0.7^2} = \frac{4}{0.51} = 7.84\) c) \(\rho_1 = 0.7\)

Problem 2

Testing for serial correlation yields \(DW = 0.8\). What does this suggest?

Solution: Since \(DW \approx 2(1-\hat{\rho})\): \(0.8 \approx 2(1-\hat{\rho})\) \(\hat{\rho} \approx 0.6\)

Strong positive serial correlation is present.

Chapter 9: Regression Analysis with Time Series Data

9.1 Time Series Regression Model

General form: \(Y_t = \beta_0 + \beta_1 X_{1t} + ... + \beta_k X_{kt} + u_t\)

Key assumptions modified for time series:

  1. Linear in parameters
  2. No perfect collinearity
  3. Zero conditional mean: \(E[u_t \mid X_t, X_{t-1}, ...] = 0\)
  4. Homoskedasticity: \(\text{Var}(u_t \mid X_t, X_{t-1}, ...) = \sigma^2\)
  5. No serial correlation: \(\text{Cov}(u_t, u_s \mid X) = 0\) for \(t \neq s\)
  6. Normality (for finite sample inference)

9.2 Static and Dynamic Models

Static Model

\[Y_t = \beta_0 + \beta_1 X_t + u_t\]

Contemporaneous relationship only.

Distributed Lag Model

\[Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + ... + \beta_q X_{t-q} + u_t\]

Effects distributed over time.

Autoregressive Distributed Lag (ARDL) Model

\[Y_t = \alpha + \gamma_1 Y_{t-1} + ... + \gamma_p Y_{t-p} + \beta_0 X_t + ... + \beta_q X_{t-q} + u_t\]

Includes both lagged dependent and independent variables.

9.3 Trends

Deterministic Trend

\[Y_t = \beta_0 + \beta_1 t + u_t\]

Linear trend in time.

Stochastic Trend

Random walk with drift: \(Y_t = Y_{t-1} + \beta_0 + \varepsilon_t\), which implies \(Y_t = Y_0 + \beta_0 t + \sum_{s=1}^t \varepsilon_s\)

Trend with random walk component.

Detrending Methods

  1. Include time trend as regressor
  2. First-difference the data
  3. Use deviation from trend

9.4 Seasonality

Seasonal patterns can be modeled using:

  1. Seasonal dummy variables
  2. Trigonometric functions
  3. Seasonal differencing

Example with quarterly dummies: \(Y_t = \beta_0 + \beta_1 Q2_t + \beta_2 Q3_t + \beta_3 Q4_t + \beta_4 X_t + u_t\)

9.5 Forecasting

One-step-ahead Forecast

For model \(Y_t = \beta_0 + \beta_1 X_t + u_t\): \(\hat{Y}_{T+1 \mid T} = \hat{\beta}_0 + \hat{\beta}_1 X_{T+1}\)

Forecast error: \(e_{T+1} = Y_{T+1} - \hat{Y}_{T+1 \mid T}\)

Forecast Evaluation

Mean Squared Forecast Error (MSFE): \(MSFE = \frac{1}{P}\sum_{t=T+1}^{T+P} (Y_t - \hat{Y}_{t \mid t-1})^2\)
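
A minimal base-R sketch of rolling one-step-ahead forecasts and the resulting MSFE, here using a simple autoregression of \(Y_t\) on \(Y_{t-1}\) rather than an exogenous regressor (sample sizes and values are illustrative):

# Rolling one-step-ahead forecasts and the MSFE
set.seed(7)
y <- as.numeric(arima.sim(list(ar = 0.6), n = 300))
T <- 250; P <- 50                             # estimation sample and number of forecasts
err <- numeric(P)
for (p in 1:P) {
  t <- T + p - 1
  fit <- lm(y[2:t] ~ y[1:(t - 1)])            # fit Y_t on Y_{t-1} using data up to time t
  fc  <- coef(fit)[1] + coef(fit)[2] * y[t]   # forecast of Y_{t+1}
  err[p] <- y[t + 1] - fc                     # forecast error
}
mean(err^2)                                   # MSFE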

9.6 Key Takeaways

  1. Stationarity is crucial for valid inference
  2. Serial correlation invalidates standard errors
  3. Unit root processes require special treatment
  4. Spurious regression is a serious concern
  5. Dynamic models capture time dependencies
  6. Trends and seasonality must be addressed
  7. Various tests detect time series problems
  8. Forecasting requires careful model specification

Part 9: Advanced Topics

Chapter 13: Time Series Analysis (Advanced)

13.1 Cointegration

Two non-stationary series \(Y_t\) and \(X_t\) are cointegrated if:

  1. Both are integrated of order 1: \(Y_t \sim I(1)\), \(X_t \sim I(1)\)
  2. There exists \(\beta\) such that \(u_t = Y_t - \beta X_t \sim I(0)\)

Economic interpretation: Long-run equilibrium relationship exists.

Engle-Granger Two-Step Procedure

  1. Test each series for unit root (both should be \(I(1)\))
  2. Estimate cointegrating regression: \(Y_t = \alpha + \beta X_t + u_t\)
  3. Test residuals for unit root (should be \(I(0)\))
  4. If cointegrated, estimate Error Correction Model (ECM)

Error Correction Model (ECM)

\[\Delta Y_t = \alpha + \gamma(Y_{t-1} - \beta X_{t-1}) + \delta \Delta X_t + \varepsilon_t\]

where:

  • \(\gamma\): Speed of adjustment to equilibrium
  • \((Y_{t-1} - \beta X_{t-1})\): Error correction term
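
A minimal base-R sketch of the Engle-Granger steps and the ECM on simulated cointegrated data (names and parameter values are illustrative; note that the residual-based unit root test strictly requires Engle-Granger critical values rather than the standard ADF ones):

# Engle-Granger two-step procedure and an error correction model
library(tseries)
set.seed(99)
T <- 500
x <- cumsum(rnorm(T))                     # I(1) regressor
y <- 2 + 1.5 * x + rnorm(T)               # cointegrated with x (long-run beta = 1.5)
step1 <- lm(y ~ x)                        # cointegrating regression
adf.test(resid(step1))                    # residuals should look I(0) if cointegrated
ect <- resid(step1)                       # error correction term
ecm <- lm(diff(y) ~ ect[-T] + diff(x))    # gamma is the coefficient on the lagged ECT
summary(ecm)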

13.2 Vector Autoregression (VAR)

For two variables: \(\begin{bmatrix} Y_t \\ X_t \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ X_{t-1} \end{bmatrix} + \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}\)

Properties:

  1. All variables treated symmetrically
  2. Captures dynamic interrelationships
  3. Used for forecasting and impulse response analysis

13.3 Panel Data Basics

Panel data combines cross-sectional and time series: \(Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}\)

where \(i\) indexes units and \(t\) indexes time.

Types of Panel Data Models

  1. Pooled OLS: Ignores panel structure \(Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}\)
  2. Fixed Effects: Unit-specific intercepts \(Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it}\)
  3. Random Effects: Random intercepts \(Y_{it} = \beta_0 + \beta_1 X_{it} + (\alpha_i + u_{it})\)

13.4 ARCH and GARCH Models

Autoregressive Conditional Heteroskedasticity (ARCH) models time-varying volatility:

ARCH(1) Model

\(Y_t = \mu + \varepsilon_t\), \(\varepsilon_t = \sigma_t z_t\), \(\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2\)

where \(z_t \sim \text{i.i.d.}(0,1)\)

GARCH(1,1) Model

\[\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2\]

Applications: Financial volatility modeling, risk management
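
Volatility clustering can be generated directly from these equations. A minimal base-R simulation of an ARCH(1) process (parameter values are illustrative):

# Simulate ARCH(1): sigma_t^2 = a0 + a1 * eps_{t-1}^2
set.seed(5)
T <- 1000; a0 <- 0.2; a1 <- 0.6
eps <- numeric(T); sig2 <- numeric(T)
sig2[1] <- a0 / (1 - a1)                  # start at the unconditional variance
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:T) {
  sig2[t] <- a0 + a1 * eps[t - 1]^2
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}
acf(eps,   plot = FALSE)$acf[2]           # returns themselves: roughly uncorrelated
acf(eps^2, plot = FALSE)$acf[2]           # squared returns: clearly autocorrelated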

Chapter 14: Causal Effect with Difference-in-Differences

14.1 The DiD Framework

Difference-in-Differences estimates causal effects by comparing changes over time between treatment and control groups.

Basic setup:

  • Two periods: Pre-treatment (\(t=0\)) and post-treatment (\(t=1\))
  • Two groups: Treatment (\(D_i=1\)) and control (\(D_i=0\))

14.2 The DiD Estimator

Simple 2×2 DiD

\[\text{DiD} = [E(Y_{i1} \mid D_i=1) - E(Y_{i0} \mid D_i=1)] - [E(Y_{i1} \mid D_i=0) - E(Y_{i0} \mid D_i=0)]\]

Or, using group-by-period means (first subscript: group, second: period): \(\text{DiD} = (\bar{Y}_{1,1} - \bar{Y}_{1,0}) - (\bar{Y}_{0,1} - \bar{Y}_{0,0})\)

Regression Specification

\[Y_{it} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_i + \beta_3 (\text{Post}_t \times \text{Treat}_i) + u_{it}\]

where:

  • \(\beta_3\): DiD estimate (treatment effect)
  • \(\beta_1\): Time trend
  • \(\beta_2\): Pre-existing differences
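
In practice the 2×2 DiD estimate is read off as the coefficient on the interaction term. A minimal R sketch on a simulated two-period panel (all names and numbers are illustrative, with a true effect of 5):

# Two-period DiD as the interaction coefficient in OLS
set.seed(11)
n <- 400
df <- expand.grid(id = 1:n, post = 0:1)
df$treat <- as.integer(df$id <= n / 2)
df$y <- 40 + 10 * df$treat + 15 * df$post +
        5 * df$treat * df$post + rnorm(nrow(df), sd = 3)
did <- lm(y ~ post + treat + post:treat, data = df)
coef(did)["post:treat"]                   # DiD estimate, close to 5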

14.3 Key Assumptions

  1. Parallel Trends: In the absence of treatment, the treatment and control groups would have followed parallel trends \(E[Y_{i1}(0) - Y_{i0}(0) \mid D_i=1] = E[Y_{i1}(0) - Y_{i0}(0) \mid D_i=0]\)
  2. No Anticipation: Treatment group doesn’t change behavior before treatment
  3. SUTVA: Stable Unit Treatment Value Assumption (no spillovers)

14.4 Testing Parallel Trends

Pre-treatment trends test: \(Y_{it} = \alpha_i + \gamma_t + \sum_{k \neq -1} \delta_k (D_i \times I(t=k)) + \varepsilon_{it}\)

Test \(H_0: \delta_k = 0\) for all \(k < 0\)

14.5 DiD with Multiple Time Periods

\[Y_{it} = \alpha_i + \gamma_t + \beta D_{it} + u_{it}\]

where:

  • \(\alpha_i\): Unit fixed effects
  • \(\gamma_t\): Time fixed effects
  • \(D_{it}\): Treatment indicator

14.6 Practical Example: Minimum Wage Study

Card and Krueger (1994) study:

  • Treatment: New Jersey (raised minimum wage)
  • Control: Pennsylvania (no change)
  • Outcome: Employment in fast-food restaurants

Model: \(\text{Emp}_{it} = \beta_0 + \beta_1 \text{After}_t + \beta_2 \text{NJ}_i + \beta_3 (\text{After}_t \times \text{NJ}_i) + u_{it}\)

Results: \(\hat{\beta}_3 > 0\) (employment increased in NJ relative to PA)

14.7 Extensions and Variations

Triple Differences

Add another dimension of comparison: \(Y_{igt} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_i + \beta_3 \text{Group}_g + \beta_4 (\text{Post}_t \times \text{Treat}_i) + \beta_5 (\text{Post}_t \times \text{Group}_g) + \beta_6 (\text{Treat}_i \times \text{Group}_g) + \beta_7 (\text{Post}_t \times \text{Treat}_i \times \text{Group}_g) + u_{igt}\)

Staggered Treatment

Different units treated at different times: \(Y_{it} = \alpha_i + \gamma_t + \sum_k \beta_k \cdot I(\text{time since treatment} = k) + u_{it}\)

14.8 Common Pitfalls

  1. Violation of Parallel Trends: Check pre-trends carefully
  2. Treatment Effect Heterogeneity: Effects may vary across units/time
  3. Bad Controls: Don’t control for outcomes of treatment
  4. Spillover Effects: Treatment may affect control units

14.9 Sample Problems

Problem 1

Given DiD data:

  • Pre-treatment: \(\bar{Y}_{T,0} = 50\), \(\bar{Y}_{C,0} = 40\)
  • Post-treatment: \(\bar{Y}_{T,1} = 70\), \(\bar{Y}_{C,1} = 55\)

Calculate the DiD estimate.

Solution: \(\text{DiD} = (70 - 50) - (55 - 40) = 20 - 15 = 5\)

The treatment effect is 5 units.

Problem 2

Test parallel trends given: \(Y_{it} = 2 + 0.5t + 3D_i - 0.8(D_i \times I(t=-2)) + 0.2(D_i \times I(t=-1)) + 4(D_i \times I(t=1)) + u_{it}\)

Solution: The pre-treatment coefficients are \(-0.8\) (at \(t=-2\)) and \(0.2\) (at \(t=-1\)). Since they are not both zero, the parallel trends assumption may be violated; formally, one would test their joint significance.

14.10 Advanced DiD Topics

Synthetic Control Method

When only one treated unit:

  1. Create synthetic control as weighted average of control units
  2. Weights chosen to match pre-treatment characteristics
  3. Compare treated unit to synthetic control

Regression Discontinuity with DiD

Combine RD and DiD for additional identification: \(Y_{it} = \beta_0 + \beta_1 f(X_i) + \beta_2 D_i + \beta_3 \text{Post}_t + \beta_4 (D_i \times \text{Post}_t) + u_{it}\)

14.11 STATA Implementation

* Basic DiD
gen post = (year >= 2000)
gen treat_post = treat * post
reg outcome post treat treat_post, cluster(state)

* With fixed effects
xtset state year
xtreg outcome treat_post, fe cluster(state)

* Event study (negative leads need a valid variable-name suffix, e.g. treat_m3)
forvalues k = -3/3 {
    local suf = cond(`k' < 0, "m" + string(abs(`k')), string(`k'))
    gen treat_`suf' = treat * (year == 2000 + `k')
}
reg outcome treat_*, cluster(state)

14.12 Key Takeaways

  1. DiD identifies causal effects using variation across groups and time
  2. Parallel trends assumption is crucial and testable
  3. Multiple periods require unit and time fixed effects
  4. Event studies help visualize treatment dynamics
  5. Extensions handle complex treatment timing
  6. Synthetic controls work for single treated units
  7. Careful consideration of identifying assumptions is essential
  8. DiD is widely used in policy evaluation

Part 10: Appendices and Practice

Mathematical Appendix

A.1 Matrix Algebra Review

Matrix Operations

  1. Matrix Multiplication: If \(\mathbf{A}\) is \(m \times n\) and \(\mathbf{B}\) is \(n \times p\), then \(\mathbf{AB}\) is \(m \times p\) with \([\mathbf{AB}]_{ij} = \sum_{k=1}^n a_{ik}b_{kj}\)
  2. Transpose Properties:
    • \[(\mathbf{A}')' = \mathbf{A}\]
    • \[(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'\]
    • \[(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'\]
  3. Inverse Properties:
    • \[(\mathbf{A}^{-1})^{-1} = \mathbf{A}\]
    • \[(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}\]
    • \[(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'\]

Useful Matrix Results

  1. Quadratic Forms: For symmetric matrix \(\mathbf{A}\): \(\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}'\mathbf{Ax}) = 2\mathbf{Ax}\)
  2. Matrix Differentiation: \(\frac{\partial}{\partial \mathbf{b}}(\mathbf{y} - \mathbf{Xb})'(\mathbf{y} - \mathbf{Xb}) = -2\mathbf{X}'(\mathbf{y} - \mathbf{Xb})\)
  3. Projection Matrix: \(\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\) Properties:
    • \(\mathbf{P}^2 = \mathbf{P}\) (idempotent)
    • \(\mathbf{P}' = \mathbf{P}\) (symmetric)
    • \[\mathbf{PX} = \mathbf{X}\]
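
These properties are easy to verify numerically. A short R sketch with a random design matrix (purely illustrative):

# Numerical check of the projection matrix properties
set.seed(3)
X <- cbind(1, matrix(rnorm(20 * 2), 20, 2))   # n = 20, intercept plus two regressors
P <- X %*% solve(t(X) %*% X) %*% t(X)
max(abs(P %*% P - P))                         # ~ 0: idempotent
max(abs(t(P) - P))                            # ~ 0: symmetric
max(abs(P %*% X - X))                         # ~ 0: PX = X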

A.2 Probability Theory Review

Key Distributions

  1. Normal Distribution: \(X \sim N(\mu, \sigma^2)\) \(f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\)
  2. Chi-squared Distribution: If \(Z_i \sim N(0,1)\), then \(\sum_{i=1}^n Z_i^2 \sim \chi^2_n\)
  3. t-Distribution: If \(Z \sim N(0,1)\) and \(V \sim \chi^2_n\), then \(\frac{Z}{\sqrt{V/n}} \sim t_n\)
  4. F-Distribution: If \(V_1 \sim \chi^2_{n_1}\) and \(V_2 \sim \chi^2_{n_2}\), then \(\frac{V_1/n_1}{V_2/n_2} \sim F_{n_1,n_2}\)

Central Limit Theorem

For i.i.d. random variables with \(E[X_i] = \mu\) and \(\text{Var}(X_i) = \sigma^2\): \(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \stackrel{d}{\rightarrow} N(0,1)\)

Law of Large Numbers

\[\bar{X} \stackrel{p}{\rightarrow} \mu \text{ as } n \rightarrow \infty\]

A.3 Key Econometric Proofs

Proof: OLS is Unbiased

Starting with: \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\)

\(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\) \(= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \mathbf{u})\) \(= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}\)

Taking expectations: \(E[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E[\mathbf{u} \mid \mathbf{X}] = \boldsymbol{\beta}\)

Statistical Tables

Critical Values: Standard Normal Distribution

| α (two-tailed) | α/2 (one-tailed) | Critical Value |
|----------------|------------------|----------------|
| 0.10           | 0.05             | ±1.645         |
| 0.05           | 0.025            | ±1.960         |
| 0.01           | 0.005            | ±2.576         |

Critical Values: t-Distribution (Selected)

| df | α = 0.10 | α = 0.05 | α = 0.01 |
|----|----------|----------|----------|
| 10 | 1.812    | 2.228    | 3.169    |
| 20 | 1.725    | 2.086    | 2.845    |
| 30 | 1.697    | 2.042    | 2.750    |
| ∞  | 1.645    | 1.960    | 2.576    |

(α values are two-tailed.)

Critical Values: F-Distribution (α = 0.05)

| df₁ \ df₂ | 10   | 20   | 30   | ∞    |
|-----------|------|------|------|------|
| 1         | 4.96 | 4.35 | 4.17 | 3.84 |
| 2         | 4.10 | 3.49 | 3.32 | 3.00 |
| 3         | 3.71 | 3.10 | 2.92 | 2.60 |
| 5         | 3.33 | 2.71 | 2.53 | 2.21 |

Practice Problems and Solutions

Problem Set 1: OLS Mechanics

Problem 1.1 Consider the regression: \(Y_i = \alpha + \beta X_i + \gamma Z_i + \varepsilon_i\)

Derive the first-order conditions for OLS estimation.

Solution: Minimize: \(S(\alpha, \beta, \gamma) = \sum_{i=1}^n (Y_i - \alpha - \beta X_i - \gamma Z_i)^2\)

First-order conditions: \(\frac{\partial S}{\partial \alpha} = -2\sum_{i=1}^n (Y_i - \alpha - \beta X_i - \gamma Z_i) = 0\) \(\frac{\partial S}{\partial \beta} = -2\sum_{i=1}^n X_i(Y_i - \alpha - \beta X_i - \gamma Z_i) = 0\) \(\frac{\partial S}{\partial \gamma} = -2\sum_{i=1}^n Z_i(Y_i - \alpha - \beta X_i - \gamma Z_i) = 0\)

These yield the normal equations.

Problem 1.2 Prove that OLS residuals sum to zero when the model includes an intercept.

Solution: From the first-order condition: \(\sum_{i=1}^n (Y_i - \hat{\alpha} - \hat{\beta}X_i) = \sum_{i=1}^n \hat{u}_i = 0\)

Problem Set 2: Hypothesis Testing

Problem 2.1 Given regression results: \(\widehat{\log(\text{wage})} = 1.28 + 0.090 \cdot \text{educ} + 0.041 \cdot \text{exper}\) \(\text{SE}: \quad\quad\quad (0.11) \quad (0.008) \quad\quad (0.005)\) \(n = 500, \quad R^2 = 0.316\)

a) Test whether education affects wages at the 5% level. b) Construct a 95% confidence interval for the return to education. c) Test whether education and experience are jointly significant.

Solution: a) \(H_0: \beta_{educ} = 0\) vs \(H_1: \beta_{educ} \neq 0\)

\[t = \frac{0.090 - 0}{0.008} = 11.25\]

Since \(|11.25| > 1.96\), reject \(H_0\) at 5% level.

b) \(CI = 0.090 \pm 1.96 \times 0.008 = [0.074, 0.106]\)

c) Use F-test: \(F = \frac{R^2/k}{(1-R^2)/(n-k-1)} = \frac{0.316/2}{0.684/497} = 114.8\)

Since \(F > 3.00\) (critical value), reject joint insignificance.

Problem Set 3: Dummy Variables

Problem 3.1 Consider the wage regression: \(\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + \beta_2 \text{educ}_i + \beta_3 (\text{female}_i \times \text{educ}_i) + u_i\)

Derive the returns to education for males and females.

Solution: For males (female = 0): \(\frac{\partial \text{wage}}{\partial \text{educ}} = \beta_2\)

For females (female = 1): \(\frac{\partial \text{wage}}{\partial \text{educ}} = \beta_2 + \beta_3\)

The difference in returns is \(\beta_3\).

Problem Set 4: Functional Forms

Problem 4.1 Given: \(\log(Y) = 2.5 + 0.8\log(X_1) + 0.3\log(X_2)\)

a) Interpret the coefficients b) Test whether the production function exhibits constant returns to scale

Solution: a) Coefficients are elasticities:

  • 1% increase in \(X_1\) increases \(Y\) by 0.8%
  • 1% increase in \(X_2\) increases \(Y\) by 0.3%

b) Test \(H_0: \beta_1 + \beta_2 = 1\). The estimated sum is \(0.8 + 0.3 = 1.1\); a formal test requires the standard error of \(\hat{\beta}_1 + \hat{\beta}_2\).

Problem Set 5: Time Series

Problem 5.1 Test whether the AR(1) process \(Y_t = 0.95Y_{t-1} + \varepsilon_t\) is stationary.

Solution: For stationarity, need \(|\phi| < 1\). Since \(|0.95| < 1\), the process is stationary.

Exam-Style Questions

Question 1 (20 points) Consider the regression model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

a) State the assumptions required for OLS to be BLUE. (5 points) b) Derive the OLS estimator for \(\beta_1\). (7 points) c) Show that the OLS estimator is unbiased. (8 points)

Question 2 (25 points) An economist estimates the effect of class size on test scores: \(\text{TestScore}_i = 700 - 2.5 \cdot \text{ClassSize}_i + u_i\) \((15.2) \quad (0.8)\)

a) Interpret the coefficient on ClassSize. (5 points) b) Test whether class size affects test scores at the 1% level. (10 points) c) What might cause omitted variable bias in this regression? (10 points)

Question 3 (30 points) Consider a DiD analysis of minimum wage effects:

|           | Pre-Period | Post-Period |
|-----------|------------|-------------|
| Treatment | 85         | 92          |
| Control   | 80         | 83          |

a) Calculate the DiD estimate. (10 points) b) State the identifying assumption. (10 points) c) How would you test this assumption? (10 points)

Practice with Software Output

Interpreting STATA Output

. reg wage educ exper female

Source |       SS           df       MS
-------+----------------------------------
Model  |  2534.567         3     844.856
Resid  |  3456.789       496     6.971
-------+----------------------------------
Total  |  5991.356       499    12.007

------------------------------------------------------
wage  |    Coef.   Std. Err.     t    P>|t|   [95% CI]
------+----------------------------------------------
educ  |   1.234     0.123    10.03  0.000   [0.992, 1.476]
exper |   0.456     0.045    10.13  0.000   [0.368, 0.544]
female|  -2.345     0.567    -4.14  0.000   [-3.459, -1.231]
_cons |   5.678     1.234     4.60  0.000   [3.254, 8.102]
------------------------------------------------------

Questions:

  1. Write the estimated regression equation.
  2. Test the significance of each coefficient at 5%.
  3. Calculate the R-squared.
  4. Interpret each coefficient.

Key Formulas Summary

  1. OLS Estimators:
    • Simple: \(\hat{\beta}_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}\)
    • Matrix: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)
  2. Standard Errors:
    • \[\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2[(\mathbf{X}'\mathbf{X})^{-1}]_{jj}}\]
  3. Test Statistics:
    • t-test: \(t = \frac{\hat{\beta}_j - \beta_{j,0}}{\text{SE}(\hat{\beta}_j)}\)
    • F-test: \(F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\)
  4. Goodness of Fit:
    • \[R^2 = 1 - \frac{SSR}{TSS} = \frac{ESS}{TSS}\]
    • \[\bar{R}^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}\]
  5. DiD:
    • \[\text{DiD} = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre})\]

Part 11: Comprehensive Review and Advanced Topics

Advanced Econometric Methods

11.1 Instrumental Variables (IV) Estimation

When we have endogeneity (\(\text{Cov}(X,u) \neq 0\)), OLS is biased. IV estimation provides a solution.

The IV Model

For the model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

An instrument \(Z_i\) must satisfy:

  1. Relevance: \(\text{Cov}(Z,X) \neq 0\)
  2. Exogeneity: \(\text{Cov}(Z,u) = 0\)

Two-Stage Least Squares (2SLS)

Stage 1: \(X_i = \pi_0 + \pi_1 Z_i + v_i\) Stage 2: \(Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i\)

The 2SLS estimator (just-identified case): \(\hat{\beta}_{2SLS} = \frac{\text{Cov}(Z,Y)}{\text{Cov}(Z,X)}\)
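
A minimal sketch with simulated data, using ivreg() from the AER package (the same function listed in the R commands reference below; the data-generating values are illustrative):

# OLS vs. 2SLS with one endogenous regressor and one instrument
library(AER)
set.seed(8)
n <- 1000
z <- rnorm(n)                        # instrument
v <- rnorm(n)                        # unobserved confounder
x <- 0.8 * z + v + rnorm(n)          # endogenous: correlated with the error through v
y <- 1 + 2 * x + 3 * v + rnorm(n)    # true beta1 = 2
coef(lm(y ~ x))["x"]                 # OLS: biased upward
coef(ivreg(y ~ x | z))["x"]          # 2SLS: close to 2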

Testing Instrument Validity

  1. Relevance: F-statistic from first stage > 10 (rule of thumb)
  2. Overidentification test: With multiple instruments, test \(J\)-statistic
  3. Wu-Hausman test: Compare OLS and IV estimates

11.2 Panel Data Methods (Advanced)

Fixed Effects Estimation

Model: \(Y_{it} = \beta X_{it} + \alpha_i + u_{it}\)

Within transformation: \(\tilde{Y}_{it} = Y_{it} - \bar{Y}_i\), \(\tilde{X}_{it} = X_{it} - \bar{X}_i\)

Estimate: \(\tilde{Y}_{it} = \beta \tilde{X}_{it} + \tilde{u}_{it}\)

Random Effects Estimation

When \(\text{Cov}(\alpha_i, X_{it}) = 0\), use GLS: \(Y_{it} = \beta X_{it} + \alpha_i + u_{it}\)

Hausman Test

Tests \(H_0\): Random effects is consistent and efficient. \(H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'[\hat{V}_{FE} - \hat{V}_{RE}]^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{RE})\)

Under \(H_0\): \(H \sim \chi^2_k\)
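
A minimal sketch with the plm package (the same plm()/phtest() calls listed in the R commands reference below; panel dimensions and names are illustrative):

# Fixed vs. random effects and the Hausman test
library(plm)
set.seed(21)
N <- 100; T <- 5
pd <- data.frame(id = rep(1:N, each = T), time = rep(1:T, N))
alpha <- rep(rnorm(N), each = T)          # unit effects
pd$x <- rnorm(N * T) + 0.5 * alpha        # regressor correlated with the unit effect
pd$y <- 2 * pd$x + alpha + rnorm(N * T)
fe <- plm(y ~ x, data = pd, index = c("id", "time"), model = "within")
re <- plm(y ~ x, data = pd, index = c("id", "time"), model = "random")
phtest(fe, re)                            # rejects RE when alpha_i is correlated with x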

11.3 Limited Dependent Variables

Probit Model

\[P(Y_i = 1 \mid X_i) = \Phi(\beta_0 + \beta_1 X_i)\]

where \(\Phi(\cdot)\) is the standard normal CDF.

Marginal effects: \(\frac{\partial P(Y_i = 1 \mid X_i)}{\partial X_i} = \phi(\beta_0 + \beta_1 X_i) \cdot \beta_1\)

Logit Model

\[P(Y_i = 1 \mid X_i) = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}\]

Log-odds interpretation: \(\log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_i\)
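
A minimal sketch with glm() (the same calls listed in the R commands reference below), including the average marginal effect of the probit computed by hand from the formula above:

# Probit and logit for a binary outcome, plus the average probit marginal effect
set.seed(4)
n <- 2000
x <- rnorm(n)
y <- as.integer(runif(n) < pnorm(-0.5 + x))          # true probit coefficients (-0.5, 1)
pr <- glm(y ~ x, family = binomial(link = "probit"))
lg <- glm(y ~ x, family = binomial(link = "logit"))
b  <- coef(pr)
mean(dnorm(b[1] + b[2] * x) * b[2])                  # average marginal effect of x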

Tobit Model

For censored data: \(Y_i^* = \beta_0 + \beta_1 X_i + u_i\) \(Y_i = \max(0, Y_i^*)\)

11.4 Advanced Time Series

ARIMA Models

ARIMA(p,d,q): \((1-\phi_1L-...-\phi_pL^p)(1-L)^d Y_t = (1+\theta_1L+...+\theta_qL^q)\varepsilon_t\)

Vector Error Correction Model (VECM)

For cointegrated series: \(\Delta \mathbf{Y}_t = \mathbf{\Pi} \mathbf{Y}_{t-1} + \sum_{i=1}^{p-1} \mathbf{\Gamma}_i \Delta \mathbf{Y}_{t-i} + \mathbf{u}_t\)

where \(\mathbf{\Pi} = \mathbf{\alpha \beta}'\) (error correction term)

11.5 Quantile Regression

Instead of conditional mean, estimate conditional quantiles: \(Q_\tau(Y \mid X) = X\beta(\tau)\)

Minimize: \(\sum_{i:Y_i \geq X_i\beta} \tau|Y_i - X_i\beta| + \sum_{i:Y_i < X_i\beta} (1-\tau)|Y_i - X_i\beta|\)

Applications: Wage distributions, risk analysis
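
A minimal sketch with rq() from the quantreg package (the quantiles and data-generating process are illustrative):

# Quantile regression at the 10th, 50th and 90th percentiles
library(quantreg)
set.seed(6)
n <- 1000
x <- runif(n)
y <- 1 + 2 * x + (1 + 2 * x) * rnorm(n)   # heteroskedastic errors: slopes differ across quantiles
rq(y ~ x, tau = c(0.1, 0.5, 0.9))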

Comprehensive Review Questions

Section 1: Conceptual Understanding

  1. Causal Inference
    • What is the fundamental problem of causal inference?
    • How does randomization solve the selection problem?
    • What are the key assumptions for causal interpretation of regression coefficients?
  2. Model Specification
    • What are the consequences of omitting a relevant variable?
    • How do you test for functional form misspecification?
    • When should you use logs vs. levels?
  3. Statistical Inference
    • Explain the difference between consistency and unbiasedness
    • What are the consequences of heteroskedasticity?
    • How do clustered standard errors differ from robust standard errors?

Section 2: Applied Problems

Problem 1: Returns to Education You want to estimate the causal effect of education on wages.

a) Write down a regression model b) What are potential sources of endogeneity? c) Suggest an instrumental variable and justify its validity d) How would you test whether IV is necessary?

Solution Outline: a) \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + u_i\)

b) Ability bias, measurement error, reverse causality

c) Quarter of birth (Angrist-Krueger), distance to college

  • Relevance: Affects education through compulsory schooling
  • Exogeneity: Randomly assigned, shouldn’t affect wages directly

d) Wu-Hausman test comparing OLS and IV estimates

Problem 2: Policy Evaluation A state implements a job training program. You have data before and after for treatment and control states.

a) Set up a DiD model b) State the identifying assumption c) How would you test this assumption? d) What if treatment timing varies?

Solution Outline: a) \(Y_{st} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_s + \beta_3 (\text{Post}_t \times \text{Treat}_s) + u_{st}\)

b) Parallel trends assumption

c) Event study specification, test pre-trends

d) Use staggered DiD with unit and time fixed effects

Section 3: Data Analysis Project

Project: Housing Prices Analysis

Dataset includes:

  • House prices
  • Square footage, bedrooms, bathrooms
  • Neighborhood characteristics
  • School quality measures
  • Crime rates

Tasks:

  1. Explore functional form (linear vs. log)
  2. Test for spatial dependence
  3. Address potential endogeneity of school quality
  4. Implement a hedonic price model
  5. Forecast prices for new listings

Common Econometric Pitfalls

  1. P-hacking: Testing multiple specifications until finding significance
  2. Ignoring clustered errors: Underestimating standard errors
  3. Bad controls: Controlling for outcomes of treatment
  4. Extrapolation: Predictions outside the support of data
  5. Reverse causality: Ignoring feedback effects
  6. Sample selection bias: Non-random sampling
  7. Measurement error: Attenuation bias
  8. Multicollinearity: Inflated standard errors
  9. Overfitting: Too many parameters relative to observations
  10. Ignoring dynamics: Static models for dynamic processes

Software Implementation Guide

STATA Commands Reference

* Basic regression
reg y x1 x2, robust
reg y x1 x2, cluster(group)

* Panel data
xtset id time
xtreg y x1 x2, fe
xtreg y x1 x2, re
hausman fe re

* IV regression
ivregress 2sls y x1 (x2 = z1 z2)
ivreg2 y x1 (x2 = z1 z2), first

* Time series
tsset time
arima y, arima(1,1,1)
dfuller y, trend
vec y1 y2 y3

* Limited dependent variables
probit y x1 x2
logit y x1 x2
tobit y x1 x2, ll(0)

* Difference-in-differences
gen post = (time >= 2000)
gen treat_post = treat * post
reg y treat post treat_post, cluster(state)

R Commands Reference

# Basic regression (coeftest/vcovHC require the lmtest and sandwich packages)
library(lmtest)
library(sandwich)
model <- lm(y ~ x1 + x2, data=df)
coeftest(model, vcov=vcovHC(model, type="HC1"))

# Panel data
library(plm)
fe_model <- plm(y ~ x1 + x2, data=df, model="within")
re_model <- plm(y ~ x1 + x2, data=df, model="random")
phtest(fe_model, re_model)

# IV regression
library(AER)
ivreg(y ~ x1 + x2 | x1 + z1 + z2, data=df)

# Time series (adf.test is in the tseries package)
library(forecast)
library(tseries)
auto.arima(y)
adf.test(y)

# Limited dependent variables
glm(y ~ x1 + x2, family=binomial(link="probit"))
glm(y ~ x1 + x2, family=binomial(link="logit"))

Final Exam Preparation Checklist

Theory

  • Classical linear model assumptions
  • Properties of OLS estimators
  • Gauss-Markov theorem
  • Hypothesis testing framework
  • Maximum likelihood estimation
  • Asymptotic theory

Methods

  • Multiple regression
  • Dummy variables and interactions
  • Functional forms
  • Heteroskedasticity and robust inference
  • Time series basics
  • Panel data methods
  • Instrumental variables
  • Difference-in-differences

Applications

  • Interpret regression output
  • Choose appropriate functional form
  • Test model assumptions
  • Handle common data issues
  • Write up empirical results
  • Critique empirical papers

Software Skills

  • Data cleaning and manipulation
  • Running regressions
  • Diagnostic tests
  • Creating tables and graphs
  • Interpreting output

Summary

Econometrics provides tools for:

  1. Estimating causal effects
  2. Testing economic theories
  3. Forecasting future values
  4. Policy evaluation

Key takeaways:

  • Correlation ≠ causation
  • Careful identification is crucial
  • Always check assumptions
  • Consider economic theory
  • Be transparent about limitations
  • Use appropriate methods for your data
  • Interpret results in context

Remember: “All models are wrong, but some are useful.” - George Box
