Econometrics

Table of Contents

Part 1: Introduction and Foundations

  1. Overview
  2. Mathematical Prerequisites
  3. Notation Guide

Part 2: Causality and Simple Regression

  1. Causality and the Notion of Ceteris Paribus
  2. Models Linear in Parameters

Part 3: OLS Mechanics

  1. OLS Mechanics and Estimation
    • 6.1 Derivation of OLS Estimators
    • 6.2 Properties of OLS Residuals
    • 6.3 Total Variation Decomposition

Part 4: Statistical Inference

  1. OLS Inference and Random Sampling
    • 7.1 Sampling Distribution of OLS Estimators
    • 7.2 Hypothesis Testing
    • 7.3 Confidence Intervals

Part 5: Multiple Regression

  1. Simple vs Multiple Regression
    • 8.1 Omitted Variable Bias
    • 8.2 Frisch-Waugh-Lovell Theorem
    • 8.3 Control Variables

Part 6: Dummy Variables

  1. Qualitative Information with Dummy Variables
    • 9.1 Binary Regressors
    • 9.2 Multiple Categories
    • 9.3 Interaction Terms

Part 7: Functional Forms

  1. Functional Form and Nonlinear Relationships
    • 10.1 Logarithmic Transformations
    • 10.2 Polynomial Models
    • 10.3 Interpretation of Coefficients

Part 8: Time Series Foundations

  1. Stationarity, Persistence, and Serial Correlation
  2. Regression Analysis with Time Series Data

Part 9: Advanced Topics

  1. Time Series Analysis (Advanced)
  2. Causal Effect with Difference-in-Differences

Part 10: Appendices

  1. Mathematical Appendix
  2. Statistical Tables
  3. Practice Problems and Solutions

Overview

Econometrics is the application of statistical methods to economic data for the purpose of testing hypotheses and forecasting future trends.

Objectives

  1. Understand causal relationships in economic data and the concept of ceteris paribus
  2. Apply regression analysis to real-world economic problems
  3. Interpret regression results correctly and understand their limitations
  4. Test economic hypotheses using appropriate statistical methods
  5. Handle various data types including cross-sectional, time series, and panel data
  6. Recognize and address common econometric problems

Key Themes

  • Causality vs. Correlation: Understanding when regression coefficients have causal interpretations
  • Model Specification: Choosing appropriate functional forms and control variables
  • Statistical Inference: Making valid conclusions from sample data
  • Practical Application: Connecting theory to real-world economic problems

Mathematical Prerequisites

Linear Algebra

Prerequisites:

  1. Matrix Operations
    • Matrix multiplication: If \(\mathbf{A}\) is \(m \times n\) and \(\mathbf{B}\) is \(n \times p\), then \(\mathbf{AB}\) is \(m \times p\)
    • Matrix transpose: \((\mathbf{AB})' = \mathbf{B}'\mathbf{A}'\)
    • Matrix inverse: \(\mathbf{AA}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\)
  2. Vector Operations
    • Inner product: \(\mathbf{x}'\mathbf{y} = \sum_{i=1}^n x_i y_i\)
    • Outer product: \(\mathbf{xy}'\) produces an \(n \times n\) matrix

Calculus

Essential calculus concepts include:

  1. Differentiation
    • Partial derivatives: \(\frac{\partial f(x,y)}{\partial x}\)
    • Chain rule: \(\frac{d}{dx}f(g(x)) = f'(g(x))g'(x)\)
  2. Optimization
    • First-order conditions: \(\frac{\partial f}{\partial x} = 0\)
    • Second-order conditions for minima/maxima

Probability and Statistics

Core statistical concepts:

  1. Random Variables
    • Expected value: \(E[X] = \int x f(x) dx\) (continuous) or \(\sum x P(X=x)\) (discrete)
    • Variance: \(\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - [E[X]]^2\)
    • Covariance: \(\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])]\)
  2. Distributions
    • Normal distribution: \(X \sim N(\mu, \sigma^2)\)
    • t-distribution with \(n\) degrees of freedom: \(t_n\)
    • Chi-squared distribution: \(\chi^2_n\)
    • F-distribution: \(F_{n_1,n_2}\)
  3. Estimation Theory
    • Unbiasedness: \(E[\hat{\theta}] = \theta\)
    • Consistency: \(\text{plim}_{n \to \infty} \hat{\theta} = \theta\)
    • Efficiency: minimum variance among unbiased estimators

Notation Guide

General Notation

  • \(Y\): Dependent variable (outcome variable)
  • \(X\): Independent variable (explanatory variable, regressor)
  • \(\beta\): Population parameter (true coefficient)
  • \(\hat{\beta}\): Estimated coefficient
  • \(\varepsilon\) or \(u\): Error term (disturbance term)
  • \(\hat{u}\) or \(\hat{\varepsilon}\): Residual (estimated error)
  • \(n\): Sample size
  • \(k\): Number of slope regressors (excluding the intercept), so degrees-of-freedom expressions below take the form \(n - k - 1\)

Subscripts and Superscripts

  • \(i\): Index for individual observations \((i = 1, 2, ..., n)\)
  • \(t\): Index for time periods (in time series)
  • \(j\): Index for variables \((j = 1, 2, ..., k)\)

Matrix Notation

  • \(\mathbf{y}\): \(n \times 1\) vector of dependent variable observations
  • \(\mathbf{X}\): \(n \times k\) matrix of independent variables
  • \(\boldsymbol{\beta}\): \(k \times 1\) vector of parameters
  • \(\mathbf{u}\): \(n \times 1\) vector of errors
  • \(\hat{\mathbf{u}}\): \(n \times 1\) vector of residuals

Common Expressions

  1. Simple Regression Model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)
  2. Multiple Regression Model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + ... + \beta_k X_{ki} + u_i\)
  3. Matrix Form: \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\)
  4. OLS Estimator: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)
  5. Variance of OLS Estimator: \(\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\)

Statistical Operators

  • \(E[\cdot]\): Expected value operator
  • \(\text{Var}(\cdot)\): Variance operator
  • \(\text{Cov}(\cdot, \cdot)\): Covariance operator
  • \(\text{Corr}(\cdot, \cdot)\): Correlation operator
  • \(\text{plim}\): Probability limit
  • \(\stackrel{p}{\rightarrow}\): Converges in probability
  • \(\stackrel{d}{\rightarrow}\): Converges in distribution

Test Statistics

  1. t-statistic: \(t = \frac{\hat{\beta}_j - \beta_{j,0}}{\text{SE}(\hat{\beta}_j)}\)
  2. F-statistic: \(F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\)
  3. R-squared: \(R^2 = 1 - \frac{SSR}{TSS} = \frac{ESS}{TSS}\)

Currency Notation

When referring to monetary values, dollar signs are escaped (\$100, \$50, etc.) to distinguish them from LaTeX math delimiters.

Greek Letters Used

  • \(\alpha\) (alpha): Often used for intercept
  • \(\beta\) (beta): Regression coefficients
  • \(\gamma\) (gamma): Alternative coefficients
  • \(\delta\) (delta): Difference or change
  • \(\varepsilon\) (epsilon): Error term
  • \(\theta\) (theta): General parameter
  • \(\lambda\) (lambda): Eigenvalue or Lagrange multiplier
  • \(\mu\) (mu): Population mean
  • \(\sigma\) (sigma): Standard deviation
  • \(\rho\) (rho): Correlation coefficient
  • \(\tau\) (tau): Treatment effect
  • \(\phi\) (phi): Autoregressive parameter
  • \(\chi\) (chi): Chi-squared distribution
  • \(\omega\) (omega): Alternative error term

Abbreviations

  • OLS: Ordinary Least Squares
  • MLE: Maximum Likelihood Estimation
  • IV: Instrumental Variables
  • 2SLS: Two-Stage Least Squares
  • GMM: Generalized Method of Moments
  • BLUE: Best Linear Unbiased Estimator
  • CLT: Central Limit Theorem
  • LLN: Law of Large Numbers
  • i.i.d.: Independent and Identically Distributed
  • MSE: Mean Squared Error
  • SSR: Sum of Squared Residuals
  • ESS: Explained Sum of Squares
  • TSS: Total Sum of Squares
  • DF: Degrees of Freedom
  • SE: Standard Error
  • CI: Confidence Interval
  • DGP: Data Generating Process

Part 2: Causality and Simple Regression

Chapter 1: Causality and the Notion of Ceteris Paribus

1.1 Introduction to Causal Analysis

Econometrics is fundamentally about understanding causal relationships. When we ask questions like “What is the effect of education on wages?” or “How does class size affect student performance?”, we are seeking causal answers, not mere correlations.

The Fundamental Problem of Causal Inference

The challenge in causal inference is that we can never observe the same unit under both treatment and control conditions simultaneously. This is known as the fundamental problem of causal inference.

For individual \(i\):

  • \(Y_i(1)\): Potential outcome if treated
  • \(Y_i(0)\): Potential outcome if not treated
  • Causal effect: \(\tau_i = Y_i(1) - Y_i(0)\)

We only observe one of these potential outcomes, never both.

1.2 The Ceteris Paribus Condition

Ceteris paribus is a Latin phrase meaning “all other things being equal” or “holding other things constant.” This concept is central to causal inference in econometrics.

Definition

The ceteris paribus effect of \(X\) on \(Y\) is the change in \(Y\) resulting from a one-unit change in \(X\), holding all other relevant factors constant.

Mathematically, if \(Y = f(X, Z)\) where \(Z\) represents all other factors: \(\frac{\partial Y}{\partial X} \bigg\|_{Z=\bar{Z}}\)

Example: Returns to Education

Consider the wage equation: \(\text{wage} = f(\text{education}, \text{ability}, \text{experience}, \text{family background}, ...)\)

The ceteris paribus effect of education on wages is: \(\frac{\partial \text{wage}}{\partial \text{education}} \bigg\|_{\text{other factors constant}}\)

1.3 Experimental vs. Observational Data

Randomized Controlled Experiments

In an ideal randomized experiment:

  1. Subjects are randomly assigned to treatment and control groups
  2. Random assignment ensures treatment is independent of potential outcomes
  3. The only systematic difference between groups is the treatment

Under random assignment: \(E[Y_i(1)\|D_i=1] - E[Y_i(0)\|D_i=0] = E[Y_i(1) - Y_i(0)] = \text{Average Treatment Effect}\)

where \(D_i\) is the treatment indicator.

Observational Data Challenges

With observational data:

  1. Treatment is not randomly assigned
  2. Treatment may be correlated with other factors affecting the outcome
  3. Simple comparisons may not yield causal effects

1.4 The Role of Regression in Causal Analysis

Regression analysis can help approximate ceteris paribus effects when:

  1. We can measure and control for confounding factors
  2. The relationship is correctly specified
  3. Key assumptions are satisfied

The population regression function: \(E[Y\|X] = \beta_0 + \beta_1 X\)

Under certain conditions, \(\beta_1\) represents the ceteris paribus effect of \(X\) on \(Y\).

1.5 Conditions for Causal Interpretation

For regression coefficients to have causal interpretations, we need:

  1. Zero Conditional Mean Assumption: \(E[u\|X] = 0\)
    • The error term is uncorrelated with the regressor
    • No omitted variables correlated with \(X\)
  2. No Perfect Multicollinearity
    • Regressors are not perfectly correlated
  3. Variation in the Treatment Variable
    • \[\text{Var}(X) > 0\]
  4. Correct Functional Form
    • The relationship is correctly specified

Chapter 2: Models Linear in Parameters

2.1 The Simple Linear Regression Model

The simple linear regression model is: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

where:

  • \(Y_i\): Dependent variable for observation \(i\)
  • \(X_i\): Independent variable for observation \(i\)
  • \(\beta_0\): Intercept parameter
  • \(\beta_1\): Slope parameter
  • \(u_i\): Error term for observation \(i\)

Interpretation of Parameters

  1. Intercept (\(\beta_0\)): Expected value of \(Y\) when \(X = 0\) \(E[Y\|X=0] = \beta_0\)
  2. Slope (\(\beta_1\)): Change in expected value of \(Y\) for a one-unit change in \(X\) \(\beta_1 = \frac{\partial E[Y\|X]}{\partial X}\)

2.2 Classical Linear Model Assumptions

The classical linear model requires several assumptions:

Assumption 1: Linearity in Parameters

The model is linear in parameters (but not necessarily in variables): \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

Assumption 2: Random Sampling

We have a random sample of \(n\) observations \({(X_i, Y_i): i = 1, ..., n}\) from the population.

Assumption 3: Sample Variation in \(X\)

The sample variance of \(X\) is positive: \(\sum_{i=1}^n (X_i - \bar{X})^2 > 0\)

Assumption 4: Zero Conditional Mean

\[E[u_i\|X_i] = 0\]

This implies:

  1. \[E[u_i] = 0\]
  2. \[\text{Cov}(X_i, u_i) = 0\]
  3. \[E[Y_i\|X_i] = \beta_0 + \beta_1 X_i\]

Assumption 5: Homoskedasticity (for inference)

\[\text{Var}(u_i\|X_i) = \sigma^2\]

The error variance is constant across all values of \(X\).

2.3 The Population Regression Function

The population regression function (PRF) is: \(E[Y\|X] = \beta_0 + \beta_1 X\)

This represents the expected value of \(Y\) given \(X\) in the population.

Properties of the PRF

  1. It’s the best linear predictor of \(Y\) given \(X\)
  2. It minimizes the expected squared prediction error
  3. The error term \(u\) has zero mean conditional on \(X\)

2.4 The Sample Regression Function

The sample regression function (SRF) is: \(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\)

where \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are estimates of the population parameters.

Residuals

The residual for observation \(i\) is: \(\hat{u}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\)

2.5 Examples of Linear Models

Example 1: Wage-Education Relationship

\[\text{wage}_i = \beta_0 + \beta_1 \text{educ}_i + u_i\]

Interpretation: \(\beta_1\) is the expected change in wages for an additional year of education.

Example 2: Consumption Function

\[\text{consumption}_i = \beta_0 + \beta_1 \text{income}_i + u_i\]

Interpretation: \(\beta_1\) is the marginal propensity to consume.

Example 3: Production Function (log-linear)

\[\log(Q_i) = \beta_0 + \beta_1 \log(L_i) + u_i\]

Interpretation: \(\beta_1\) is the elasticity of output with respect to labor.

2.6 Sources of Error Terms

The error term \(u_i\) captures:

  1. Omitted variables: Factors affecting \(Y\) but not included in the model
  2. Measurement error: Inaccuracies in measuring \(Y\) or \(X\)
  3. Functional form misspecification: True relationship is not linear
  4. Random variation: Inherent unpredictability in human behavior

2.7 When Linearity Fails

The linearity assumption may be violated when:

  1. The true relationship is nonlinear
  2. There are interaction effects
  3. The effect of \(X\) on \(Y\) depends on the level of \(X\)

Solutions include:

  • Logarithmic transformations
  • Polynomial terms
  • Interaction terms
  • Piecewise linear models

2.8 Sample Problems

Problem 1

Consider the model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

Given: \(E[u_i\|X_i] = 0\) and \(\text{Var}(u_i\|X_i) = \sigma^2\)

a) Show that \(E[Y_i\|X_i] = \beta_0 + \beta_1 X_i\) b) Derive \(\text{Var}(Y_i\|X_i)\)

Solution: a) \(E[Y_i\|X_i] = E[\beta_0 + \beta_1 X_i + u_i\|X_i] = \beta_0 + \beta_1 X_i + E[u_i\|X_i] = \beta_0 + \beta_1 X_i\)

b) \(\text{Var}(Y_i\|X_i) = \text{Var}(\beta_0 + \beta_1 X_i + u_i\|X_i) = \text{Var}(u_i\|X_i) = \sigma^2\)

Problem 2

In an estimated wage regression: \(\widehat{\text{wage}}_i = 600 + 80 \times \text{educ}_i\)

Interpret the coefficients.

Solution:

  • \(\hat{\beta}_0 = 600\): Expected wage for someone with zero years of education
  • \(\hat{\beta}_1 = 80\): Each additional year of education is associated with an \$80 increase in wages

2.9 Key Takeaways

  1. Ceteris paribus is essential for causal interpretation
  2. Linear models are linear in parameters, not necessarily in variables
  3. The zero conditional mean assumption is crucial for unbiased estimation
  4. The error term captures all factors affecting \(Y\) not included in the model
  5. Proper interpretation of coefficients depends on the functional form

2.10 Practice Questions

  1. What is the difference between correlation and causation in the context of regression analysis?
  2. Consider the model \(Y_i = \beta_0 + \beta_1 X_i + u_i\). Under what conditions does \(\beta_1\) have a causal interpretation?
  3. Explain why \(E[u_i\|X_i] = 0\) is a stronger assumption than \(E[u_i] = 0\).
  4. If the true relationship is \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i\) but we estimate \(Y_i = \alpha_0 + \alpha_1 X_i + v_i\), will \(\hat{\alpha}_1\) be an unbiased estimator of \(\beta_1\)? Explain.

Part 3: OLS Mechanics

Chapter 3: OLS Mechanics and Estimation

3.1 Introduction to OLS

Ordinary Least Squares (OLS) is the most widely used estimation method in econometrics. It provides a way to estimate the parameters of a linear regression model by minimizing the sum of squared residuals.

The OLS Criterion

For the model \(Y_i = \beta_0 + \beta_1 X_i + u_i\), OLS chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize:

\[S(\beta_0, \beta_1) = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2\]

3.2 Derivation of OLS Estimators

Simple Regression Case

To find the OLS estimators, we take the first-order conditions:

\[\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i) = 0\] \[\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)X_i = 0\]

From the first equation: \(\sum_{i=1}^n Y_i = n\beta_0 + \beta_1\sum_{i=1}^n X_i\)

This gives us: \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\)

From the second equation and substituting \(\hat{\beta}_0\): \(\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}\)

Alternative Forms of the Slope Estimator

  1. Covariance form: \(\hat{\beta}_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}\)
  2. Deviation form: \(\hat{\beta}_1 = \frac{\sum_{i=1}^n (Y_i - \bar{Y})X_i}{\sum_{i=1}^n (X_i - \bar{X})X_i}\)
  3. Original form: \(\hat{\beta}_1 = \frac{n\sum_{i=1}^n X_iY_i - \sum_{i=1}^n X_i\sum_{i=1}^n Y_i}{n\sum_{i=1}^n X_i^2 - (\sum_{i=1}^n X_i)^2}\)
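
All three forms are algebraically identical, which is easy to confirm numerically. The following sketch (illustrative only; it assumes NumPy and uses simulated data) computes the slope with the covariance form and with the raw-sum form and checks that they agree:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
n = len(x)

# covariance form
b1_cov = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# "original" form written with raw sums
b1_raw = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x ** 2) - x.sum() ** 2)

print(b1_cov, b1_raw)   # identical up to floating-point error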

3.3 Properties of OLS Residuals

The OLS residuals \(\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\) have several important properties:

Property 1: Sum of Residuals Equals Zero

\[\sum_{i=1}^n \hat{u}_i = 0\]

Proof: This follows directly from the first-order condition: \(\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^n \hat{u}_i = 0\)

Property 2: Sample Covariance Between Regressors and Residuals is Zero

\[\sum_{i=1}^n X_i\hat{u}_i = 0\]

Proof: This follows from the second first-order condition: \(\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^n X_i\hat{u}_i = 0\)

Property 3: Sample Mean of Residuals is Zero

\[\bar{\hat{u}} = \frac{1}{n}\sum_{i=1}^n \hat{u}_i = 0\]

Property 4: Predicted Values and Residuals are Uncorrelated

\[\sum_{i=1}^n \hat{Y}_i\hat{u}_i = 0\]

Proof: \(\sum_{i=1}^n \hat{Y}_i\hat{u}_i = \sum_{i=1}^n (\hat{\beta}_0 + \hat{\beta}_1 X_i)\hat{u}_i = \hat{\beta}_0\sum_{i=1}^n \hat{u}_i + \hat{\beta}_1\sum_{i=1}^n X_i\hat{u}_i = 0\)

Property 5: The Regression Line Passes Through the Point of Averages

\[\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1\bar{X}\]

3.4 Total Variation Decomposition

One of the most important results in regression analysis is the decomposition of total variation:

\[\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n \hat{u}_i^2\]

Or in terms of sum of squares: \(\text{TSS} = \text{ESS} + \text{SSR}\)

where:

  • TSS (Total Sum of Squares): \(\sum_{i=1}^n (Y_i - \bar{Y})^2\)
  • ESS (Explained Sum of Squares): \(\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2\)
  • SSR (Sum of Squared Residuals): \(\sum_{i=1}^n \hat{u}_i^2\)

Proof of Decomposition

Start with: \(Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + \hat{u}_i\)

Square both sides and sum: \(\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n [(\hat{Y}_i - \bar{Y}) + \hat{u}_i]^2\)

Expanding: \(= \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + 2\sum_{i=1}^n (\hat{Y}_i - \bar{Y})\hat{u}_i + \sum_{i=1}^n \hat{u}_i^2\)

The middle term equals zero: \(\sum_{i=1}^n (\hat{Y}_i - \bar{Y})\hat{u}_i = \sum_{i=1}^n \hat{Y}_i\hat{u}_i - \bar{Y}\sum_{i=1}^n \hat{u}_i = 0 - 0 = 0\)
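
The residual properties from Section 3.3 and the decomposition above are easy to verify numerically. A minimal sketch (assumes NumPy; data are simulated):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((yhat - y.mean()) ** 2)
ssr = np.sum(uhat ** 2)

print(np.isclose(uhat.sum(), 0))         # residuals sum to zero
print(np.isclose(np.sum(x * uhat), 0))   # residuals orthogonal to the regressor
print(np.isclose(tss, ess + ssr))        # TSS = ESS + SSR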

3.5 Goodness of Fit: R-squared

The coefficient of determination (R-squared) measures the proportion of variation in \(Y\) explained by \(X\):

\[R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{SSR}}{\text{TSS}}\]

Properties of \(R^2\):

  1. \[0 \leq R^2 \leq 1\]
  2. \(R^2 = 1\) implies perfect fit (all residuals are zero)
  3. \(R^2 = 0\) implies no linear relationship
  4. In simple regression, \(R^2 = r_{XY}^2\) (squared correlation coefficient)

3.6 Matrix Notation for OLS

For the multiple regression case with \(k\) regressors: \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\)

The OLS estimator in matrix form: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)

Derivation

Minimize: \(S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\)

First-order condition: \(\frac{\partial S}{\partial \boldsymbol{\beta}} = -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = 0\)

Solving: \(\mathbf{X}'\mathbf{y} = \mathbf{X}'\mathbf{X}\boldsymbol{\beta}\) \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)

3.7 Special Cases and Applications

Case 1: Regression Through the Origin

If we force \(\beta_0 = 0\): \(Y_i = \beta_1 X_i + u_i\)

The OLS estimator becomes: \(\hat{\beta}_1 = \frac{\sum_{i=1}^n X_iY_i}{\sum_{i=1}^n X_i^2}\)

Case 2: Regression on a Constant

If \(X_i = 1\) for all \(i\): \(Y_i = \beta_0 + u_i\)

The OLS estimator: \(\hat{\beta}_0 = \bar{Y}\)

Case 3: Regression with a Dummy Variable

Consider: \(Y_i = \alpha + \beta D_i + u_i\) where \(D_i \in {0,1}\)

The OLS estimates:

  • \(\hat{\alpha}\): mean of \(Y\) for \(D_i = 0\)
  • \(\hat{\alpha} + \hat{\beta}\): mean of \(Y\) for \(D_i = 1\)
  • \(\hat{\beta}\): difference in means between groups

3.8 Numerical Example

Given data:

X: 1, 2, 3, 4, 5
Y: 2, 4, 5, 4, 5

Calculate OLS estimates:

  1. \(\bar{X} = 3\), \(\bar{Y} = 4\)
  2. \[\sum_{i=1}^5 (X_i - \bar{X})(Y_i - \bar{Y}) = (-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1) = 6\]
  3. \[\sum_{i=1}^5 (X_i - \bar{X})^2 = 4 + 1 + 0 + 1 + 4 = 10\]
  4. \[\hat{\beta}_1 = \frac{6}{10} = 0.6\]
  5. \[\hat{\beta}_0 = 4 - 0.6(3) = 2.2\]

Regression equation: \(\hat{Y} = 2.2 + 0.6X\)
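
The same numbers can be reproduced with the matrix formula of Section 3.6; a short illustrative sketch (assumes NumPy):

import numpy as np

X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 5.0]])   # constant plus regressor
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # [2.2, 0.6]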

3.9 Frisch-Waugh-Lovell Theorem

The FWL theorem provides insight into multiple regression by showing how to obtain coefficients through a series of simple regressions.

For the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

To find \(\hat{\beta}_1\):

  1. Regress \(Y\) on \(X_2\) and obtain residuals \(\tilde{Y}\)
  2. Regress \(X_1\) on \(X_2\) and obtain residuals \(\tilde{X}_1\)
  3. Regress \(\tilde{Y}\) on \(\tilde{X}_1\)

The coefficient from step 3 equals \(\hat{\beta}_1\) from the multiple regression.
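
A short simulation sketch (assumes NumPy; the data-generating values are arbitrary) that runs the three steps and compares the result with the coefficient from the full regression:

import numpy as np

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)                 # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    # OLS coefficients with a constant prepended
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.solve(X.T @ X, X.T @ y)

b_full = ols(np.column_stack([x1, x2]), y)          # [const, beta1, beta2]

# Steps 1-3: partial x2 out of y and x1, then regress residual on residual
y_til = y - np.column_stack([np.ones(n), x2]) @ ols(x2, y)
x1_til = x1 - np.column_stack([np.ones(n), x2]) @ ols(x2, x1)
b_fwl = np.sum(x1_til * y_til) / np.sum(x1_til ** 2)

print(b_full[1], b_fwl)   # equal up to floating-point error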

3.10 Mean-Centered Regression

Consider the mean-centered model: \((Y_i - \bar{Y}) = \beta_1(X_i - \bar{X}) + v_i\)

Properties:

  1. The intercept is zero
  2. The slope coefficient is identical to the original model
  3. \(R^2\) is the same as the original model
  4. Useful for focusing on the relationship between deviations

3.11 Partitioned Regression

For the partitioned model: \(\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \mathbf{u}\)

The OLS estimator for \(\boldsymbol{\beta}_1\): \(\hat{\boldsymbol{\beta}}_1 = [\mathbf{X}_1'(\mathbf{I} - \mathbf{P}_2)\mathbf{X}_1]^{-1}\mathbf{X}_1'(\mathbf{I} - \mathbf{P}_2)\mathbf{y}\)

where \(\mathbf{P}_2 = \mathbf{X}_2(\mathbf{X}_2'\mathbf{X}_2)^{-1}\mathbf{X}_2'\) is the projection matrix for \(\mathbf{X}_2\).

3.12 Practice Problems

Problem 1

Show that the OLS residuals are orthogonal to the fitted values.

Solution: We need to show \(\sum_{i=1}^n \hat{Y}_i\hat{u}_i = 0\)

\(\sum_{i=1}^n \hat{Y}_i\hat{u}_i = \sum_{i=1}^n (\hat{\beta}_0 + \hat{\beta}_1 X_i)\hat{u}_i\) \(= \hat{\beta}_0\sum_{i=1}^n \hat{u}_i + \hat{\beta}_1\sum_{i=1}^n X_i\hat{u}_i\) \(= \hat{\beta}_0 \cdot 0 + \hat{\beta}_1 \cdot 0 = 0\)

Problem 2

Prove that \(R^2\) equals the squared correlation between \(Y\) and \(\hat{Y}\).

Solution: The correlation between \(Y\) and \(\hat{Y}\) is: \(r_{Y,\hat{Y}} = \frac{\text{Cov}(Y,\hat{Y})}{\sqrt{\text{Var}(Y)\text{Var}(\hat{Y})}}\)

After some algebra (using the fact that \(\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X\)): \(r_{Y,\hat{Y}}^2 = \frac{\text{ESS}}{\text{TSS}} = R^2\)

3.13 Key Takeaways

  1. OLS minimizes the sum of squared residuals
  2. OLS residuals have specific algebraic properties that always hold
  3. Total variation can be decomposed into explained and unexplained parts
  4. R-squared measures goodness of fit but not causal validity
  5. The FWL theorem shows the relationship between simple and multiple regression
  6. Matrix notation provides a compact way to express OLS results

Part 4: Statistical Inference

Chapter 4: OLS Inference and Random Sampling

4.1 Statistical Properties of OLS Estimators

Under the classical assumptions, OLS estimators have several desirable properties.

Finite Sample Properties

  1. Unbiasedness: \(E[\hat{\beta}_j] = \beta_j\)
  2. Efficiency: OLS has the smallest variance among linear unbiased estimators (Gauss-Markov theorem)
  3. Consistency (a large-sample property): \(\text{plim}_{n \to \infty} \hat{\beta}_j = \beta_j\)

4.2 The Gauss-Markov Theorem

Theorem: Under assumptions MLR.1-MLR.5, the OLS estimator \(\hat{\boldsymbol{\beta}}\) is the Best Linear Unbiased Estimator (BLUE).

Assumptions for Gauss-Markov:

  1. Linear in parameters: \(Y_i = \beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki} + u_i\)
  2. Random sampling
  3. No perfect collinearity
  4. Zero conditional mean: \(E[u_i\|\mathbf{X}] = 0\)
  5. Homoskedasticity: \(\text{Var}(u_i\|\mathbf{X}) = \sigma^2\)

4.3 Sampling Distribution of OLS Estimators

For Simple Regression

The OLS slope estimator can be written as: \(\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (X_i - \bar{X})u_i}{\sum_{i=1}^n (X_i - \bar{X})^2}\)

This shows that \(\hat{\beta}_1\) is a linear combination of the error terms.

Variance of OLS Estimators

For the simple regression slope: \(\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\sigma^2}{n \cdot \text{Var}(X)}\)

For the intercept: \(\text{Var}(\hat{\beta}_0) = \sigma^2 \left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right]\)

4.4 Standard Errors

Since \(\sigma^2\) is unknown, we estimate it using: \(\hat{\sigma}^2 = \frac{1}{n-k-1}\sum_{i=1}^n \hat{u}_i^2 = \frac{\text{SSR}}{n-k-1}\)

The standard error of \(\hat{\beta}_j\) is: \(\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\text{Var}}(\hat{\beta}_j)}\)

For simple regression: \(\text{SE}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2}}\)
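
A minimal sketch (assumes NumPy; simulated homoskedastic errors) that turns these formulas into numbers, including the t-statistic used in the next section:

import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

sigma2_hat = np.sum(uhat ** 2) / (n - 2)   # k = 1, so df = n - k - 1 = n - 2
se_b1 = np.sqrt(sigma2_hat / sxx)
print(b1, se_b1, b1 / se_b1)               # estimate, SE, and t for H0: beta1 = 0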

4.5 Hypothesis Testing

The t-statistic

To test \(H_0: \beta_j = \beta_{j,0}\) vs \(H_1: \beta_j \neq \beta_{j,0}\):

\[t = \frac{\hat{\beta}_j - \beta_{j,0}}{\text{SE}(\hat{\beta}_j)}\]

Under \(H_0\) and the classical assumptions, \(t \sim t_{n-k-1}\).

Common Hypothesis Tests

  1. Testing Significance: \(H_0: \beta_j = 0\) \(t = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}\)
  2. One-sided Tests:
    • \(H_0: \beta_j \leq 0\) vs \(H_1: \beta_j > 0\)
    • Reject if \(t > t_{\alpha,n-k-1}\)
  3. Two-sided Tests:
    • \(H_0: \beta_j = 0\) vs \(H_1: \beta_j \neq 0\)
    • Reject if \(\|t\| > t_{\alpha/2,n-k-1}\)

4.6 Confidence Intervals

A \((1-\alpha) \cdot 100\%\) confidence interval for \(\beta_j\): \(\hat{\beta}_j \pm t_{\alpha/2,n-k-1} \cdot \text{SE}(\hat{\beta}_j)\)

Properties:

  1. Contains the true parameter with probability \((1-\alpha)\)
  2. Wider intervals indicate more uncertainty
  3. Interval width depends on sample size and error variance

4.7 The F-Test for Joint Hypotheses

To test multiple restrictions simultaneously: \(H_0: \beta_1 = 0, \beta_2 = 0, ..., \beta_q = 0\)

The F-statistic: \(F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\)

where:

  • \(SSR_r\): Sum of squared residuals from restricted model
  • \(SSR_{ur}\): Sum of squared residuals from unrestricted model
  • \(q\): Number of restrictions

Under \(H_0\): \(F \sim F_{q,n-k-1}\)

4.8 Alternative Forms of the F-Test

Using R-squared

\[F = \frac{(R^2_{ur} - R^2_r)/q}{(1-R^2_{ur})/(n-k-1)}\]

For Overall Significance

Testing \(H_0: \beta_1 = \beta_2 = ... = \beta_k = 0\): \(F = \frac{R^2/k}{(1-R^2)/(n-k-1)}\)
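
The R-squared form is straightforward to evaluate; the sketch below (plain Python) reproduces, up to rounding, the numbers used in Problem 2 of Section 4.13:

def f_stat_from_r2(r2_ur, r2_r, q, n, k):
    # F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/(n - k - 1)]
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k - 1))

print(f_stat_from_r2(r2_ur=0.40, r2_r=0.30, q=2, n=50, k=3))   # roughly 3.8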

4.9 Asymptotic Properties

As \(n \to \infty\):

  1. Consistency: \(\hat{\beta}_j \stackrel{p}{\rightarrow} \beta_j\)
  2. Asymptotic Normality: \(\sqrt{n}(\hat{\beta}_j - \beta_j) \stackrel{d}{\rightarrow} N(0, \sigma^2_{\beta_j})\)
  3. Asymptotic Efficiency: with normally distributed errors, OLS coincides with the MLE and attains the Cramér-Rao lower bound

4.10 The Classical Normal Linear Model

Adding the normality assumption: \(u_i \sim N(0, \sigma^2)\)

Under normality:

  1. OLS estimators are normally distributed in finite samples
  2. \(t\)-statistics exactly follow \(t\)-distributions
  3. \(F\)-statistics exactly follow \(F\)-distributions
  4. OLS is the maximum likelihood estimator

4.11 Violations of Classical Assumptions

Heteroskedasticity

If \(\text{Var}(u_i\|X_i) = \sigma_i^2\):

  • OLS remains unbiased and consistent
  • Standard errors are incorrect
  • Use heteroskedasticity-robust standard errors

Non-normality

  • OLS remains unbiased
  • For large samples, inference remains valid (CLT)
  • For small samples, exact distributions may not hold

4.12 Practical Example

Consider the wage equation: \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + u_i\)

Estimated results: \(\widehat{\log(\text{wage})}_i = 0.284 + 0.092 \text{educ}_i + 0.0041 \text{exper}_i\) \(\text{SE}: \quad\quad\quad (0.104) \quad (0.007) \quad\quad\quad (0.0017)\) \(n = 526, \quad R^2 = 0.316\)

Testing if education affects wages: \(t = \frac{0.092}{0.007} = 13.14\)

Since \(\|t\| > 1.96\), reject \(H_0: \beta_1 = 0\) at 5% level.

4.13 Sample Problems

Problem 1

Given OLS results: \(\hat{Y} = 10 + 2X\) \(\text{SE}(\hat{\beta}_0) = 1.5, \quad \text{SE}(\hat{\beta}_1) = 0.5\) \(n = 30\)

a) Test \(H_0: \beta_1 = 0\) at 5% level b) Construct a 95% CI for \(\beta_1\)

Solution: a) \(t = \frac{2-0}{0.5} = 4\). With \(df = 28\), \(t_{0.025,28} \approx 2.048\). Since \(4 > 2.048\), reject \(H_0\).

b) \(CI: 2 \pm 2.048 \times 0.5 = [0.976, 3.024]\)

Problem 2

Test the joint hypothesis \(H_0: \beta_1 = \beta_2 = 0\) given:

  • Unrestricted model: \(R^2 = 0.40\), \(k = 3\), \(n = 50\)
  • Restricted model: \(R^2 = 0.30\)

Solution: \(F = \frac{(0.40 - 0.30)/2}{(1-0.40)/(50-3-1)} = \frac{0.05}{0.013} = 3.85\)

With \(F_{2,46}\) critical value at 5% ≈ 3.20, reject \(H_0\).

4.14 Key Statistical Tables

Critical Values for t-distribution (two-sided)

df    10%      5%       1%
10    1.812    2.228    3.169
20    1.725    2.086    2.845
30    1.697    2.042    2.750
∞     1.645    1.960    2.576

Critical Values for F-distribution (5% level)

df₁\df₂    10      20      30      ∞
1          4.96    4.35    4.17    3.84
2          4.10    3.49    3.32    3.00
3          3.71    3.10    2.92    2.60

4.15 Key Takeaways

  1. OLS estimators are unbiased under the zero conditional mean assumption
  2. Standard errors measure the precision of estimates
  3. t-tests are used for individual coefficients
  4. F-tests are used for joint hypotheses
  5. Confidence intervals provide a range of plausible values
  6. Large sample properties rely on the Central Limit Theorem
  7. Violations of classical assumptions affect inference procedures

Part 5: Multiple Regression

Chapter 5: Simple vs Multiple Regression

5.1 Motivation for Multiple Regression

Multiple regression allows us to:

  1. Control for confounding factors
  2. Include multiple explanatory variables
  3. Reduce omitted variable bias
  4. Improve prediction accuracy

The multiple regression model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + ... + \beta_k X_{ki} + u_i\)

5.2 Interpretation of Coefficients

In multiple regression, \(\beta_j\) represents the partial effect of \(X_j\) on \(Y\), holding all other variables constant: \(\beta_j = \frac{\partial E[Y\|X_1,...,X_k]}{\partial X_j}\)

This is the ceteris paribus effect we seek for causal inference.

5.3 Omitted Variable Bias

The Omitted Variable Bias Formula

If the true model is: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

But we estimate: \(Y_i = \alpha_0 + \alpha_1 X_{1i} + v_i\)

Then: \(\text{plim}(\hat{\alpha}_1) = \beta_1 + \beta_2 \cdot \delta_{21}\)

where \(\delta_{21}\) is the coefficient from regressing \(X_2\) on \(X_1\).

Conditions for Omitted Variable Bias

Bias occurs when:

  1. The omitted variable affects \(Y\) (\(\beta_2 \neq 0\))
  2. The omitted variable is correlated with included variables (\(\delta_{21} \neq 0\))

Direction of bias: \(\text{Bias} = \beta_2 \cdot \delta_{21}\)
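
The formula can be illustrated with a short simulation (assumes NumPy; the parameter values are arbitrary). Omitting \(X_2\), which both affects \(Y\) and is correlated with \(X_1\), shifts the short-regression slope by roughly \(\beta_2 \cdot \delta_{21}\):

import numpy as np

rng = np.random.default_rng(4)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)          # delta_21 is about 0.8
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# short regression of y on x1 only
b1_short = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)
# auxiliary regression of x2 on x1
delta_21 = np.sum((x1 - x1.mean()) * (x2 - x2.mean())) / np.sum((x1 - x1.mean()) ** 2)

print(b1_short)               # about 2.8 rather than the true 2.0
print(2.0 + 1.0 * delta_21)   # the value predicted by the bias formula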

5.4 Example: Wage Equation

Consider: \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{abil}_i + u_i\)

If we omit ability:

  • \(\beta_2 > 0\) (ability increases wages)
  • \(\delta_{21} > 0\) (ability and education are positively correlated)
  • Therefore: \(\hat{\beta}_1\) is upward biased

5.5 The Frisch-Waugh-Lovell Theorem

The FWL theorem shows how multiple regression coefficients can be obtained through a series of simple regressions.

For the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

To obtain \(\hat{\beta}_1\):

  1. Regress \(Y\) on \(X_2\) and get residuals \(\tilde{Y}\)
  2. Regress \(X_1\) on \(X_2\) and get residuals \(\tilde{X}_1\)
  3. Regress \(\tilde{Y}\) on \(\tilde{X}_1\)

The coefficient from step 3 equals \(\hat{\beta}_1\) from the multiple regression.

Proof Outline

Let \(\mathbf{M}_2 = \mathbf{I} - \mathbf{X}_2(\mathbf{X}_2'\mathbf{X}_2)^{-1}\mathbf{X}_2'\)

Then: \(\hat{\beta}_1 = (\mathbf{X}_1'\mathbf{M}_2\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{M}_2\mathbf{y}\)

Since \(\mathbf{M}_2\mathbf{X}_1 = \tilde{\mathbf{X}}_1\) and \(\mathbf{M}_2\mathbf{y} = \tilde{\mathbf{y}}\): \(\hat{\beta}_1 = (\tilde{\mathbf{X}}_1'\tilde{\mathbf{X}}_1)^{-1}\tilde{\mathbf{X}}_1'\tilde{\mathbf{y}}\)

5.6 Relationship Between Simple and Multiple Regression

For the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

Let:

  • \(\hat{\beta}_1^{simple}\): coefficient from regressing \(Y\) on \(X_1\) only
  • \(\hat{\beta}_1^{multiple}\): coefficient from multiple regression
  • \(\hat{\delta}_{21}\): coefficient from regressing \(X_2\) on \(X_1\)

Then: \(\hat{\beta}_1^{simple} = \hat{\beta}_1^{multiple} + \hat{\beta}_2 \cdot \hat{\delta}_{21}\)

5.7 Control Variables

A control variable is included to:

  1. Reduce omitted variable bias
  2. Account for confounding factors
  3. Isolate the effect of interest

Properties of control variables:

  • May not have causal interpretation
  • Help achieve conditional independence
  • Reduce error variance

Example: Returns to Education

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 \text{IQ}_i + u_i\]

Here, experience and IQ are control variables to isolate the effect of education.

5.8 Perfect Multicollinearity

Perfect multicollinearity occurs when one regressor is an exact linear function of others.

Examples:

  1. Including all dummy variables for a categorical variable plus intercept
  2. Including a variable and its perfect linear transformation
  3. The “dummy variable trap”

Consequences:

  • \((\mathbf{X}'\mathbf{X})\) is not invertible
  • OLS estimates cannot be computed
  • One variable must be dropped

5.9 Imperfect Multicollinearity

When regressors are highly (but not perfectly) correlated:

Consequences:

  1. Large standard errors
  2. Sensitive coefficient estimates
  3. Difficulty isolating individual effects
  4. OLS remains unbiased

Detection:

  • Variance Inflation Factor (VIF): \(\text{VIF}_j = \frac{1}{1-R_j^2}\)
  • Correlation matrix
  • Condition number
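
Each \(\text{VIF}_j\) comes from regressing \(X_j\) on the remaining regressors and plugging that \(R_j^2\) into \(1/(1-R_j^2)\). A sketch (assumes NumPy; the data are simulated so that the first two regressors are nearly collinear):

import numpy as np

def r_squared(X, y):
    # R^2 from an OLS regression of y on X (constant added)
    X = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def vif(X):
    # X: n x k matrix of regressors without the constant column
    return [1.0 / (1.0 - r_squared(np.delete(X, j, axis=1), X[:, j]))
            for j in range(X.shape[1])]

rng = np.random.default_rng(5)
z = rng.normal(size=300)
X = np.column_stack([z + 0.1 * rng.normal(size=300),
                     z + 0.1 * rng.normal(size=300),
                     rng.normal(size=300)])
print(vif(X))   # first two VIFs are large, the third is close to 1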

5.10 Specification Issues

Including Irrelevant Variables

Effects of including irrelevant variables (\(\beta_j = 0\)):

  • OLS remains unbiased
  • Variance increases
  • Efficiency loss

Excluding Relevant Variables

Effects of excluding relevant variables (\(\beta_j \neq 0\)):

  • OLS is biased (unless uncorrelated)
  • Inconsistent estimates
  • Invalid inference

5.11 Practical Example

Consider test scores and class size:

Simple regression: \(\widehat{\text{TestScore}} = 698.9 - 2.28 \cdot \text{STR}\) \(\text{SE}: \quad\quad\quad\quad (10.4) \quad (0.52)\)

Multiple regression: \(\widehat{\text{TestScore}} = 686.0 - 1.10 \cdot \text{STR} - 0.65 \cdot \text{PctEL}\) \(\text{SE}: \quad\quad\quad\quad (8.7) \quad\quad (0.43) \quad\quad\quad (0.04)\)

The coefficient on STR changes substantially when controlling for percent English learners.

5.12 Sample Problems

Problem 1

Given:

  • Simple regression: \(\hat{Y} = 10 + 3X_1\)
  • Multiple regression: \(\hat{Y} = 8 + 2X_1 + 4X_2\)
  • Regression of \(X_2\) on \(X_1\): \(\hat{X}_2 = 1 + 0.25X_1\)

Verify the relationship between simple and multiple regression coefficients.

Solution: \(\hat{\beta}_1^{simple} = \hat{\beta}_1^{multiple} + \hat{\beta}_2 \cdot \hat{\delta}_{21}\) \(3 = 2 + 4 \cdot 0.25 = 2 + 1 = 3 \quad \checkmark\)

Problem 2

Consider the model: \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\)

Given:

  • \[\text{Corr}(X_1, X_2) = 0\]
  • \[\hat{\beta}_1^{simple} = 5\]

What is \(\hat{\beta}_1^{multiple}\)?

Solution: Since \(\text{Corr}(X_1, X_2) = 0\), we have \(\hat{\delta}_{21} = 0\). Therefore: \(\hat{\beta}_1^{multiple} = \hat{\beta}_1^{simple} = 5\)

5.13 Adjusted R-squared

The adjusted R-squared penalizes for additional regressors: \(\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{TSS/(n-1)} = 1 - \frac{n-1}{n-k-1}(1-R^2)\)

Properties:

  1. \(\bar{R}^2 < R^2\) (unless \(R^2 = 1\))
  2. Can be negative
  3. Doesn’t always increase with additional regressors
  4. Better for model comparison

5.14 Model Selection Criteria

Akaike Information Criterion (AIC)

\[\text{AIC} = n \log\left(\frac{SSR}{n}\right) + 2(k+1)\]

Bayesian Information Criterion (BIC)

\[\text{BIC} = n \log\left(\frac{SSR}{n}\right) + (k+1)\log(n)\]

Lower values indicate better model fit.

5.15 Key Takeaways

  1. Multiple regression isolates partial effects
  2. Omitted variable bias has specific direction and magnitude
  3. The FWL theorem explains the mechanics of multiple regression
  4. Control variables reduce bias but may not have causal interpretation
  5. Perfect multicollinearity prevents estimation
  6. Model selection involves tradeoffs between bias and variance
  7. Adjusted R-squared accounts for model complexity

Part 6: Dummy Variables

Chapter 6: Qualitative Information with Dummy Variables

6.1 Introduction to Dummy Variables

Dummy variables (also called binary or indicator variables) allow us to include qualitative information in regression models.

Definition: A dummy variable takes only two values:

  • 1 if a condition is met
  • 0 otherwise

Examples:

  • \(\text{Female}_i = 1\) if individual \(i\) is female, 0 if male
  • \(\text{College}_i = 1\) if individual \(i\) has a college degree, 0 otherwise
  • \(\text{Treatment}_i = 1\) if unit \(i\) received treatment, 0 if control

6.2 Single Dummy Variable

Consider the model: \(Y_i = \beta_0 + \beta_1 D_i + u_i\)

where \(D_i \in {0,1}\).

Interpretation

  • When \(D_i = 0\): \(E[Y_i\|D_i=0] = \beta_0\)
  • When \(D_i = 1\): \(E[Y_i\|D_i=1] = \beta_0 + \beta_1\)
  • Therefore: \(\beta_1 = E[Y_i\|D_i=1] - E[Y_i\|D_i=0]\)

\(\beta_1\) represents the difference in means between the two groups.

Example: Gender Wage Gap

\[\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + u_i\]

If \(\hat{\beta}_1 = -2.50\), female workers earn \$2.50 less per hour on average.

6.3 Dummy Variables with Multiple Categories

For a categorical variable with \(m\) categories, use \(m-1\) dummy variables.

Example: Education Levels

Categories: High School, Bachelor’s, Master’s, PhD

Define:

  • \(\text{Bachelor}_i = 1\) if highest degree is Bachelor’s
  • \(\text{Master}_i = 1\) if highest degree is Master’s
  • \(\text{PhD}_i = 1\) if highest degree is PhD
  • High School is the reference category (omitted)

Model: \(\text{wage}_i = \beta_0 + \beta_1 \text{Bachelor}_i + \beta_2 \text{Master}_i + \beta_3 \text{PhD}_i + u_i\)

Interpretations:

  • \(\beta_0\): average wage for high school graduates
  • \(\beta_1\): wage premium for Bachelor’s vs. high school
  • \(\beta_2\): wage premium for Master’s vs. high school
  • \(\beta_3\): wage premium for PhD vs. high school

6.4 The Dummy Variable Trap

Including all \(m\) dummies plus an intercept creates perfect multicollinearity: \(\sum_{j=1}^m D_{ji} = 1 \text{ for all } i\)

Solutions:

  1. Omit one category (reference category)
  2. Omit the intercept
  3. Use deviation from mean coding

6.5 Interactions with Dummy Variables

Dummy variables can interact with continuous variables to allow different slopes.

Model with Interaction

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i\]

Interpretation:

  • When \(D_i = 0\): \(Y_i = \beta_0 + \beta_1 X_i + u_i\)
  • When \(D_i = 1\): \(Y_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_i + u_i\)

Effects:

  • \(\beta_2\): difference in intercepts
  • \(\beta_3\): difference in slopes

Example: Returns to Education by Gender

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{female}_i + \beta_3 (\text{educ}_i \times \text{female}_i) + u_i\]

Results: \(\widehat{\log(\text{wage})} = 0.389 + 0.082 \cdot \text{educ} - 0.227 \cdot \text{female} + 0.0056 \cdot (\text{educ} \times \text{female})\)

For males: \(\widehat{\log(\text{wage})} = 0.389 + 0.082 \cdot \text{educ}\) For females: \(\widehat{\log(\text{wage})} = 0.162 + 0.0876 \cdot \text{educ}\)

6.6 Testing for Group Differences

Testing for Differences in Intercepts

\(H_0: \beta_2 = 0\) (no difference in intercepts) Use standard t-test on dummy coefficient.

Testing for Differences in Slopes

\(H_0: \beta_3 = 0\) (no difference in slopes) Use standard t-test on interaction coefficient.

Testing for Any Difference

\(H_0: \beta_2 = \beta_3 = 0\) (identical regressions) Use F-test for joint significance.

6.7 Chow Test for Structural Break

The Chow test tests whether regression coefficients differ across groups.

For model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

Test whether coefficients differ between groups A and B:

  1. Run pooled regression: get \(SSR_p\)
  2. Run separate regressions for each group: get \(SSR_A\) and \(SSR_B\)
  3. Compute: \(F = \frac{(SSR_p - SSR_A - SSR_B)/k}{(SSR_A + SSR_B)/(n-2k)}\)

Under \(H_0\): \(F \sim F_{k,n-2k}\)
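
A sketch of that computation (assumes NumPy; the data are simulated so that group B has a different intercept and slope, so the test should reject):

import numpy as np

def ssr(x, y):
    # SSR from a simple regression of y on x with an intercept
    X = np.column_stack([np.ones(len(y)), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return np.sum((y - X @ b) ** 2)

rng = np.random.default_rng(6)
xa, xb = rng.normal(size=120), rng.normal(size=80)
ya = 1.0 + 0.5 * xa + rng.normal(size=120)
yb = 2.0 + 1.5 * xb + rng.normal(size=80)

x, y = np.concatenate([xa, xb]), np.concatenate([ya, yb])
k = 2   # parameters per group regression: intercept and slope
F = ((ssr(x, y) - ssr(xa, ya) - ssr(xb, yb)) / k) / \
    ((ssr(xa, ya) + ssr(xb, yb)) / (len(y) - 2 * k))
print(F)   # compare with the 5% critical value of roughly 3.0 for F(2, 196)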

6.8 Linear Probability Model

When the dependent variable is binary: \(Y_i = \beta_0 + \beta_1 X_{1i} + ... + \beta_k X_{ki} + u_i\)

where \(Y_i \in {0,1}\).

Properties:

  1. \[E[Y_i\|X_i] = P(Y_i=1\|X_i)\]
  2. Predicted values are probabilities
  3. Coefficients represent changes in probability

Problems:

  1. Predictions can be < 0 or > 1
  2. Heteroskedasticity: \(\text{Var}(u_i\|X_i) = P(Y_i=1\|X_i)[1-P(Y_i=1\|X_i)]\)
  3. Non-normal errors

Solutions:

  • Use robust standard errors
  • Consider logit or probit models

6.9 Difference-in-Differences with Dummies

Basic DiD setup: \(Y_{it} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_i + \beta_3 (\text{Post}_t \times \text{Treat}_i) + u_{it}\)

where:

  • \(\text{Post}_t = 1\) for post-treatment period
  • \(\text{Treat}_i = 1\) for treatment group
  • \(\beta_3\) is the DiD estimate
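
Because the model is saturated in the two dummies, \(\hat{\beta}_3\) equals the double difference of the four group-period means exactly. A simulation sketch (assumes NumPy; the treatment effect of 2.0 is made up) verifying the equivalence:

import numpy as np

rng = np.random.default_rng(7)
n = 1000
treat = (rng.random(n) < 0.5).astype(float)
post = (rng.random(n) < 0.5).astype(float)
y = 1.0 + 0.5 * post + 1.0 * treat + 2.0 * post * treat + rng.normal(size=n)

# regression with the interaction term
X = np.column_stack([np.ones(n), post, treat, post * treat])
b = np.linalg.solve(X.T @ X, X.T @ y)

# double difference of group-period means
did = (y[(treat == 1) & (post == 1)].mean() - y[(treat == 1) & (post == 0)].mean()) \
    - (y[(treat == 0) & (post == 1)].mean() - y[(treat == 0) & (post == 0)].mean())

print(b[3], did)   # identical; both estimate the effect of about 2.0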

6.10 Practical Applications

Application 1: Wage Discrimination

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{female}_i + \beta_2 \text{educ}_i + \beta_3 \text{exper}_i + u_i\]

Testing for discrimination after controlling for human capital.

Application 2: Program Evaluation

\[\text{outcome}_i = \beta_0 + \beta_1 \text{treatment}_i + \beta_2 \mathbf{X}_i + u_i\]

Estimating treatment effects with controls.

Application 3: Seasonal Effects

\[\text{sales}_t = \beta_0 + \beta_1 \text{Q2}_t + \beta_2 \text{Q3}_t + \beta_3 \text{Q4}_t + u_t\]

Capturing seasonal patterns (Q1 is reference).

6.11 Sample Problems

Problem 1

Consider: \(\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + u_i\)

Given data:

  • Males: \(n_m = 100\), \(\bar{\text{wage}}_m = 20\)
  • Females: \(n_f = 80\), \(\bar{\text{wage}}_f = 18\)

Find OLS estimates.

Solution: \(\hat{\beta}_0 = \bar{\text{wage}}_m = 20\) \(\hat{\beta}_1 = \bar{\text{wage}}_f - \bar{\text{wage}}_m = 18 - 20 = -2\)

Problem 2

Test for different returns to education by gender: \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{female}_i + \beta_3 (\text{educ}_i \times \text{female}_i) + u_i\)

Given: \(\hat{\beta}_3 = 0.015\), \(\text{SE}(\hat{\beta}_3) = 0.008\)

Test \(H_0: \beta_3 = 0\) at 5% level.

Solution: \(t = \frac{0.015}{0.008} = 1.875\)

Since \(\|1.875\| < 1.96\), fail to reject \(H_0\). No significant difference in returns to education.

6.12 Advanced Topics

Regression Discontinuity with Dummies

\[Y_i = \beta_0 + \beta_1 D_i + \beta_2 (X_i - c) + \beta_3 D_i(X_i - c) + u_i\]

where \(D_i = 1\) if \(X_i \geq c\).

Triple Interactions

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_{1i} + \beta_3 D_{2i} + \beta_4 (X_i \times D_{1i}) + \beta_5 (X_i \times D_{2i}) + \beta_6 (D_{1i} \times D_{2i}) + \beta_7 (X_i \times D_{1i} \times D_{2i}) + u_i\]

Complex but allows for very flexible specifications.

6.13 STATA Implementation

* Single dummy
reg wage female

* Multiple categories
reg wage i.education

* Interaction
reg wage c.educ##i.female

* Chow test via the SSR formula from Section 6.7
reg wage educ if female==0
scalar ssr_a = e(rss)
scalar n_a = e(N)
reg wage educ if female==1
scalar ssr_b = e(rss)
scalar n_b = e(N)
reg wage educ
scalar ssr_p = e(rss)
scalar chow_F = ((ssr_p - ssr_a - ssr_b)/2) / ((ssr_a + ssr_b)/(n_a + n_b - 4))
display "Chow F = " chow_F

6.14 Key Takeaways

  1. Dummy variables incorporate qualitative information
  2. Always omit one category to avoid the dummy trap
  3. Interactions allow different slopes across groups
  4. Linear probability models have limitations
  5. Chow test formally tests for structural differences
  6. Difference-in-differences uses dummy interactions
  7. Careful interpretation is crucial with multiple dummies

Part 7: Functional Forms

Chapter 7: Functional Form, Nonlinear Relationships, and Interpretation of Coefficients

7.1 Introduction to Functional Forms

Linear regression is “linear in parameters” but can accommodate nonlinear relationships through:

  1. Variable transformations
  2. Polynomial terms
  3. Interaction effects
  4. Piecewise functions

7.2 Logarithmic Transformations

Log-Level Model

\[\log(Y_i) = \beta_0 + \beta_1 X_i + u_i\]

Interpretation: A one-unit change in \(X\) is associated with approximately a \(100 \cdot \beta_1\) percent change in \(Y\).

Exact percentage change: \(\%\Delta Y = 100 \cdot [e^{\beta_1} - 1]\)

For small \(\beta_1\) (roughly \(\|\beta_1\| < 0.10\)): \(\%\Delta Y \approx 100 \cdot \beta_1\)

Level-Log Model

\[Y_i = \beta_0 + \beta_1 \log(X_i) + u_i\]

Interpretation: A 1% increase in \(X\) is associated with a \(\beta_1/100\) unit change in \(Y\).

Log-Log Model (Constant Elasticity)

\[\log(Y_i) = \beta_0 + \beta_1 \log(X_i) + u_i\]

Interpretation: A 1% increase in \(X\) is associated with a \(\beta_1\)% increase in \(Y\). \(\beta_1\) is the elasticity of \(Y\) with respect to \(X\).

7.3 Why Use Logarithms?

  1. Percentage interpretation: Natural for many economic variables
  2. Reduce skewness: Log transformation often normalizes right-skewed data
  3. Reduce heteroskedasticity: Variance often proportional to level
  4. Elasticities: Direct interpretation in log-log models
  5. Diminishing returns: Captures concave relationships

7.4 Examples of Log Models

Example 1: Wage Equation (Log-Level)

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + u_i\]

If \(\hat{\beta}_1 = 0.08\): One more year of education increases wages by approximately 8%.

Example 2: Production Function (Log-Log)

\[\log(Q_i) = \beta_0 + \beta_1 \log(L_i) + \beta_2 \log(K_i) + u_i\]

If \(\hat{\beta}_1 = 0.7\): A 1% increase in labor increases output by 0.7%.

Example 3: Demand Function (Level-Log)

\[Q_i = \beta_0 + \beta_1 \log(P_i) + \beta_2 \log(I_i) + u_i\]

If \(\hat{\beta}_1 = -50\): A 1% increase in price reduces quantity demanded by 0.5 units.

7.5 Polynomial Models

Quadratic Model

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i\]

Marginal effect: \(\frac{\partial Y}{\partial X} = \beta_1 + 2\beta_2 X\)

Properties:

  • Allows for increasing or decreasing marginal effects
  • Has a turning point at \(X^* = -\frac{\beta_1}{2\beta_2}\)
  • U-shaped if \(\beta_2 > 0\), inverse U-shaped if \(\beta_2 < 0\)

Example: Age-Earnings Profile

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{age}_i + \beta_2 \text{age}_i^2 + u_i\]

Estimated: \(\widehat{\log(\text{wage})} = 0.5 + 0.08 \cdot \text{age} - 0.0009 \cdot \text{age}^2\)

Peak earnings age: \(\text{age}^* = -\frac{0.08}{2(-0.0009)} = 44.4\) years

7.6 Higher-Order Polynomials

\[Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + ... + \beta_p X_i^p + u_i\]

Considerations:

  1. Allows very flexible functional forms
  2. Risk of overfitting
  3. Difficult to interpret beyond quadratic
  4. Can create unrealistic predictions outside sample range

7.7 Interaction Terms

Continuous-Continuous Interaction

\[Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i\]

Marginal effects:

  • \[\frac{\partial Y}{\partial X_1} = \beta_1 + \beta_3 X_2\]
  • \[\frac{\partial Y}{\partial X_2} = \beta_2 + \beta_3 X_1\]

The effect of one variable depends on the level of the other.

Example: Education and Experience

\[\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 (\text{educ}_i \times \text{exper}_i) + u_i\]

If \(\hat{\beta}_3 > 0\): Returns to education increase with experience.

7.8 Interpreting Coefficients in Nonlinear Models

General Principle

For any model \(Y = f(X, \beta)\): \(\text{Marginal Effect} = \frac{\partial Y}{\partial X_j} = \frac{\partial f(X, \beta)}{\partial X_j}\)

Common Cases

  1. Linear: \(Y = \beta_0 + \beta_1 X\)
    • ME = \(\beta_1\) (constant)
  2. Log-Linear: \(\log(Y) = \beta_0 + \beta_1 X\)
    • ME = \(\beta_1 \times Y\)
    • Semi-elasticity = \(\beta_1\)
  3. Linear-Log: \(Y = \beta_0 + \beta_1 \log(X)\)
    • ME = \(\beta_1/X\)
    • Decreasing marginal effect
  4. Log-Log: \(\log(Y) = \beta_0 + \beta_1 \log(X)\)
    • ME = \(\beta_1 \times Y/X\)
    • Elasticity = \(\beta_1\)

7.9 Average Partial Effects (APE)

For nonlinear models, marginal effects vary with \(X\). The APE summarizes the average effect:

\[\text{APE}_j = \frac{1}{n}\sum_{i=1}^n \frac{\partial \hat{Y}_i}{\partial X_{ji}}\]

Example for quadratic model: \(\text{APE} = \hat{\beta}_1 + 2\hat{\beta}_2\bar{X}\)
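
In the quadratic case the observation-level marginal effect is \(\hat{\beta}_1 + 2\hat{\beta}_2 X_i\), so its sample average equals \(\hat{\beta}_1 + 2\hat{\beta}_2\bar{X}\). A tiny sketch (assumes NumPy; the ages are a hypothetical sample, the coefficients are those of the age-earnings example in Section 7.5):

import numpy as np

b1, b2 = 0.08, -0.0009
age = np.array([25.0, 32.0, 40.0, 47.0, 55.0])   # hypothetical ages

ape = np.mean(b1 + 2 * b2 * age)
print(ape, b1 + 2 * b2 * age.mean())   # identical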

7.10 Testing Functional Form

RESET Test (Regression Specification Error Test)

  1. Estimate original model, obtain \(\hat{Y}\)
  2. Add powers of \(\hat{Y}\) to original model
  3. Test joint significance of added terms

\(H_0\): Model correctly specified

Testing Linearity vs. Log

To test whether \(Y\) or \(\log(Y)\) is appropriate:

  1. Standardize both to have mean 0, variance 1
  2. Run both regressions
  3. Compare R-squared values
  4. Consider economic theory and interpretation

7.11 Practical Examples

Example 1: Housing Prices

\[\log(\text{price}_i) = \beta_0 + \beta_1 \log(\text{sqft}_i) + \beta_2 \text{bedrooms}_i + \beta_3 \text{age}_i + \beta_4 \text{age}_i^2 + u_i\]

Results:

  • \(\hat{\beta}_1 = 0.85\): 1% increase in square footage increases price by 0.85%
  • \(\hat{\beta}_2 = 0.05\): Each additional bedroom increases price by 5%
  • Age has nonlinear effect with depreciation slowing over time

Example 2: Returns to Scale

\[\log(Q) = \beta_0 + \beta_1 \log(L) + \beta_2 \log(K) + u\]

Testing returns to scale:

  • Constant returns: \(H_0: \beta_1 + \beta_2 = 1\)
  • Increasing returns: \(H_0: \beta_1 + \beta_2 > 1\)
  • Decreasing returns: \(H_0: \beta_1 + \beta_2 < 1\)

7.12 Sample Problems

Problem 1

Given: \(\log(\text{wage}_i) = 0.417 + 0.297 \times \text{female}_i + 0.080 \times \text{educ}_i + 0.029 \times \text{exper}_i\)

Interpret the coefficient on female.

Solution: Since female is a dummy and wage is in logs: \(\%\Delta\text{wage} = 100 \cdot [e^{0.297} - 1] = 34.6\%\)

Women earn approximately 34.6% more than men, controlling for education and experience.

Problem 2

Consider: \(\text{price} = \beta_0 + \beta_1 \text{sqft} + \beta_2 \text{sqft}^2 + u\)

Given estimates: \(\hat{\beta}_1 = 200\), \(\hat{\beta}_2 = -0.05\)

a) Find the marginal effect of square footage on price b) At what square footage is price maximized?

Solution: a) \(\frac{\partial \text{price}}{\partial \text{sqft}} = 200 - 0.1 \times \text{sqft}\)

b) Set marginal effect to zero: \(200 - 0.1 \times \text{sqft}^* = 0\) \(\text{sqft}^* = 2000\) square feet

7.13 Common Pitfalls

  1. Interpreting log coefficients as percentages: Valid only for small coefficients
  2. Extrapolation: Polynomial models can behave poorly outside sample range
  3. Multicollinearity: Powers and interactions are correlated with base terms
  4. Over-parameterization: Too many polynomial terms
  5. Ignoring economic theory: Functional form should make economic sense

7.14 Choosing Functional Form

Guidelines:

  1. Start with economic theory
  2. Examine data patterns (scatter plots)
  3. Consider variable properties (always positive? percentages?)
  4. Test alternative specifications
  5. Check residual plots
  6. Evaluate out-of-sample predictions

7.15 Key Takeaways

  1. Logarithmic transformations provide percentage interpretations
  2. Different log specifications have different interpretations
  3. Polynomial terms capture nonlinear relationships
  4. Interaction terms allow effects to vary
  5. Marginal effects depend on functional form
  6. Average partial effects summarize varying marginal effects
  7. Functional form choice affects interpretation and inference
  8. Economic theory should guide specification choices

Part 8: Time Series Foundations

Chapter 8: Stationarity, Persistence, and Serial Correlation

8.1 Time Series Data Characteristics

Time series data has unique features:

  1. Temporal ordering matters
  2. Observations are typically dependent
  3. Trends and seasonality are common
  4. Dynamic relationships exist

Notation: \(Y_t\) denotes the value of variable \(Y\) at time \(t\).

8.2 Stationarity

A time series \({Y_t}\) is strictly stationary if the joint distribution of \((Y_t, Y_{t+1}, ..., Y_{t+k})\) is the same as \((Y_{t+h}, Y_{t+h+1}, ..., Y_{t+h+k})\) for all \(t\), \(k\), and \(h\).

A time series is weakly stationary (or covariance stationary) if:

  1. \(E[Y_t] = \mu\) (constant mean)
  2. \(\text{Var}(Y_t) = \sigma^2\) (constant variance)
  3. \(\text{Cov}(Y_t, Y_{t+h}) = \gamma_h\) (covariance depends only on lag \(h\))

8.3 Examples of Stationary and Non-stationary Processes

Stationary Process: White Noise

\(Y_t = \varepsilon_t\) where \(\varepsilon_t \sim \text{i.i.d.}(0, \sigma^2)\)

Properties:

  • \[E[Y_t] = 0\]
  • \[\text{Var}(Y_t) = \sigma^2\]
  • \(\text{Cov}(Y_t, Y_{t+h}) = 0\) for \(h \neq 0\)

Non-stationary Process: Random Walk

\(Y_t = Y_{t-1} + \varepsilon_t\) where \(Y_0 = 0\) and \(\varepsilon_t \sim \text{i.i.d.}(0, \sigma^2)\)

Properties:

  • \[E[Y_t] = 0\]
  • \(\text{Var}(Y_t) = t\sigma^2\) (variance increases with time)
  • Non-stationary due to time-varying variance

8.4 Autocorrelation

The autocorrelation function (ACF) at lag \(h\): \(\rho_h = \frac{\text{Cov}(Y_t, Y_{t-h})}{\text{Var}(Y_t)} = \frac{\gamma_h}{\gamma_0}\)

Sample autocorrelation: \(\hat{\rho}_h = \frac{\sum_{t=h+1}^T (Y_t - \bar{Y})(Y_{t-h} - \bar{Y})}{\sum_{t=1}^T (Y_t - \bar{Y})^2}\)

8.5 Autoregressive Processes

AR(1) Process

\[Y_t = \phi Y_{t-1} + \varepsilon_t\]

Stationarity condition: \(\|\phi\| < 1\)

Properties when stationary:

  • \[E[Y_t] = 0\]
  • \[\text{Var}(Y_t) = \frac{\sigma^2}{1-\phi^2}\]
  • \[\rho_h = \phi^h\]
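
These properties can be checked by simulation. A sketch (assumes NumPy; \(\phi = 0.7\) and \(\sigma^2 = 4\) are chosen to match Problem 1 in Section 8.13):

import numpy as np

rng = np.random.default_rng(8)
phi, T = 0.7, 100_000
eps = rng.normal(scale=2.0, size=T)              # sigma^2 = 4

y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + eps[t]               # AR(1) recursion

print(y.var(), 4.0 / (1 - phi ** 2))             # both near 7.84
print(np.corrcoef(y[1:], y[:-1])[0, 1])          # near phi = 0.7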

AR(p) Process

\[Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p} + \varepsilon_t\]

Stationarity condition: Roots of characteristic equation lie outside unit circle.

8.6 Serial Correlation in Regression Models

Consider the model: \(Y_t = \beta_0 + \beta_1 X_t + u_t\)

Serial correlation occurs when: \(\text{Cov}(u_t, u_{t-h}) \neq 0 \text{ for some } h \neq 0\)

Common form: AR(1) errors \(u_t = \rho u_{t-1} + \varepsilon_t\)

8.7 Consequences of Serial Correlation

With positive serial correlation (\(\rho > 0\)):

  1. OLS remains unbiased and consistent
  2. OLS standard errors are biased (usually downward)
  3. t-statistics are inflated
  4. Confidence intervals are too narrow
  5. OLS is inefficient

8.8 Testing for Serial Correlation

Durbin-Watson Test

Test statistic: \(DW = \frac{\sum_{t=2}^T (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^T \hat{u}_t^2}\)

Approximate relationship: \(DW \approx 2(1 - \hat{\rho})\)

Decision rules:

  • \(DW \approx 2\): No serial correlation
  • \(DW < 2\): Positive serial correlation
  • \(DW > 2\): Negative serial correlation
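
Given the residuals from a time-series regression, the statistic is a one-liner; the sketch below (assumes NumPy; the "residuals" are simulated AR(1) noise) also checks the \(DW \approx 2(1-\hat{\rho})\) approximation:

import numpy as np

def durbin_watson(resid):
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(9)
T, rho = 500, 0.6
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal()   # AR(1) residuals

print(durbin_watson(u))   # well below 2: positive serial correlation
print(2 * (1 - rho))      # the approximation, 0.8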

Breusch-Godfrey Test

  1. Estimate original model, obtain residuals \(\hat{u}_t\)
  2. Regress \(\hat{u}_t\) on original regressors and \(\hat{u}_{t-1}, ..., \hat{u}_{t-p}\)
  3. Test joint significance of lagged residuals

\(H_0\): No serial correlation up to lag \(p\)
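
In R these steps are automated by bgtest() from the lmtest package. A minimal sketch, with an illustrative lag order of 4:

# Breusch-Godfrey test: H0 is no serial correlation up to the chosen lag
library(lmtest)
set.seed(42)
x <- rnorm(200)
u <- as.numeric(arima.sim(list(ar = 0.5), n = 200))  # AR(1) errors
y <- 1 + 2 * x + u
bgtest(lm(y ~ x), order = 4)          # small p-value => reject H0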

8.9 Correcting for Serial Correlation

Method 1: HAC Standard Errors

Heteroskedasticity and Autocorrelation Consistent (HAC) standard errors:

  • Newey-West estimator
  • Corrects standard errors without changing coefficient estimates

Method 2: GLS/Feasible GLS

If errors follow AR(1):

  1. Estimate \(\rho\) from OLS residuals
  2. Transform variables: \(Y_t^* = Y_t - \hat{\rho}Y_{t-1}\)
  3. Run OLS on transformed variables
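
A minimal base-R sketch of these three steps under AR(1) errors (a Cochrane-Orcutt style transformation; the first observation is simply dropped, and all values are illustrative):

# Feasible GLS under AR(1) errors
set.seed(42)
T <- 200
x <- rnorm(T)
u <- as.numeric(arima.sim(list(ar = 0.5), n = T))
y <- 1 + 2 * x + u
uhat <- resid(lm(y ~ x))
rho <- sum(uhat[-1] * uhat[-T]) / sum(uhat[-T]^2)   # step 1: estimate rho from OLS residuals
ys <- y[-1] - rho * y[-T]                           # step 2: quasi-difference the data
xs <- x[-1] - rho * x[-T]
coef(lm(ys ~ xs))                                   # step 3: OLS; intercept estimates beta0*(1 - rho)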

Method 3: Include Lagged Variables

Add dynamics to the model: \(Y_t = \beta_0 + \beta_1 X_t + \gamma Y_{t-1} + u_t\)

8.10 Unit Roots and Non-stationarity

Unit Root Process

\[Y_t = Y_{t-1} + \varepsilon_t\]

Properties:

  • Shocks have permanent effects
  • Variance grows without bound
  • Standard inference procedures invalid

Dickey-Fuller Test

Test \(H_0: \phi = 1\) in: \(\Delta Y_t = \alpha + \beta t + (\phi - 1)Y_{t-1} + \varepsilon_t\)

Variants:

  • No constant, no trend
  • Constant, no trend
  • Constant and trend

Augmented Dickey-Fuller (ADF) Test

Include lagged differences: \(\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{j=1}^p \delta_j \Delta Y_{t-j} + \varepsilon_t\)

Test \(H_0: \gamma = 0\)
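
In R, adf.test() from the tseries package runs this regression with a chosen number of lagged differences. A minimal sketch on simulated series (the lag length of 4 is illustrative):

# ADF test: H0 is a unit root
library(tseries)
set.seed(2)
rw  <- cumsum(rnorm(500))                              # random walk: should fail to reject H0
ar1 <- as.numeric(arima.sim(list(ar = 0.7), n = 500))  # stationary AR(1): should reject H0
adf.test(rw, k = 4)
adf.test(ar1, k = 4)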

8.11 Spurious Regression

When two non-stationary series are regressed: \(Y_t = \beta_0 + \beta_1 X_t + u_t\)

Problems:

  1. High \(R^2\) even if series are independent
  2. Significant t-statistics
  3. Invalid inference

Solution: Check for cointegration or difference the series.

8.12 Practical Example

Consider monthly stock returns: \(R_t = \alpha + \beta R_{m,t} + u_t\)

Testing for serial correlation:

  1. Estimate model, obtain residuals
  2. Compute DW statistic: \(DW = 1.85\)
  3. Since \(DW \approx 2\), little evidence of serial correlation
  4. Confirm with Breusch-Godfrey test

8.13 Sample Problems

Problem 1

Given AR(1) process: \(Y_t = 0.7Y_{t-1} + \varepsilon_t\), where \(\varepsilon_t \sim \text{i.i.d.}(0,4)\)

a) Is the process stationary? b) Find the unconditional variance c) Find the first-order autocorrelation

Solution: a) Since \(|0.7| < 1\), the process is stationary b) \(\text{Var}(Y_t) = \frac{4}{1-0.7^2} = \frac{4}{0.51} = 7.84\) c) \(\rho_1 = 0.7\)

Problem 2

Testing for serial correlation yields \(DW = 0.8\). What does this suggest?

Solution: Since \(DW \approx 2(1-\hat{\rho})\): \(0.8 \approx 2(1-\hat{\rho})\) \(\hat{\rho} \approx 0.6\)

Strong positive serial correlation is present.

Chapter 9: Regression Analysis with Time Series Data

9.1 Time Series Regression Model

General form: \(Y_t = \beta_0 + \beta_1 X_{1t} + ... + \beta_k X_{kt} + u_t\)

Key assumptions modified for time series:

  1. Linear in parameters
  2. No perfect collinearity
  3. Zero conditional mean: \(E[u_t \mid X_t, X_{t-1}, ...] = 0\)
  4. Homoskedasticity: \(\text{Var}(u_t \mid X_t, X_{t-1}, ...) = \sigma^2\)
  5. No serial correlation: \(\text{Cov}(u_t, u_s \mid X) = 0\) for \(t \neq s\)
  6. Normality (for finite sample inference)

9.2 Static and Dynamic Models

Static Model

\[Y_t = \beta_0 + \beta_1 X_t + u_t\]

Contemporaneous relationship only.

Distributed Lag Model

\[Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + ... + \beta_q X_{t-q} + u_t\]

Effects distributed over time.

Autoregressive Distributed Lag (ARDL) Model

\[Y_t = \alpha + \gamma_1 Y_{t-1} + ... + \gamma_p Y_{t-p} + \beta_0 X_t + ... + \beta_q X_{t-q} + u_t\]

Includes both lagged dependent and independent variables.

9.3 Trends

Deterministic Trend

\[Y_t = \beta_0 + \beta_1 t + u_t\]

Linear trend in time.

Stochastic Trend

Random walk with drift: \(Y_t = Y_{t-1} + \beta_0 + \varepsilon_t\), which implies \(Y_t = Y_0 + \beta_0 t + \sum_{s=1}^t \varepsilon_s\)

Trend with random walk component.

Detrending Methods

  1. Include time trend as regressor
  2. First-difference the data
  3. Use deviation from trend

9.4 Seasonality

Seasonal patterns can be modeled using:

  1. Seasonal dummy variables
  2. Trigonometric functions
  3. Seasonal differencing

Example with quarterly dummies: \(Y_t = \beta_0 + \beta_1 Q2_t + \beta_2 Q3_t + \beta_3 Q4_t + \beta_4 X_t + u_t\)

9.5 Forecasting

One-step-ahead Forecast

For model \(Y_t = \beta_0 + \beta_1 X_t + u_t\): \(\hat{Y}_{T+1 \mid T} = \hat{\beta}_0 + \hat{\beta}_1 X_{T+1}\)

Forecast error: \(e_{T+1} = Y_{T+1} - \hat{Y}_{T+1 \mid T}\)

Forecast Evaluation

Mean Squared Forecast Error (MSFE): \(MSFE = \frac{1}{P}\sum_{t=T+1}^{T+P} (Y_t - \hat{Y}_{t \mid t-1})^2\)
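
A minimal base-R sketch of rolling one-step-ahead forecasts and the resulting MSFE, here using a simple autoregression of \(Y_t\) on \(Y_{t-1}\) rather than an exogenous regressor (sample sizes and values are illustrative):

# Rolling one-step-ahead forecasts and the MSFE
set.seed(7)
y <- as.numeric(arima.sim(list(ar = 0.6), n = 300))
T <- 250; P <- 50                             # estimation sample and number of forecasts
err <- numeric(P)
for (p in 1:P) {
  t <- T + p - 1
  fit <- lm(y[2:t] ~ y[1:(t - 1)])            # fit Y_t on Y_{t-1} using data up to time t
  fc  <- coef(fit)[1] + coef(fit)[2] * y[t]   # forecast of Y_{t+1}
  err[p] <- y[t + 1] - fc                     # forecast error
}
mean(err^2)                                   # MSFE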

9.6 Key Takeaways

  1. Stationarity is crucial for valid inference
  2. Serial correlation invalidates standard errors
  3. Unit root processes require special treatment
  4. Spurious regression is a serious concern
  5. Dynamic models capture time dependencies
  6. Trends and seasonality must be addressed
  7. Various tests detect time series problems
  8. Forecasting requires careful model specification

Part 9: Advanced Topics

Chapter 13: Time Series Analysis (Advanced)

13.1 Cointegration

Two non-stationary series \(Y_t\) and \(X_t\) are cointegrated if:

  1. Both are integrated of order 1: \(Y_t \sim I(1)\), \(X_t \sim I(1)\)
  2. There exists \(\beta\) such that \(u_t = Y_t - \beta X_t \sim I(0)\)

Economic interpretation: Long-run equilibrium relationship exists.

Engle-Granger Two-Step Procedure

  1. Test each series for unit root (both should be \(I(1)\))
  2. Estimate cointegrating regression: \(Y_t = \alpha + \beta X_t + u_t\)
  3. Test residuals for unit root (should be \(I(0)\))
  4. If cointegrated, estimate Error Correction Model (ECM)

Error Correction Model (ECM)

\[\Delta Y_t = \alpha + \gamma(Y_{t-1} - \beta X_{t-1}) + \delta \Delta X_t + \varepsilon_t\]

where:

  • \(\gamma\): Speed of adjustment to equilibrium
  • \((Y_{t-1} - \beta X_{t-1})\): Error correction term
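
A minimal base-R sketch of the Engle-Granger steps and the ECM on simulated cointegrated data (names and parameter values are illustrative; note that the residual-based unit root test strictly requires Engle-Granger critical values rather than the standard ADF ones):

# Engle-Granger two-step procedure and an error correction model
library(tseries)
set.seed(99)
T <- 500
x <- cumsum(rnorm(T))                     # I(1) regressor
y <- 2 + 1.5 * x + rnorm(T)               # cointegrated with x (long-run beta = 1.5)
step1 <- lm(y ~ x)                        # cointegrating regression
adf.test(resid(step1))                    # residuals should look I(0) if cointegrated
ect <- resid(step1)                       # error correction term
ecm <- lm(diff(y) ~ ect[-T] + diff(x))    # gamma is the coefficient on the lagged ECT
summary(ecm)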

13.2 Vector Autoregression (VAR)

For two variables: \(\begin{bmatrix} Y_t \\ X_t \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix} \begin{bmatrix} Y_{t-1} \\ X_{t-1} \end{bmatrix} + \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}\)

Properties:

  1. All variables treated symmetrically
  2. Captures dynamic interrelationships
  3. Used for forecasting and impulse response analysis

13.3 Panel Data Basics

Panel data combines cross-sectional and time series: \(Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}\)

where \(i\) indexes units and \(t\) indexes time.

Types of Panel Data Models

  1. Pooled OLS: Ignores panel structure \(Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}\)
  2. Fixed Effects: Unit-specific intercepts \(Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it}\)
  3. Random Effects: Random intercepts \(Y_{it} = \beta_0 + \beta_1 X_{it} + (\alpha_i + u_{it})\)

13.4 ARCH and GARCH Models

Autoregressive Conditional Heteroskedasticity (ARCH) models time-varying volatility:

ARCH(1) Model

\(Y_t = \mu + \varepsilon_t\), \(\varepsilon_t = \sigma_t z_t\), \(\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2\)

where \(z_t \sim \text{i.i.d.}(0,1)\)

GARCH(1,1) Model

\[\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2\]

Applications: Financial volatility modeling, risk management
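
Volatility clustering can be generated directly from these equations. A minimal base-R simulation of an ARCH(1) process (parameter values are illustrative):

# Simulate ARCH(1): sigma_t^2 = a0 + a1 * eps_{t-1}^2
set.seed(5)
T <- 1000; a0 <- 0.2; a1 <- 0.6
eps <- numeric(T); sig2 <- numeric(T)
sig2[1] <- a0 / (1 - a1)                  # start at the unconditional variance
eps[1]  <- sqrt(sig2[1]) * rnorm(1)
for (t in 2:T) {
  sig2[t] <- a0 + a1 * eps[t - 1]^2
  eps[t]  <- sqrt(sig2[t]) * rnorm(1)
}
acf(eps,   plot = FALSE)$acf[2]           # returns themselves: roughly uncorrelated
acf(eps^2, plot = FALSE)$acf[2]           # squared returns: clearly autocorrelated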

Chapter 14: Causal Effect with Difference-in-Differences

14.1 The DiD Framework

Difference-in-Differences estimates causal effects by comparing changes over time between treatment and control groups.

Basic setup:

  • Two periods: Pre-treatment (\(t=0\)) and post-treatment (\(t=1\))
  • Two groups: Treatment (\(D_i=1\)) and control (\(D_i=0\))

14.2 The DiD Estimator

Simple 2×2 DiD

\[\text{DiD} = [E(Y_{i1} \mid D_i=1) - E(Y_{i0} \mid D_i=1)] - [E(Y_{i1} \mid D_i=0) - E(Y_{i0} \mid D_i=0)]\]

Or, using group-by-period means (first subscript: group, second: period): \(\text{DiD} = (\bar{Y}_{1,1} - \bar{Y}_{1,0}) - (\bar{Y}_{0,1} - \bar{Y}_{0,0})\)

Regression Specification

\[Y_{it} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_i + \beta_3 (\text{Post}_t \times \text{Treat}_i) + u_{it}\]

where:

  • \(\beta_3\): DiD estimate (treatment effect)
  • \(\beta_1\): Time trend
  • \(\beta_2\): Pre-existing differences
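
In practice the 2×2 DiD estimate is read off as the coefficient on the interaction term. A minimal R sketch on a simulated two-period panel (all names and numbers are illustrative, with a true effect of 5):

# Two-period DiD as the interaction coefficient in OLS
set.seed(11)
n <- 400
df <- expand.grid(id = 1:n, post = 0:1)
df$treat <- as.integer(df$id <= n / 2)
df$y <- 40 + 10 * df$treat + 15 * df$post +
        5 * df$treat * df$post + rnorm(nrow(df), sd = 3)
did <- lm(y ~ post + treat + post:treat, data = df)
coef(did)["post:treat"]                   # DiD estimate, close to 5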

14.3 Key Assumptions

  1. Parallel Trends: In the absence of treatment, the treatment and control groups would have followed parallel trends \(E[Y_{i1}(0) - Y_{i0}(0) \mid D_i=1] = E[Y_{i1}(0) - Y_{i0}(0) \mid D_i=0]\)
  2. No Anticipation: Treatment group doesn’t change behavior before treatment
  3. SUTVA: Stable Unit Treatment Value Assumption (no spillovers)

14.4 Testing Parallel Trends

Pre-treatment trends test: \(Y_{it} = \alpha_i + \gamma_t + \sum_{k \neq -1} \delta_k (D_i \times I(t=k)) + \varepsilon_{it}\)

Test \(H_0: \delta_k = 0\) for all \(k < 0\)

14.5 DiD with Multiple Time Periods

\[Y_{it} = \alpha_i + \gamma_t + \beta D_{it} + u_{it}\]

where:

  • \(\alpha_i\): Unit fixed effects
  • \(\gamma_t\): Time fixed effects
  • \(D_{it}\): Treatment indicator

14.6 Practical Example: Minimum Wage Study

Card and Krueger (1994) study:

  • Treatment: New Jersey (raised minimum wage)
  • Control: Pennsylvania (no change)
  • Outcome: Employment in fast-food restaurants

Model: \(\text{Emp}_{it} = \beta_0 + \beta_1 \text{After}_t + \beta_2 \text{NJ}_i + \beta_3 (\text{After}_t \times \text{NJ}_i) + u_{it}\)

Results: \(\hat{\beta}_3 > 0\) (employment increased in NJ relative to PA)

14.7 Extensions and Variations

Triple Differences

Add another dimension of comparison: \(Y_{igt} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_i + \beta_3 \text{Group}_g + \beta_4 (\text{Post}_t \times \text{Treat}_i) + \beta_5 (\text{Post}_t \times \text{Group}_g) + \beta_6 (\text{Treat}_i \times \text{Group}_g) + \beta_7 (\text{Post}_t \times \text{Treat}_i \times \text{Group}_g) + u_{igt}\)

Staggered Treatment

Different units treated at different times: \(Y_{it} = \alpha_i + \gamma_t + \sum_k \beta_k \cdot I(\text{time since treatment} = k) + u_{it}\)

14.8 Common Pitfalls

  1. Violation of Parallel Trends: Check pre-trends carefully
  2. Treatment Effect Heterogeneity: Effects may vary across units/time
  3. Bad Controls: Don’t control for outcomes of treatment
  4. Spillover Effects: Treatment may affect control units

14.9 Sample Problems

Problem 1

Given DiD data:

  • Pre-treatment: \(\bar{Y}_{T,0} = 50\), \(\bar{Y}_{C,0} = 40\)
  • Post-treatment: \(\bar{Y}_{T,1} = 70\), \(\bar{Y}_{C,1} = 55\)

Calculate the DiD estimate.

Solution: \(\text{DiD} = (70 - 50) - (55 - 40) = 20 - 15 = 5\)

The treatment effect is 5 units.

Problem 2

Test parallel trends given: \(Y_{it} = 2 + 0.5t + 3D_i - 0.8(D_i \times I(t=-2)) + 0.2(D_i \times I(t=-1)) + 4(D_i \times I(t=1)) + u_{it}\)

Solution: The pre-treatment coefficients are \(-0.8\) (at \(t=-2\)) and \(0.2\) (at \(t=-1\)). Since they are not both zero, the parallel trends assumption may be violated; formally, one would test their joint significance.

14.10 Advanced DiD Topics

Synthetic Control Method

When only one treated unit:

  1. Create synthetic control as weighted average of control units
  2. Weights chosen to match pre-treatment characteristics
  3. Compare treated unit to synthetic control

Regression Discontinuity with DiD

Combine RD and DiD for additional identification: \(Y_{it} = \beta_0 + \beta_1 f(X_i) + \beta_2 D_i + \beta_3 \text{Post}_t + \beta_4 (D_i \times \text{Post}_t) + u_{it}\)

14.11 STATA Implementation

* Basic DiD
gen post = (year >= 2000)
gen treat_post = treat * post
reg outcome post treat treat_post, cluster(state)

* With fixed effects
xtset state year
xtreg outcome treat_post, fe cluster(state)

* Event study (negative leads need a valid variable-name suffix, e.g. treat_m3)
forvalues k = -3/3 {
    local suf = cond(`k' < 0, "m" + string(abs(`k')), string(`k'))
    gen treat_`suf' = treat * (year == 2000 + `k')
}
reg outcome treat_*, cluster(state)

14.12 Key Takeaways

  1. DiD identifies causal effects using variation across groups and time
  2. Parallel trends assumption is crucial and testable
  3. Multiple periods require unit and time fixed effects
  4. Event studies help visualize treatment dynamics
  5. Extensions handle complex treatment timing
  6. Synthetic controls work for single treated units
  7. Careful consideration of identifying assumptions is essential
  8. DiD is widely used in policy evaluation

Part 10: Appendices and Practice

Mathematical Appendix

A.1 Matrix Algebra Review

Matrix Operations

  1. Matrix Multiplication: If \(\mathbf{A}\) is \(m \times n\) and \(\mathbf{B}\) is \(n \times p\), then \(\mathbf{AB}\) is \(m \times p\) with \([\mathbf{AB}]_{ij} = \sum_{k=1}^n a_{ik}b_{kj}\)
  2. Transpose Properties:
    • \[(\mathbf{A}')' = \mathbf{A}\]
    • \[(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'\]
    • \[(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'\]
  3. Inverse Properties:
    • \[(\mathbf{A}^{-1})^{-1} = \mathbf{A}\]
    • \[(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}\]
    • \[(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'\]

Useful Matrix Results

  1. Quadratic Forms: For symmetric matrix \(\mathbf{A}\): \(\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}'\mathbf{Ax}) = 2\mathbf{Ax}\)
  2. Matrix Differentiation: \(\frac{\partial}{\partial \mathbf{b}}(\mathbf{y} - \mathbf{Xb})'(\mathbf{y} - \mathbf{Xb}) = -2\mathbf{X}'(\mathbf{y} - \mathbf{Xb})\)
  3. Projection Matrix: \(\mathbf{P} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\) Properties:
    • \(\mathbf{P}^2 = \mathbf{P}\) (idempotent)
    • \(\mathbf{P}' = \mathbf{P}\) (symmetric)
    • \[\mathbf{PX} = \mathbf{X}\]
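
These properties are easy to verify numerically. A short R sketch with a random design matrix (purely illustrative):

# Numerical check of the projection matrix properties
set.seed(3)
X <- cbind(1, matrix(rnorm(20 * 2), 20, 2))   # n = 20, intercept plus two regressors
P <- X %*% solve(t(X) %*% X) %*% t(X)
max(abs(P %*% P - P))                         # ~ 0: idempotent
max(abs(t(P) - P))                            # ~ 0: symmetric
max(abs(P %*% X - X))                         # ~ 0: PX = X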

A.2 Probability Theory Review

Key Distributions

  1. Normal Distribution: \(X \sim N(\mu, \sigma^2)\) \(f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\)
  2. Chi-squared Distribution: If \(Z_i \sim N(0,1)\), then \(\sum_{i=1}^n Z_i^2 \sim \chi^2_n\)
  3. t-Distribution: If \(Z \sim N(0,1)\) and \(V \sim \chi^2_n\), then \(\frac{Z}{\sqrt{V/n}} \sim t_n\)
  4. F-Distribution: If \(V_1 \sim \chi^2_{n_1}\) and \(V_2 \sim \chi^2_{n_2}\), then \(\frac{V_1/n_1}{V_2/n_2} \sim F_{n_1,n_2}\)

Central Limit Theorem

For i.i.d. random variables with \(E[X_i] = \mu\) and \(\text{Var}(X_i) = \sigma^2\): \(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \stackrel{d}{\rightarrow} N(0,1)\)

Law of Large Numbers

\[\bar{X} \stackrel{p}{\rightarrow} \mu \text{ as } n \rightarrow \infty\]

A.3 Key Econometric Proofs

Proof: OLS is Unbiased

Starting with: \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\)

\(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\) \(= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \mathbf{u})\) \(= \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}\)

Taking expectations: \(E[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E[\mathbf{u} \mid \mathbf{X}] = \boldsymbol{\beta}\)

Statistical Tables

Critical Values: Standard Normal Distribution

| α (two-tailed) | α/2 (one-tailed) | Critical Value |
|----------------|------------------|----------------|
| 0.10           | 0.05             | ±1.645         |
| 0.05           | 0.025            | ±1.960         |
| 0.01           | 0.005            | ±2.576         |

Critical Values: t-Distribution (Selected)

| df | α = 0.10 | α = 0.05 | α = 0.01 |
|----|----------|----------|----------|
| 10 | 1.812    | 2.228    | 3.169    |
| 20 | 1.725    | 2.086    | 2.845    |
| 30 | 1.697    | 2.042    | 2.750    |
| ∞  | 1.645    | 1.960    | 2.576    |

(α values are two-tailed.)

Critical Values: F-Distribution (α = 0.05)

| df₁ \ df₂ | 10   | 20   | 30   | ∞    |
|-----------|------|------|------|------|
| 1         | 4.96 | 4.35 | 4.17 | 3.84 |
| 2         | 4.10 | 3.49 | 3.32 | 3.00 |
| 3         | 3.71 | 3.10 | 2.92 | 2.60 |
| 5         | 3.33 | 2.71 | 2.53 | 2.21 |

Practice Problems and Solutions

Problem Set 1: OLS Mechanics

Problem 1.1 Consider the regression: \(Y_i = \alpha + \beta X_i + \gamma Z_i + \varepsilon_i\)

Derive the first-order conditions for OLS estimation.

Solution: Minimize: \(S(\alpha, \beta, \gamma) = \sum_{i=1}^n (Y_i - \alpha - \beta X_i - \gamma Z_i)^2\)

First-order conditions: \(\frac{\partial S}{\partial \alpha} = -2\sum_{i=1}^n (Y_i - \alpha - \beta X_i - \gamma Z_i) = 0\) \(\frac{\partial S}{\partial \beta} = -2\sum_{i=1}^n X_i(Y_i - \alpha - \beta X_i - \gamma Z_i) = 0\) \(\frac{\partial S}{\partial \gamma} = -2\sum_{i=1}^n Z_i(Y_i - \alpha - \beta X_i - \gamma Z_i) = 0\)

These yield the normal equations.

Problem 1.2 Prove that OLS residuals sum to zero when the model includes an intercept.

Solution: From the first-order condition: \(\sum_{i=1}^n (Y_i - \hat{\alpha} - \hat{\beta}X_i) = \sum_{i=1}^n \hat{u}_i = 0\)

Problem Set 2: Hypothesis Testing

Problem 2.1 Given regression results: \(\widehat{\log(\text{wage})} = 1.28 + 0.090 \cdot \text{educ} + 0.041 \cdot \text{exper}\) \(\text{SE}: \quad\quad\quad (0.11) \quad (0.008) \quad\quad (0.005)\) \(n = 500, \quad R^2 = 0.316\)

a) Test whether education affects wages at the 5% level. b) Construct a 95% confidence interval for the return to education. c) Test whether education and experience are jointly significant.

Solution: a) \(H_0: \beta_{educ} = 0\) vs \(H_1: \beta_{educ} \neq 0\)

\[t = \frac{0.090 - 0}{0.008} = 11.25\]

Since \(|11.25| > 1.96\), reject \(H_0\) at 5% level.

b) \(CI = 0.090 \pm 1.96 \times 0.008 = [0.074, 0.106]\)

c) Use F-test: \(F = \frac{R^2/k}{(1-R^2)/(n-k-1)} = \frac{0.316/2}{0.684/497} = 114.8\)

Since \(F > 3.00\) (critical value), reject joint insignificance.

Problem Set 3: Dummy Variables

Problem 3.1 Consider the wage regression: \(\text{wage}_i = \beta_0 + \beta_1 \text{female}_i + \beta_2 \text{educ}_i + \beta_3 (\text{female}_i \times \text{educ}_i) + u_i\)

Derive the returns to education for males and females.

Solution: For males (female = 0): \(\frac{\partial \text{wage}}{\partial \text{educ}} = \beta_2\)

For females (female = 1): \(\frac{\partial \text{wage}}{\partial \text{educ}} = \beta_2 + \beta_3\)

The difference in returns is \(\beta_3\).

Problem Set 4: Functional Forms

Problem 4.1 Given: \(\log(Y) = 2.5 + 0.8\log(X_1) + 0.3\log(X_2)\)

a) Interpret the coefficients b) Test whether the production function exhibits constant returns to scale

Solution: a) Coefficients are elasticities:

  • 1% increase in \(X_1\) increases \(Y\) by 0.8%
  • 1% increase in \(X_2\) increases \(Y\) by 0.3%

b) Test \(H_0: \beta_1 + \beta_2 = 1\). The estimated sum is \(0.8 + 0.3 = 1.1\); a formal test requires the standard error of \(\hat{\beta}_1 + \hat{\beta}_2\).

Problem Set 5: Time Series

Problem 5.1 Test whether the AR(1) process \(Y_t = 0.95Y_{t-1} + \varepsilon_t\) is stationary.

Solution: For stationarity, need \(|\phi| < 1\). Since \(|0.95| < 1\), the process is stationary.

Exam-Style Questions

Question 1 (20 points) Consider the regression model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

a) State the assumptions required for OLS to be BLUE. (5 points) b) Derive the OLS estimator for \(\beta_1\). (7 points) c) Show that the OLS estimator is unbiased. (8 points)

Question 2 (25 points) An economist estimates the effect of class size on test scores: \(\text{TestScore}_i = 700 - 2.5 \cdot \text{ClassSize}_i + u_i\) \((15.2) \quad (0.8)\)

a) Interpret the coefficient on ClassSize. (5 points) b) Test whether class size affects test scores at the 1% level. (10 points) c) What might cause omitted variable bias in this regression? (10 points)

Question 3 (30 points) Consider a DiD analysis of minimum wage effects:

|           | Pre-Period | Post-Period |
|-----------|------------|-------------|
| Treatment | 85         | 92          |
| Control   | 80         | 83          |

a) Calculate the DiD estimate. (10 points) b) State the identifying assumption. (10 points) c) How would you test this assumption? (10 points)

Practice with Software Output

Interpreting STATA Output

. reg wage educ exper female

Source |       SS           df       MS
-------+----------------------------------
Model  |  2534.567         3     844.856
Resid  |  3456.789       496     6.971
-------+----------------------------------
Total  |  5991.356       499    12.007

------------------------------------------------------
wage  |    Coef.   Std. Err.     t    P>|t|   [95% CI]
------+----------------------------------------------
educ  |   1.234     0.123    10.03  0.000   [0.992, 1.476]
exper |   0.456     0.045    10.13  0.000   [0.368, 0.544]
female|  -2.345     0.567    -4.14  0.000   [-3.459, -1.231]
_cons |   5.678     1.234     4.60  0.000   [3.254, 8.102]
------------------------------------------------------

Questions:

  1. Write the estimated regression equation.
  2. Test the significance of each coefficient at 5%.
  3. Calculate the R-squared.
  4. Interpret each coefficient.

Key Formulas Summary

  1. OLS Estimators:
    • Simple: \(\hat{\beta}_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}\)
    • Matrix: \(\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\)
  2. Standard Errors:
    • \[\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2[(\mathbf{X}'\mathbf{X})^{-1}]_{jj}}\]
  3. Test Statistics:
    • t-test: \(t = \frac{\hat{\beta}_j - \beta_{j,0}}{\text{SE}(\hat{\beta}_j)}\)
    • F-test: \(F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\)
  4. Goodness of Fit:
    • \[R^2 = 1 - \frac{SSR}{TSS} = \frac{ESS}{TSS}\]
    • \[\bar{R}^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}\]
  5. DiD:
    • \[\text{DiD} = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre})\]

Part 11: Comprehensive Review and Advanced Topics

Advanced Econometric Methods

11.1 Instrumental Variables (IV) Estimation

When we have endogeneity (\(\text{Cov}(X,u) \neq 0\)), OLS is biased. IV estimation provides a solution.

The IV Model

For the model: \(Y_i = \beta_0 + \beta_1 X_i + u_i\)

An instrument \(Z_i\) must satisfy:

  1. Relevance: \(\text{Cov}(Z,X) \neq 0\)
  2. Exogeneity: \(\text{Cov}(Z,u) = 0\)

Two-Stage Least Squares (2SLS)

Stage 1: \(X_i = \pi_0 + \pi_1 Z_i + v_i\) Stage 2: \(Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i\)

The 2SLS estimator (just-identified case): \(\hat{\beta}_{2SLS} = \frac{\text{Cov}(Z,Y)}{\text{Cov}(Z,X)}\)
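
A minimal sketch with simulated data, using ivreg() from the AER package (the same function listed in the R commands reference below; the data-generating values are illustrative):

# OLS vs. 2SLS with one endogenous regressor and one instrument
library(AER)
set.seed(8)
n <- 1000
z <- rnorm(n)                        # instrument
v <- rnorm(n)                        # unobserved confounder
x <- 0.8 * z + v + rnorm(n)          # endogenous: correlated with the error through v
y <- 1 + 2 * x + 3 * v + rnorm(n)    # true beta1 = 2
coef(lm(y ~ x))["x"]                 # OLS: biased upward
coef(ivreg(y ~ x | z))["x"]          # 2SLS: close to 2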

Testing Instrument Validity

  1. Relevance: F-statistic from first stage > 10 (rule of thumb)
  2. Overidentification test: With multiple instruments, test \(J\)-statistic
  3. Wu-Hausman test: Compare OLS and IV estimates

11.2 Panel Data Methods (Advanced)

Fixed Effects Estimation

Model: \(Y_{it} = \beta X_{it} + \alpha_i + u_{it}\)

Within transformation: \(\tilde{Y}_{it} = Y_{it} - \bar{Y}_i\), \(\tilde{X}_{it} = X_{it} - \bar{X}_i\)

Estimate: \(\tilde{Y}_{it} = \beta \tilde{X}_{it} + \tilde{u}_{it}\)

Random Effects Estimation

When \(\text{Cov}(\alpha_i, X_{it}) = 0\), use GLS: \(Y_{it} = \beta X_{it} + \alpha_i + u_{it}\)

Hausman Test

Tests \(H_0\): Random effects is consistent and efficient. \(H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})'[\hat{V}_{FE} - \hat{V}_{RE}]^{-1}(\hat{\beta}_{FE} - \hat{\beta}_{RE})\)

Under \(H_0\): \(H \sim \chi^2_k\)
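
A minimal sketch with the plm package (the same plm()/phtest() calls listed in the R commands reference below; panel dimensions and names are illustrative):

# Fixed vs. random effects and the Hausman test
library(plm)
set.seed(21)
N <- 100; T <- 5
pd <- data.frame(id = rep(1:N, each = T), time = rep(1:T, N))
alpha <- rep(rnorm(N), each = T)          # unit effects
pd$x <- rnorm(N * T) + 0.5 * alpha        # regressor correlated with the unit effect
pd$y <- 2 * pd$x + alpha + rnorm(N * T)
fe <- plm(y ~ x, data = pd, index = c("id", "time"), model = "within")
re <- plm(y ~ x, data = pd, index = c("id", "time"), model = "random")
phtest(fe, re)                            # rejects RE when alpha_i is correlated with x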

11.3 Limited Dependent Variables

Probit Model

\[P(Y_i = 1 \mid X_i) = \Phi(\beta_0 + \beta_1 X_i)\]

where \(\Phi(\cdot)\) is the standard normal CDF.

Marginal effects: \(\frac{\partial P(Y_i = 1 \mid X_i)}{\partial X_i} = \phi(\beta_0 + \beta_1 X_i) \cdot \beta_1\)

Logit Model

\[P(Y_i = 1 \mid X_i) = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}\]

Log-odds interpretation: \(\log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_i\)
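
A minimal sketch with glm() (the same calls listed in the R commands reference below), including the average marginal effect of the probit computed by hand from the formula above:

# Probit and logit for a binary outcome, plus the average probit marginal effect
set.seed(4)
n <- 2000
x <- rnorm(n)
y <- as.integer(runif(n) < pnorm(-0.5 + x))          # true probit coefficients (-0.5, 1)
pr <- glm(y ~ x, family = binomial(link = "probit"))
lg <- glm(y ~ x, family = binomial(link = "logit"))
b  <- coef(pr)
mean(dnorm(b[1] + b[2] * x) * b[2])                  # average marginal effect of x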

Tobit Model

For censored data: \(Y_i^* = \beta_0 + \beta_1 X_i + u_i\) \(Y_i = \max(0, Y_i^*)\)

11.4 Advanced Time Series

ARIMA Models

ARIMA(p,d,q): \((1-\phi_1L-...-\phi_pL^p)(1-L)^d Y_t = (1+\theta_1L+...+\theta_qL^q)\varepsilon_t\)

Vector Error Correction Model (VECM)

For cointegrated series: \(\Delta \mathbf{Y}_t = \mathbf{\Pi} \mathbf{Y}_{t-1} + \sum_{i=1}^{p-1} \mathbf{\Gamma}_i \Delta \mathbf{Y}_{t-i} + \mathbf{u}_t\)

where \(\mathbf{\Pi} = \mathbf{\alpha \beta}'\) (error correction term)

11.5 Quantile Regression

Instead of conditional mean, estimate conditional quantiles: \(Q_\tau(Y \mid X) = X\beta(\tau)\)

Minimize: \(\sum_{i:Y_i \geq X_i\beta} \tau|Y_i - X_i\beta| + \sum_{i:Y_i < X_i\beta} (1-\tau)|Y_i - X_i\beta|\)

Applications: Wage distributions, risk analysis
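
A minimal sketch with rq() from the quantreg package (the quantiles and data-generating process are illustrative):

# Quantile regression at the 10th, 50th and 90th percentiles
library(quantreg)
set.seed(6)
n <- 1000
x <- runif(n)
y <- 1 + 2 * x + (1 + 2 * x) * rnorm(n)   # heteroskedastic errors: slopes differ across quantiles
rq(y ~ x, tau = c(0.1, 0.5, 0.9))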

Comprehensive Review Questions

Section 1: Conceptual Understanding

  1. Causal Inference
    • What is the fundamental problem of causal inference?
    • How does randomization solve the selection problem?
    • What are the key assumptions for causal interpretation of regression coefficients?
  2. Model Specification
    • What are the consequences of omitting a relevant variable?
    • How do you test for functional form misspecification?
    • When should you use logs vs. levels?
  3. Statistical Inference
    • Explain the difference between consistency and unbiasedness
    • What are the consequences of heteroskedasticity?
    • How do clustered standard errors differ from robust standard errors?

Section 2: Applied Problems

Problem 1: Returns to Education You want to estimate the causal effect of education on wages.

a) Write down a regression model b) What are potential sources of endogeneity? c) Suggest an instrumental variable and justify its validity d) How would you test whether IV is necessary?

Solution Outline: a) \(\log(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + u_i\)

b) Ability bias, measurement error, reverse causality

c) Quarter of birth (Angrist-Krueger), distance to college

  • Relevance: Affects education through compulsory schooling
  • Exogeneity: Randomly assigned, shouldn’t affect wages directly

d) Wu-Hausman test comparing OLS and IV estimates

Problem 2: Policy Evaluation A state implements a job training program. You have data before and after for treatment and control states.

a) Set up a DiD model b) State the identifying assumption c) How would you test this assumption? d) What if treatment timing varies?

Solution Outline: a) \(Y_{st} = \beta_0 + \beta_1 \text{Post}_t + \beta_2 \text{Treat}_s + \beta_3 (\text{Post}_t \times \text{Treat}_s) + u_{st}\)

b) Parallel trends assumption

c) Event study specification, test pre-trends

d) Use staggered DiD with unit and time fixed effects

Section 3: Data Analysis Project

Project: Housing Prices Analysis

Dataset includes:

  • House prices
  • Square footage, bedrooms, bathrooms
  • Neighborhood characteristics
  • School quality measures
  • Crime rates

Tasks:

  1. Explore functional form (linear vs. log)
  2. Test for spatial dependence
  3. Address potential endogeneity of school quality
  4. Implement a hedonic price model
  5. Forecast prices for new listings

Common Econometric Pitfalls

  1. P-hacking: Testing multiple specifications until finding significance
  2. Ignoring clustered errors: Underestimating standard errors
  3. Bad controls: Controlling for outcomes of treatment
  4. Extrapolation: Predictions outside the support of data
  5. Reverse causality: Ignoring feedback effects
  6. Sample selection bias: Non-random sampling
  7. Measurement error: Attenuation bias
  8. Multicollinearity: Inflated standard errors
  9. Overfitting: Too many parameters relative to observations
  10. Ignoring dynamics: Static models for dynamic processes

Software Implementation Guide

STATA Commands Reference

* Basic regression
reg y x1 x2, robust
reg y x1 x2, cluster(group)

* Panel data
xtset id time
xtreg y x1 x2, fe
xtreg y x1 x2, re
hausman fe re

* IV regression
ivregress 2sls y x1 (x2 = z1 z2)
ivreg2 y x1 (x2 = z1 z2), first

* Time series
tsset time
arima y, arima(1,1,1)
dfuller y, trend
vec y1 y2 y3

* Limited dependent variables
probit y x1 x2
logit y x1 x2
tobit y x1 x2, ll(0)

* Difference-in-differences
gen post = (time >= 2000)
gen treat_post = treat * post
reg y treat post treat_post, cluster(state)

R Commands Reference

# Basic regression (coeftest/vcovHC require the lmtest and sandwich packages)
library(lmtest)
library(sandwich)
model <- lm(y ~ x1 + x2, data=df)
coeftest(model, vcov=vcovHC(model, type="HC1"))

# Panel data
library(plm)
fe_model <- plm(y ~ x1 + x2, data=df, model="within")
re_model <- plm(y ~ x1 + x2, data=df, model="random")
phtest(fe_model, re_model)

# IV regression
library(AER)
ivreg(y ~ x1 + x2 | x1 + z1 + z2, data=df)

# Time series (adf.test is in the tseries package)
library(forecast)
library(tseries)
auto.arima(y)
adf.test(y)

# Limited dependent variables
glm(y ~ x1 + x2, family=binomial(link="probit"))
glm(y ~ x1 + x2, family=binomial(link="logit"))

Final Exam Preparation Checklist

Theory

  • Classical linear model assumptions
  • Properties of OLS estimators
  • Gauss-Markov theorem
  • Hypothesis testing framework
  • Maximum likelihood estimation
  • Asymptotic theory

Methods

  • Multiple regression
  • Dummy variables and interactions
  • Functional forms
  • Heteroskedasticity and robust inference
  • Time series basics
  • Panel data methods
  • Instrumental variables
  • Difference-in-differences

Applications

  • Interpret regression output
  • Choose appropriate functional form
  • Test model assumptions
  • Handle common data issues
  • Write up empirical results
  • Critique empirical papers

Software Skills

  • Data cleaning and manipulation
  • Running regressions
  • Diagnostic tests
  • Creating tables and graphs
  • Interpreting output

Summary

Econometrics provides tools for:

  1. Estimating causal effects
  2. Testing economic theories
  3. Forecasting future values
  4. Policy evaluation

Key takeaways:

  • Correlation ≠ causation
  • Careful identification is crucial
  • Always check assumptions
  • Consider economic theory
  • Be transparent about limitations
  • Use appropriate methods for your data
  • Interpret results in context

Remember: “All models are wrong, but some are useful.” - George Box
