Correspondence Analysis

Definition

Correspondence analysis is a dimension-reduction method for contingency tables.

It is similar in spirit to PCA, but it uses chi-square geometry instead of ordinary Euclidean geometry on raw numeric variables.

Contingency Table

Let $N$ be a nonnegative table with rows $i$ and columns $j$.

In basket analysis, a useful table is:

  • rows = products
  • columns = baskets
  • cell $n_{ij} = 1$ if product $i$ appears in basket $j$, and $0$ otherwise
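
As a minimal sketch of how such a table could be built, the snippet below turns a list of baskets into a product-by-basket incidence matrix. The basket contents and the names `baskets`, `products`, and `N` are made up for illustration, and NumPy is an assumed dependency.

```python
import numpy as np

# Hypothetical baskets (illustrative data only, not from any real dataset).
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
    {"bread"},
]

# Rows are products (sorted for a stable order), columns are baskets.
products = sorted(set().union(*baskets))

# n_ij = 1 if product i appears in basket j, else 0.
N = np.array([[1 if p in b else 0 for b in baskets] for p in products])

print(products)
print(N)
```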

Masses

Let:

\[n = \sum_i \sum_j n_{ij}\]

The correspondence matrix is:

\[P = \frac{N}{n}\]

Row masses:

\[r_i = \sum_j p_{ij}\]

Column masses:

\[c_j = \sum_i p_{ij}\]
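
The definitions above can be sketched in a few lines of NumPy. The table `N` here is a small hypothetical example (3 products, 4 baskets), and the variable names simply mirror the symbols in the text.

```python
import numpy as np

# Hypothetical 3-product x 4-basket incidence table (illustrative data only).
N = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

n = N.sum()          # grand total of the table
P = N / n            # correspondence matrix, entries sum to 1
r = P.sum(axis=1)    # row masses r_i
c = P.sum(axis=0)    # column masses c_j

print(r, c)
```

Both mass vectors sum to 1, since they are the marginal distributions of $P$.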

Profiles

A row profile converts a row into conditional proportions:

\[\frac{p_{ij}}{r_i}\]

It describes the distribution of a row across columns.

For products, a row profile describes the basket pattern of a product.
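
A short sketch of the row-profile computation, again on a hypothetical table and assuming no row is entirely zero (otherwise the division by $r_i$ is undefined):

```python
import numpy as np

# Hypothetical 3-product x 4-basket incidence table (illustrative data only).
N = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)

# Divide each row of P by its row mass; every profile then sums to 1.
row_profiles = P / r[:, None]

print(row_profiles)
```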

Chi-Square Geometry

Correspondence analysis compares profiles with chi-square distance.

For two row profiles $i$ and $i'$:

\[d^2(i,i') = \sum_j \frac{1}{c_j}\left(\frac{p_{ij}}{r_i} - \frac{p_{i'j}}{r_{i'}}\right)^2\]

Because each squared difference is divided by the column mass $c_j$, differences on rare columns get more weight than differences on common columns.
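
The distance formula can be sketched as a small helper. The function name `chi2_row_distance_sq` and the example table are hypothetical, and the sketch assumes every row and column has nonzero mass.

```python
import numpy as np

def chi2_row_distance_sq(P, i, k):
    """Squared chi-square distance between the profiles of rows i and k."""
    r = P.sum(axis=1)                 # row masses
    c = P.sum(axis=0)                 # column masses
    diff = P[i] / r[i] - P[k] / r[k]  # difference of the two row profiles
    return float(np.sum(diff**2 / c)) # weight each squared term by 1 / c_j

# Hypothetical 3-product x 4-basket incidence table (illustrative data only).
N = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
], dtype=float)
P = N / N.sum()

d2 = chi2_row_distance_sq(P, 0, 2)
print(d2)
```

In this toy table, rows 0 and 2 differ most on the last column, which is also one of the rarest, so that column contributes most of the distance.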

Standardized Residual Matrix

Correspondence analysis decomposes:

\[S = D_r^{-1/2}(P - rc^T)D_c^{-1/2}\]

where:

  • $D_r$ is the diagonal matrix of row masses
  • $D_c$ is the diagonal matrix of column masses
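  • $rc^T$ is the independence model: the outer product of the row-mass vector $r$ and the column-mass vector $c$, with entries $r_i c_j$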

It then applies the singular value decomposition (SVD) to $S$.
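
As a rough sketch of this step, the code below builds $S$ exactly as written in the formula and passes it to `numpy.linalg.svd`. The table is hypothetical, and explicit diagonal matrices are used only to mirror the notation; an elementwise division would be cheaper for large tables.

```python
import numpy as np

# Hypothetical 3-product x 4-basket incidence table (illustrative data only).
N = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)   # row masses
c = P.sum(axis=0)   # column masses

# D_r^{-1/2} and D_c^{-1/2} as explicit diagonal matrices.
D_r_inv_sqrt = np.diag(1.0 / np.sqrt(r))
D_c_inv_sqrt = np.diag(1.0 / np.sqrt(c))

# Standardized residuals: deviation from independence, rescaled by the masses.
S = D_r_inv_sqrt @ (P - np.outer(r, c)) @ D_c_inv_sqrt

# Thin SVD; singular values are returned in descending order.
U, s, Vt = np.linalg.svd(S, full_matrices=False)

print(s)
```

The last singular value comes out numerically zero: subtracting $rc^T$ removes one dimension, so $S$ has rank at most $\min(I, J) - 1$ for an $I \times J$ table.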

Inertia

Inertia is the correspondence-analysis version of variance.

Each dimension has inertia:

\[\lambda_k = s_k^2\]

where $s_k$ is the $k$-th singular value of $S$.

Explained inertia is:

\[\frac{\lambda_k}{\sum_m \lambda_m}\]
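
A minimal sketch of the inertia calculation, using a hypothetical table and recomputing $S$ so the snippet stands alone:

```python
import numpy as np

# Hypothetical 3-product x 4-basket incidence table (illustrative data only).
N = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)

# Standardized residual matrix, written elementwise.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Singular values only; compute_uv=False skips U and V.
s = np.linalg.svd(S, compute_uv=False)

inertia = s**2                   # lambda_k per dimension
total_inertia = inertia.sum()    # sum over all dimensions
explained = inertia / total_inertia

print(explained)
```

The total inertia equals the table's Pearson chi-square statistic divided by $n$, which is one way to sanity-check an implementation.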
