Sparse Matrices and Truncated SVD

Sparse Matrix

A sparse matrix is a matrix where most entries are zero.

Basket-by-item data are usually sparse because each basket contains only a small fraction of all products.

Dense Storage

A dense matrix stores every cell:

\[n_{rows} \times n_{columns}\]

This is inefficient when most cells are zero.

Sparse Storage

A sparse matrix stores mainly the nonzero entries and their positions.

Storage is closer to:

\[O(\text{number of nonzero entries})\]

instead of:

\[O(n_{rows}n_{columns})\]

Basket-Item Matrix

For basket $i$ and item $j$:

\[x_{ij} = \text{quantity of item } j \text{ in basket } i\]

or for item presence:

\[x_{ij} = 1\{\text{item } j \text{ appears in basket } i\}\]

Most $x_{ij}$ values are zero.

SVD

Singular value decomposition factors a matrix $X$ as:

\[X = UDV^T\]

where:

$U$ contains left singular vectors
$D$ contains singular values
$V$ contains right singular vectors

PCA can be computed using SVD on centered/scaled data.

Truncated SVD

Truncated SVD computes only the first $k$ singular vectors instead of the full decomposition.

This is useful when:

the matrix is large
only the leading dimensions are needed
the matrix is sparse

Why It Matters

For high-dimensional retail data, full dense PCA can be memory-expensive.

Sparse matrices plus truncated SVD allow the analysis to focus on leading structure without materializing all zeros as explicit values.

Sparse Matrices and Truncated SVD

Sparse Matrix

Dense Storage

Sparse Storage

Basket-Item Matrix

SVD

Truncated SVD

Why It Matters

See

Principal Component Analysis

Correspondence Analysis

Linear Algebra

Sparse Matrices and Truncated SVD

Sparse Matrix

Dense Storage

Sparse Storage

Basket-Item Matrix

SVD

Truncated SVD

Why It Matters

See

Principal Component Analysis

Correspondence Analysis

Linear Algebra

Sessions by Day

Productivity by Hour

Session Completion Rate

Time Spent by Task

Sessions by Day of Week

Session Duration Distribution