Sparse Matrices and Truncated SVD
Sparse Matrix
A sparse matrix is a matrix where most entries are zero.
Basket-by-item data are usually sparse because each basket contains only a small fraction of all products.
Dense Storage
A dense matrix stores every cell:
\[n_{rows} \times n_{columns}\]This is inefficient when most cells are zero.
Sparse Storage
A sparse matrix stores mainly the nonzero entries and their positions.
Storage is closer to:
\[O(\text{number of nonzero entries})\]instead of:
\[O(n_{rows}n_{columns})\]Basket-Item Matrix
For basket $i$ and item $j$:
\[x_{ij} = \text{quantity of item } j \text{ in basket } i\]or for item presence:
\[x_{ij} = 1\{\text{item } j \text{ appears in basket } i\}\]Most $x_{ij}$ values are zero.
SVD
Singular value decomposition factors a matrix $X$ as:
\[X = UDV^T\]where:
- $U$ contains left singular vectors
- $D$ contains singular values
- $V$ contains right singular vectors
PCA can be computed using SVD on centered/scaled data.
Truncated SVD
Truncated SVD computes only the first $k$ singular vectors instead of the full decomposition.
This is useful when:
- the matrix is large
- only the leading dimensions are needed
- the matrix is sparse
Why It Matters
For high-dimensional retail data, full dense PCA can be memory-expensive.
Sparse matrices plus truncated SVD allow the analysis to focus on leading structure without materializing all zeros as explicit values.