The Singular Value Decomposition (SVD) is an important matrix operation which enables many other numerical algorithms. The SVD lets you tame seemingly unwieldy matrices by uncovering their reduced "low rank" representation. A matrix which can be accurately approximated by a low-rank decomposition actually contains much less information than suggested by its dimensions.
A rank-1 matrix you might often see in the wild is an elementary school multiplication table. The multiplication table contains n × n entries which are intimately intertwined (i.e. one entry can't be changed independent of the others). A singular value decomposition of a multiplication table would reveal that the essential information is just the sequence of numbers 1, 2, ..., n (whose outer product reconstructs the table).
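For instance, here is a minimal sketch (with an arbitrary table size of n = 10) confirming that a multiplication table has only one nonzero singular value:

import numpy as np

# Build a 10x10 multiplication table as the outer product of 1..10 with itself
n = 10
seq = np.arange(1, n + 1)
table = np.outer(seq, seq)

# Only one singular value is numerically nonzero, so the whole
# n*n table is captured by a single rank-1 term
s = np.linalg.svd(table, compute_uv=False)
print(np.sum(s > 1e-10))  # prints 1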
This kind of data compression can be very useful in machine learning, where we prefer to learn predictors which use a small number of informative features (rather than thousands of highly correlated features). In addition to letting us boil down redundant features into more informative combinations, the SVD can also be used for denoising samples and imputing missing data.
What does the output of an SVD look like? From a given input matrix X, you get back a triplet U, s, V. The two matrices (U and V) are often called the left and right singular vectors. The outer products of these singular vectors are the rank-1 building blocks from which X gets reconstructed. The vector s contains the singular values, each of which acts as a weight on the outer product of a corresponding pair of singular vectors.
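Concretely, here's a minimal sketch of what that triplet looks like in numpy (note that np.linalg.svd returns the right singular vectors as the rows of its third output, which is why V is indexed by rows below; the 5 × 3 matrix is just a placeholder):

import numpy as np

X = np.random.randn(5, 3)  # arbitrary example matrix
U, s, V = np.linalg.svd(X, full_matrices=False)

print(U.shape)  # (5, 3): columns are the left singular vectors
print(s.shape)  # (3,): singular values, sorted in decreasing order
print(V.shape)  # (3, 3): rows are the right singular vectors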
To reconstruct your original matrix, just add up the outer products of the columns of U and the rows of V, weighted by the entries of s. That is, you can get back a matrix whose entries are approximately equal to the original by running:
np.sum([np.outer(U[:, i], V[i, :]) * si for i, si in enumerate(s)], axis=0)
Alternatively, you can upgrade the vector s into a diagonal matrix and express the reconstruction of X purely through matrix multiplications:
np.dot(U, np.dot(np.diag(s), V))
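Putting the two reconstructions together, here's a minimal self-contained sketch (the 6 × 4 random matrix is just a placeholder; full_matrices=False keeps the shapes compatible):

import numpy as np

# Arbitrary small matrix just for illustration
X = np.random.randn(6, 4)

# full_matrices=False gives U (6x4), s (4,), V (4x4),
# so the reconstructions below line up dimensionally
U, s, V = np.linalg.svd(X, full_matrices=False)

# Reconstruction as a weighted sum of rank-1 outer products
X_outer = np.sum([np.outer(U[:, i], V[i, :]) * si for i, si in enumerate(s)], axis=0)

# Equivalent reconstruction via matrix multiplication
X_matmul = np.dot(U, np.dot(np.diag(s), V))

print(np.allclose(X, X_outer))   # True
print(np.allclose(X, X_matmul))  # True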
When the underlying data is low-rank, many of the singular values will be extremely small or exactly zero. In that case, it is wasteful to perform a "full" SVD; it can be advantageous to instead do a "truncated" decomposition, computing only the top k singular vectors.
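To make that concrete, here is a minimal sketch (with arbitrary sizes and a true rank of 20) showing that nearly all of the singular values of a low-rank matrix are numerically zero, so only the top k terms matter:

import numpy as np

# A 500 x 300 matrix with true rank 20 (sizes are arbitrary)
rng = np.random.RandomState(0)
X = rng.randn(500, 20) @ rng.randn(20, 300)

U, s, V = np.linalg.svd(X, full_matrices=False)
print(np.sum(s > 1e-8 * s[0]))  # 20: the remaining 280 values are ~0

# Keeping only the top k = 20 components reconstructs X almost exactly,
# so computing the other 280 singular vectors was wasted effort
k = 20
X_k = np.dot(U[:, :k] * s[:k], V[:k, :])
print(np.allclose(X, X_k))  # True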
Exactly how much slower is a full SVD vs. a truncated SVD? I performed the following quick experiment to find out (a simplified sketch of the setup follows the list):
- Generate 100 random matrices with rank 50, 1000 columns and a random number of rows between 100 and 5000
- Record how long it takes to decompose the matrix using one of:
  - a full SVD with numpy.linalg.svd (which uses LAPACK under the hood)
  - an exact low-rank SVD with sklearn.decomposition.TruncatedSVD using ARPACK
  - an approximate low-rank SVD, also with sklearn.decomposition.TruncatedSVD, but using a faster randomized solver
- Ensure that the reconstructed matrix matches the original
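This isn't the original benchmark script, just a minimal sketch of the same comparison for a single matrix (the make_low_rank helper and the row count of 3000 are placeholder choices):

import time
import numpy as np
from sklearn.decomposition import TruncatedSVD

def make_low_rank(n_rows, n_cols=1000, rank=50, seed=0):
    # Random matrix with the given rank, as in the experiment above
    rng = np.random.RandomState(seed)
    return rng.randn(n_rows, rank) @ rng.randn(rank, n_cols)

X = make_low_rank(n_rows=3000)

t0 = time.perf_counter()
np.linalg.svd(X, full_matrices=False)  # full SVD (LAPACK)
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
TruncatedSVD(n_components=50, algorithm="arpack").fit(X)  # exact low-rank SVD
t_arpack = time.perf_counter() - t0

t0 = time.perf_counter()
TruncatedSVD(n_components=50, algorithm="randomized").fit(X)  # approximate low-rank SVD
t_randomized = time.perf_counter() - t0

print(f"full: {t_full:.2f}s  arpack: {t_arpack:.2f}s  randomized: {t_randomized:.2f}s")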
Results
The truncated SVD (using either an exact solver or approximate randomized solver) can be many times faster than a full SVD. If you're decomposing a large dataset with known low rank, go ahead and use a truncated SVD. On the other hand, if you're uncertain about the rank of your matrix (and want to inspect a large spectrum of singular values), you'll have to wait for a full SVD instead.