1. Singular Value Decomposition (SVD)

• What if we had a bunch of data and didn't know much about it?
– We'd like to look for patterns in the data and separate them out, so that we can understand the data better.
– We can use SVD to do this.
• SVD states that any matrix can be written as the product of three matrices:

$$A = U \Sigma V^T$$

• $U$ → final rotation
• $\Sigma$ → scaling
• $V^T$ → first rotation ($V^T$ is the factor that acts on a vector first)
• For example:

$$
\begin{bmatrix} 1 & -1 & 2 \\ 3 & 2 & -2 \end{bmatrix}
\approx
\begin{bmatrix} -0.24 & 0.96 \\ 0.96 & 0.24 \end{bmatrix}
\begin{bmatrix} 4.2 & 0 & 0 \\ 0 & 2.2 & 0 \end{bmatrix}
\begin{bmatrix} 0.63 & 0.58 & -0.57 \\ 0.74 & -0.2 & 0.63 \\ -0.2 & 0.82 & 0.51 \end{bmatrix}
$$

• Note: If we divide each diagonal element of $\Sigma$ by the sum of all the diagonal elements, we get the share of the total carried by the corresponding column of $U$. The fraction of variance explained is the analogous ratio of squared singular values, $\sigma_i^2 / \sum_j \sigma_j^2$.
– In the example above, the share of the first column of $U$, $\begin{bmatrix} -0.24 & 0.96 \end{bmatrix}^T$, is $\frac{4.2}{4.2 + 2.2} \approx 0.66$.
• Note: The third column of $\Sigma$ and the third row of $V^T$ are not used, because they are only ever multiplied by zero.

1.1. Eigendecomposition

• Eigendecomposition states that a diagonalizable square matrix can be broken down into eigenvectors and eigenvalues.
• A few problems with eigendecomposition:
– It only works on square matrices.
– The eigenvalues are not guaranteed to be real and non-negative, so they cannot always be normalized to lie between 0 and 1.
– The eigenvectors are not necessarily perpendicular.
• SVD solves these problems by:
– Allowing any sort of matrix (not only square matrices).
– Taking the diagonal of $\Sigma$ to be the square roots of the eigenvalues of $AA^T$ → these values are real and non-negative, so dividing them by their sum gives values between 0 and 1.
– Taking the rows of $V^T$ to be the eigenvectors of $A^TA$.
– Getting the columns of $U$ from the equation $u_i = \frac{A v_i}{\sigma_i}$, for each nonzero singular value $\sigma_i$.
• We can think of SVD as a generalized version of eigendecomposition.

1.2. Principal Component Analysis (PCA)

• The eigendecomposition of a symmetric matrix $A$ is

$$A = V L V^T$$

• Now, what if we take a data matrix $A$ with $N$ rows, standardize it (i.e. subtract the mean of each column and divide by its standard deviation), and then form $\frac{A^TA}{N-1}$? → This is the correlation matrix, and its eigendecomposition gives PCA.
– The problem is that this computation is typically not numerically stable.
– So, instead, what's typically done to get PCA is to use SVD on the standardized matrix $A$.
* In this case, the $U\Sigma$ term → principal components.
• Note: SVD and PCA can be used for dimensionality reduction.
• Note: SVD and PCA assume a linear correlation between the features.
– There are non-linear dimensionality reduction techniques; Kernel PCA is one example.
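The worked example in section 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names (`Sigma`, `A_rebuilt`, `share`, `variance_share`) are illustrative choices, not from the notes. The signs of the columns of U and rows of Vᵀ returned by a library may differ from the ones printed in the example, since each singular vector pair is only defined up to a common sign.

```python
import numpy as np

# Matrix from the worked example in section 1.
A = np.array([[1.0, -1.0, 2.0],
              [3.0, 2.0, -2.0]])

# full_matrices=True returns U as 2x2 and Vt as 3x3;
# s holds the singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the 2x3 Sigma: singular values on the diagonal, zeros elsewhere.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Multiplying the three factors should reproduce A up to rounding error.
A_rebuilt = U @ Sigma @ Vt

# Share of each singular value in the sum, and share of each squared
# singular value (the usual "variance explained" ratio).
share = s / s.sum()
variance_share = s**2 / (s**2).sum()

print("singular values:", np.round(s, 2))
print("share of sum:", np.round(share, 2))
print("share of squared sum:", np.round(variance_share, 2))
```

Because the third column of `Sigma` is all zeros, the third row of `Vt` never contributes to `A_rebuilt`, which matches the note in section 1.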
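The link between SVD and eigendecomposition described in section 1.1 can also be sketched in code. This is an illustration under stated assumptions, not a recommended way to compute an SVD: it assumes NumPy, reuses the example matrix from section 1, and assumes that matrix has full rank (so `rank = min(A.shape)`). The names `rank`, `sigma`, and `s_ref` are hypothetical helpers introduced here.

```python
import numpy as np

A = np.array([[1.0, -1.0, 2.0],
              [3.0, 2.0, -2.0]])

# Eigendecomposition of the symmetric 3x3 matrix A^T A.
# eigh returns eigenvalues in ascending order, so sort them descending
# to match the usual ordering of singular values.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], eigvecs[:, order]

# Assumption: A has full rank, so it has min(rows, cols) nonzero singular values.
rank = min(A.shape)

# The nonzero eigenvalues of A^T A (shared with A A^T) are the squared singular values.
sigma = np.sqrt(eigvals[:rank])

# u_i = A v_i / sigma_i, computed for all nonzero singular values at once:
# column i of (A @ V) is divided by sigma[i].
U = (A @ V[:, :rank]) / sigma

# Reference singular values from the library routine, for comparison.
s_ref = np.linalg.svd(A, compute_uv=False)
print(np.allclose(sigma, s_ref))
```

Only the first `rank` eigenvectors are needed to rebuild `A`; the remaining eigenvector of AᵀA belongs to the zero eigenvalue.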
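The two routes to PCA from section 1.2 (eigendecomposition of the correlation matrix versus SVD of the standardized data) can be compared on a small data set. The sketch below assumes NumPy; the data is synthetic and hypothetical, generated only for illustration, and names such as `Z`, `components`, and `Z_reduced` are not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N = 200 samples of 3 correlated features,
# built from 2 latent factors plus a little noise.
N = 200
latent = rng.normal(size=(N, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, -0.7]])
X = latent @ mixing + 0.1 * rng.normal(size=(N, 3))

# Standardize: subtract each column's mean, divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Route 1: eigendecomposition of the correlation matrix Z^T Z / (N - 1).
corr = Z.T @ Z / (N - 1)
L, V = np.linalg.eigh(corr)
L = L[::-1]  # descending order, to match the SVD ordering

# Route 2: SVD of the standardized data, without forming Z^T Z.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
components = U * s                  # the U Sigma term: principal components
explained_variance = s**2 / (N - 1) # should match the eigenvalues L

# Dimensionality reduction: keep only the first two principal components.
Z_reduced = components[:, :2]

print(np.allclose(explained_variance, L))
```

Both routes give the same eigenvalues here; the SVD route is usually preferred because it avoids explicitly forming the product of the data matrix with itself.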