EL9123 Homework 9: Principal Component Analysis and Feature Dimension Reduction

1. Assume that you have 4 samples each with dimension 3, described in the data matrix X,
X =
\begin{pmatrix}
3 & 2 & 1 \\
2 & 4 & 5 \\
1 & 2 & 3 \\
0 & 2 & 5
\end{pmatrix}
(a) Find the sample mean.
(b) Find the sample covariance matrix Q.
(c) Find the eigenvalues and eigenvectors. You can use numpy.linalg.eig to compute the
eigenvalues and eigenvectors of Q.
(d) Find the PCA coefficients corresponding to samples in X.
(e) Reconstruct the original samples from the PCA coefficients.
(f) Approximate the samples using the principal components corresponding to the two largest
eigenvalues.
(g) Verify that the sum of squared reconstruction errors equals the sum of squares of the skipped PCA coefficients.
You can carry out all of the above calculations with a short Python script, for example the sketch below.
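
A minimal sketch of parts (a)-(g), assuming the samples are the rows of X and using numpy.linalg.eig as the problem suggests:

```python
import numpy as np

# Data matrix: 4 samples (rows), 3 features (columns).
X = np.array([[3., 2., 1.],
              [2., 4., 5.],
              [1., 2., 3.],
              [0., 2., 5.]])

# (a) Sample mean.
mu = X.mean(axis=0)

# (b) Sample covariance matrix Q (unbiased estimate, divides by N - 1).
Q = np.cov(X, rowvar=False)

# (c) Eigenvalues and eigenvectors of Q; the columns of V are the eigenvectors.
lam, V = np.linalg.eig(Q)
order = np.argsort(lam)[::-1]          # sort by decreasing eigenvalue
lam, V = lam[order], V[:, order]

# (d) PCA coefficients: project the centered samples onto the eigenvectors.
A = (X - mu) @ V

# (e) Reconstruct the original samples from all the coefficients.
X_rec = A @ V.T + mu                   # equals X up to round-off

# (f) Approximate using the two components with the largest eigenvalues.
k = 2
X_approx = A[:, :k] @ V[:, :k].T + mu

# (g) Sum of squared reconstruction errors = sum of squares of the
#     skipped PCA coefficients; the two printed numbers should match.
print(np.sum((X - X_approx) ** 2), np.sum(A[:, k:] ** 2))
```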
2. Show that PCA decorrelates the elements of the original samples; that is, the covariance
matrix of the coefficient data is diagonal, and the variances of the coefficients are equal to
the eigenvalues.
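
The claim follows from the identity $\mathrm{Cov}(a) = V^{T} Q V = \Lambda$, since each coefficient vector is $a = V^{T}(x - \mu)$. A quick numerical check on illustrative random data (the data here is an assumption, not part of the problem):

```python
import numpy as np

# Illustrative correlated data: 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))

mu = X.mean(axis=0)
Q = np.cov(X, rowvar=False)
lam, V = np.linalg.eig(Q)

# PCA coefficients of each sample.
A = (X - mu) @ V

# Covariance of the coefficients: off-diagonal entries vanish (up to
# round-off) and the diagonal entries match the eigenvalues of Q.
C = np.cov(A, rowvar=False)
print(np.round(C, 8))
print(np.round(lam, 8))
```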
3. For machine learning, we often transform the original sample features using PCA. Is it
beneficial to keep all the coefficients? What is the benefit of keeping only a subset of the
coefficients? Which coefficients should you keep?
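
For intuition, one common heuristic (an assumption here, not required by the question) is to rank the components by eigenvalue and keep just enough of them to explain a target fraction of the total variance; the 95% threshold below is illustrative:

```python
import numpy as np

# Illustrative data: 500 samples, 10 correlated features.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))

Q = np.cov(X, rowvar=False)
lam = np.linalg.eigvalsh(Q)[::-1]      # eigenvalues, largest first

# Cumulative fraction of the total variance captured by the first k components.
explained = np.cumsum(lam) / lam.sum()
k = int(np.searchsorted(explained, 0.95)) + 1   # smallest k reaching 95%
print(explained)
print("keep", k, "of", lam.size, "coefficients")
```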
4. What is the problem with using the PCA coefficients directly as the transformed features for
machine learning? How should you fix it?
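
One standard remedy (named here as an assumption; the course may expect a different answer) is whitening: rescale each coefficient by the square root of its eigenvalue, so that every transformed feature has unit variance instead of variances that differ by orders of magnitude. A minimal sketch on illustrative data:

```python
import numpy as np

# Illustrative data: 300 samples, 5 correlated features.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))

mu = X.mean(axis=0)
Q = np.cov(X, rowvar=False)
lam, V = np.linalg.eig(Q)

A = (X - mu) @ V               # raw coefficients: variances equal lam
A_white = A / np.sqrt(lam)     # whitened: every coefficient has unit variance

print(np.var(A, axis=0, ddof=1))         # matches lam
print(np.var(A_white, axis=0, ddof=1))   # all ones
```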