The key takeaways from this section include:
- PCA is an unsupervised learning technique which does not require labeled data
- It is also a dimensionality reduction technique, so it can be used to compress data or as a preprocessing step whose effect on downstream machine learning algorithms you can test experimentally
- There are five steps to conducting PCA:
    - Center each feature by subtracting its mean
    - Calculate the covariance matrix of the centered dataset
    - Calculate the eigenvectors and eigenvalues of the covariance matrix
    - Sort the eigenvectors in descending order of their eigenvalues (keep only the first k to reduce the data to k dimensions)
    - Project the data by taking the dot product of the transposed eigenvector matrix with the transposed centered data (equivalently, multiply the centered data by the eigenvector matrix and transpose the result)
- You can also run PCA with scikit-learn's `PCA` class (in `sklearn.decomposition`), which centers the data and computes the principal components for you
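The five steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not a reference implementation: the function name `pca_from_scratch` and the synthetic data are hypothetical, and it uses `np.linalg.eigh` (suited to symmetric matrices such as a covariance matrix), which returns eigenvalues in ascending order and is why step 4 reverses the sort.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Hypothetical helper: project X (n_samples x n_features) onto its top principal components."""
    # Step 1: center each feature by subtracting its mean
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix of the centered data (n_features x n_features)
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigenvalues and eigenvectors; eigh returns eigenvalues in
    # ascending order and the eigenvectors as columns
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 4: reorder by descending eigenvalue and keep the top n_components
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    W = eigenvectors[:, order][:, :n_components]
    # Step 5: W^T X^T has shape (n_components x n_samples);
    # transpose it back so each row is one projected sample
    projected = W.T.dot(X_centered.T).T
    return projected, eigenvalues[:n_components]

# Synthetic, correlated example data (assumption: any numeric 2-D array works)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])
X_pca, top_eigenvalues = pca_from_scratch(X, n_components=2)
print(X_pca.shape)  # (100, 2)
```

A useful sanity check: the variance of each projected column equals the corresponding eigenvalue, and the eigenvalues come out in descending order.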
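With scikit-learn, the same projection might look like the sketch below. The synthetic data and `n_components=2` are illustrative assumptions. Note that `PCA` computes the components via a singular value decomposition rather than an explicit covariance eigendecomposition, so the signs of individual components can differ from a hand-rolled version (eigenvectors are only defined up to sign), while the variances along each component should agree.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic, correlated example data (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)  # centers X internally, then projects it

print(X_pca.shape)                    # (100, 2)
print(pca.explained_variance_)        # variance along each kept component
print(pca.explained_variance_ratio_)  # fraction of total variance per component

# Cross-check against the covariance eigendecomposition route:
# the largest eigenvalues should match explained_variance_
eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(pca.explained_variance_, eigenvalues[:2]))
```

`explained_variance_ratio_` is a convenient way to decide how many components to keep when using PCA as a preprocessing step.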