PCA - Recap

Key Takeaways

The key takeaways from this section include:

  • PCA is an unsupervised learning technique, so it does not require labeled data
  • It is also a dimensionality reduction technique that can be used as a preprocessing step to compress data and experiment with how the reduced representation affects machine learning algorithms
  • There are four steps to conducting PCA (a worked sketch follows this list):
    • Center each feature by subtracting the feature mean
    • Calculate the covariance matrix of the centered data
    • Calculate the eigenvectors/eigenvalues for the covariance matrix
      • Reorder your eigenvectors based on their accompanying eigenvalues (in descending order of the eigenvalues)
    • Take the dot product of the transpose of the eigenvectors with the transpose of the centered data to obtain the transformed (projected) data
  • You can also implement PCA easily using scikit-learn (see the example below)
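
The four steps above can be followed directly with NumPy. The sketch below uses randomly generated data purely for illustration; the array `X` and the random seed are assumptions, not part of the original lesson.

```python
import numpy as np

# Hypothetical example data: 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Step 1: center each feature by subtracting its mean
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the centered data
# (rowvar=False treats each column as a feature)
cov = np.cov(X_centered, rowvar=False)

# Step 3: eigenvalues/eigenvectors of the covariance matrix
# (eigh is used because the covariance matrix is symmetric)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder eigenvectors in descending order of their eigenvalues
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 4: project the centered data onto the principal components
# (eigenvectors.T @ X_centered.T gives components x samples;
#  transposing back is equivalent to X_centered @ eigenvectors)
X_pca = (eigenvectors.T @ X_centered.T).T
```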
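
With scikit-learn, the same result takes only a few lines. This is a minimal sketch; the choice of `n_components=2` and the sample data are assumptions made here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical example data: 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Keep the top 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)  # centers the data internally, then projects it

# Fraction of the total variance captured by each retained component
print(pca.explained_variance_ratio_)
```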