Tutorial Overview

  • Introduce Principal Component Analysis (PCA), a commonly used dimension reduction approach, to attendees who are not familiar with it.
  • Focus on real data analysis with R and Python.

Tutorial Prerequisites

Attendees are expected to have a working knowledge of statistics or linear algebra, and of R or Python.

Tutorial Outline

  • Motivation for PCA
  • Overview of PCA (e.g. checking assumptions and deciding how many components to use)
  • Two applications of PCA
  • Extensions of PCA
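
As a rough illustration of the "deciding how many components to use" item in the outline, the sketch below computes the share of variance explained by each principal component and picks the smallest number of components that reaches a cumulative threshold. It is not the tutorial's own code: the function names, the synthetic data, and the 90% threshold are assumptions made for this example, and it uses only numpy.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column
    # Singular values of the centered data give the component variances:
    # var_k = s_k**2 / (n - 1), returned in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2 / (X.shape[0] - 1)
    return var / var.sum()


def n_components_for(X, threshold=0.90):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))  # guard against floating-point shortfall


# Synthetic data (an assumption for illustration): two strongly related
# columns plus one low-variance noise column.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X_demo = np.hstack([latent, 2 * latent, rng.normal(scale=0.1, size=(200, 1))])

ratios = explained_variance_ratio(X_demo)
k = n_components_for(X_demo)
print("explained variance ratios:", ratios)
print("components needed for 90% of the variance:", k)
```

A cumulative threshold is only one heuristic. A scree plot of the same ratios is a common alternative when the cutoff is not obvious.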

Running the Examples

Software Prerequisites

Install R or Python (or both), along with the following packages for the language you plan to use:

  • R: lattice, blockcluster (optional), RTextTools (optional), RJSONIO (optional), httr (optional)
  • Python: matplotlib, numpy, os (part of the standard library, no installation needed), pandas, scipy, sklearn (installed as scikit-learn)
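
Before running the examples, it can help to confirm that the Python modules in the list above are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and `os` is left out because it ships with Python.

```python
import importlib.util

# Import names taken from the Python package list above.
REQUIRED = ["matplotlib", "numpy", "pandas", "scipy", "sklearn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Note that the import name and the install name can differ: `sklearn` is provided by the scikit-learn distribution.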

If scraping Meetup data from scratch:

Running the examples

  • Clone the repository and navigate to the 'Code' folder in a terminal.
  • Example 1: Run the R/Python code as follows:
# In R:
source("Example1.R")
# In Python 2:
execfile("Example1.py")
# In Python 3 (execfile was removed in Python 3):
exec(open("Example1.py").read())
  • Example 2: Run the R/Python code as follows:
# In R:
source("Example2.R")
# In Python 2:
execfile("Example2.py")
# In Python 3 (execfile was removed in Python 3):
exec(open("Example2.py").read())