This repository contains an implementation of principal component analysis (PCA) in Scala and Spark. The PCA was part of a simple face detector trained on the Faces in the Wild dataset. The implementation was written for a lecture on big-data analytics that required a final project, with free choice of topic and programming languages. It was constrained to run on the university cluster, which ran the Cloudera distribution of Spark in version 1.6. This version was already old at the time of the project and did not provide any functions to load and decode images. Therefore, the images were converted to grayscale and stored as CSV files with a Python script, and then loaded in Spark as text files.
Check the file main.scala for the implementation.
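
The loading step described above can be sketched roughly as follows. This is not the code from main.scala, only a minimal illustration under stated assumptions: each line of a CSV file is assumed to hold one flattened grayscale image as comma-separated pixel values, and the object name `LoadFaces`, the helpers `parseImageLine` and `loadImages`, and the input path are hypothetical. Only calls available in Spark 1.6 (`SparkContext.textFile`, `Vectors.dense`) are used.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of the repository.
object LoadFaces {

  // Assumption: one CSV line = one flattened grayscale image,
  // given as comma-separated pixel intensities.
  def parseImageLine(line: String): Vector =
    Vectors.dense(line.split(",").map(_.trim.toDouble))

  // Read the CSV files as plain text and turn every non-empty line
  // into a dense vector of pixel values.
  def loadImages(sc: SparkContext, path: String): RDD[Vector] =
    sc.textFile(path)
      .filter(_.trim.nonEmpty)
      .map(parseImageLine)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LoadFaces")
    val sc = new SparkContext(conf)

    // args(0) is a hypothetical path to the converted CSV files.
    val images = loadImages(sc, args(0))
    println(s"Loaded ${images.count()} images with ${images.first().size} pixels each")

    sc.stop()
  }
}
```

Reading the images as text keeps the Spark side free of any image-decoding dependency, since all decoding happens beforehand in the Python conversion step. The resulting `RDD[Vector]` is the kind of input a PCA over pixel vectors would start from.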