PySpark implementation of Chapter 6 of the book "Advanced Analytics with Spark: Patterns for Learning from Data at Scale" (Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills), whose code was originally written in Scala. The goal is to apply LSA (Latent Semantic Analysis) to a corpus of Wikipedia articles, using the Wikipedia data dumps as the dataset.
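
For readers new to LSA, the idea can be sketched on a toy scale: build a TF-IDF document-term matrix, take a truncated SVD, and compare documents in the resulting low-dimensional "concept" space. The snippet below is only an illustration under stated assumptions. It uses NumPy in place of Spark, a hypothetical four-document corpus in place of Wikipedia, and names (`docs`, `doc_concepts`, `cosine`) that are made up for this example, so it should not be read as this repository's implementation.

```python
import math
from collections import Counter

import numpy as np

# Hypothetical toy corpus standing in for Wikipedia articles.
docs = [
    "spark cluster computing data",
    "spark data processing cluster",
    "cat dog pet animal",
    "dog pet animal food",
]

tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
term_index = {t: i for i, t in enumerate(vocab)}
n_docs = len(docs)

# Document frequency: number of documents each term appears in.
doc_freq = Counter(t for doc in tokenized for t in set(doc))

# TF-IDF matrix with one row per document and one column per term.
tfidf = np.zeros((n_docs, len(vocab)))
for row, doc in enumerate(tokenized):
    for term, count in Counter(doc).items():
        tf = count / len(doc)
        idf = math.log(n_docs / doc_freq[term])
        tfidf[row, term_index[term]] = tf * idf

# Truncated SVD: keep the k strongest "concepts".
k = 2
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)
doc_concepts = U[:, :k] * s[:k]  # documents expressed in concept space


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print("doc 0 vs doc 1:", cosine(doc_concepts[0], doc_concepts[1]))
print("doc 0 vs doc 2:", cosine(doc_concepts[0], doc_concepts[2]))
```

In this toy corpus the two Spark-related documents should score close to 1 in concept space, while a Spark document and a pet document should score close to 0. On a real Wikipedia dump the matrix is far too large for a single machine, which is why the SVD step is done with Spark instead.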