edx-Introduction-to-Big-Data-with-Apache-Spark

Lab 1: Word Count Example Using Spark

This exercise consists of 4 parts:

Part 1: Creating a base RDD and pair RDDs
Part 2: Counting with pair RDDs
Part 3: Finding unique words and a mean value
Part 4: Apply word count to a file
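
The counting steps in Parts 1 to 3 can be sketched in plain Python, without a Spark cluster. This is a minimal local stand-in, not the lab's solution: the helper names `make_pairs` and `reduce_by_key` and the sample word list are hypothetical, and the comments only point at the PySpark RDD operations (`map`, `reduceByKey`) that play the same role.

```python
from collections import defaultdict

def make_pairs(words):
    """Pair each word with a count of 1, like rdd.map(lambda w: (w, 1))."""
    return [(w, 1) for w in words]

def reduce_by_key(pairs):
    """Sum the values for each key, a local analogue of reduceByKey with addition."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical sample data standing in for the base RDD.
words = ["cat", "elephant", "rat", "rat", "cat"]

counts = reduce_by_key(make_pairs(words))         # word -> number of occurrences
unique_words = len(counts)                        # number of distinct words
mean_count = sum(counts.values()) / unique_words  # mean occurrences per distinct word
```

In Spark the same shape applies, but the pairs are distributed across partitions and the per-key sums are combined by the framework rather than in a single dictionary.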

Lab 2: Web Server Log Analysis with Apache Spark

This exercise consists of 4 parts:

Part 1: Apache Web Server Log file format
Part 2: Sample Analyses on the Web Server Log File
Part 3: Analyzing Web Server Log File
Part 4: Exploring 404 Response Codes
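
As a rough illustration of what parsing the log format and exploring 404 responses involve, the sketch below parses lines in Apache Common Log Format with a regular expression and tallies status codes. The pattern, the `parse_line` helper, and the sample lines are assumptions made for this example; the lab's own pattern and data may differ.

```python
import re
from collections import Counter

# Common Log Format: host ident user [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\S+)$'
)

def parse_line(line):
    """Return a dict of fields for a well-formed line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, _ident, _user, timestamp, method, path, _protocol, status, size = match.groups()
    return {
        "host": host,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "status": int(status),
        # A size of "-" means no content was returned; treat it as 0 bytes.
        "size": 0 if size == "-" else int(size),
    }

# Hypothetical sample lines, including one that is not a valid log entry.
lines = [
    '127.0.0.1 - - [01/Aug/1995:00:00:01 -0400] "GET /images/logo.gif HTTP/1.0" 200 1839',
    'example.org - - [01/Aug/1995:00:00:07 -0400] "GET /missing.html HTTP/1.0" 404 -',
    '127.0.0.1 - - [01/Aug/1995:00:00:09 -0400] "GET /missing.html HTTP/1.0" 404 -',
    'not a log line',
]

records = [r for r in (parse_line(line) for line in lines) if r is not None]
status_counts = Counter(r["status"] for r in records)
not_found_paths = Counter(r["path"] for r in records if r["status"] == 404)
```

Lines that fail to match are dropped here; in a real analysis it is usually worth counting them separately to check that the pattern is not too strict.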

Lab 3: Text Analysis and Entity Resolution

This exercise consists of 5 parts and quiz questions:

Part 1: ER as Text Similarity - Bags of Words
Part 2: ER as Text Similarity - Weighted Bag-of-Words using Term-Frequency/Inverse-Document-Frequency
Part 3: ER as Text Similarity - Cosine Similarity
Part 4: Scalable ER
Part 5: Analysis (this is the part where you click through and view plots of your work from Part 4)
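
The text-similarity idea behind Parts 1 to 3 can be sketched as follows: turn each record into a bag of words, weight the tokens by TF-IDF, and compare two records by the cosine of the angle between their weight vectors. This is a small local sketch under stated assumptions: it uses the unlogged variant IDF = N / (number of documents containing the token), which may not match the lab's exact definition, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def tf(tokens):
    """Term frequency: each token's count divided by the document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: c / total for t, c in counts.items()}

def idf(corpus):
    """Assumed IDF variant: N / document frequency, with no logarithm."""
    n_docs = len(corpus)
    doc_freq = Counter(t for tokens in corpus for t in set(tokens))
    return {t: n_docs / df for t, df in doc_freq.items()}

def tfidf(tokens, idf_weights):
    """Weight each token's TF by its IDF; tokens must appear in the corpus."""
    return {t: w * idf_weights[t] for t, w in tf(tokens).items()}

def cosine_similarity(a, b):
    """Dot product of two sparse vectors divided by the product of their norms."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical tokenized product descriptions.
doc_a = ["adobe", "photoshop", "software"]
doc_b = ["adobe", "photoshop", "elements", "software"]
doc_c = ["garden", "hose", "reel"]

weights = idf([doc_a, doc_b, doc_c])
vec_a, vec_b, vec_c = (tfidf(d, weights) for d in (doc_a, doc_b, doc_c))

sim_ab = cosine_similarity(vec_a, vec_b)  # overlapping vocabulary
sim_ac = cosine_similarity(vec_a, vec_c)  # no shared tokens
```

Comparing every pair of records this way is quadratic in the number of records, which is the cost that a scalable approach such as Part 4 has to avoid.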

Lab 4: Movie Recommendations using Apache Spark

This exercise consists of 3 parts and quiz questions:

Part 1: Basic Recommendations
Part 2: Collaborative Filtering
Part 3: Predictions for Yourself (this is the part where you enter your own ratings and see which movies are recommended for you)
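
One simple, non-personalized baseline for recommendations is to rank movies by average rating while ignoring movies with too few ratings. The sketch below assumes that reading of "basic"; it is not taken from the lab itself, and the `average_ratings` helper, the `min_count` threshold, and the sample ratings are all hypothetical.

```python
from collections import defaultdict

def average_ratings(ratings, min_count=1):
    """Return (average, movie, count) tuples for movies with at least min_count ratings,
    sorted by average rating (highest first), then by movie name."""
    by_movie = defaultdict(list)
    for _user, movie, rating in ratings:
        by_movie[movie].append(rating)
    results = [
        (sum(values) / len(values), movie, len(values))
        for movie, values in by_movie.items()
        if len(values) >= min_count
    ]
    return sorted(results, key=lambda r: (-r[0], r[1]))

# Hypothetical (user, movie, rating) triples.
ratings = [
    (1, "A", 5.0), (2, "A", 4.0),
    (1, "B", 3.0), (2, "B", 5.0), (3, "B", 4.0),
    (3, "C", 5.0),
]

top_movies = average_ratings(ratings, min_count=2)
```

The threshold keeps a movie with a single perfect rating (movie "C" here) from outranking movies that many users have rated; collaborative filtering goes further by tailoring predictions to each user.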