This repo contains assignment solutions for the Big Data Programming 1 course at Simon Fraser University.
- Word Count MapReduce
- MapReduce program to count the occurrences of each word in a large text corpus.
- Reddit Average MapReduce
- To parse JSON input and calculate the average score for each subreddit using MapReduce.
- Most-viewed Wikipedia pages MapReduce
- MapReduce program that finds the number of times the most-visited page was visited each hour.
- Word Count PySpark
- Count the occurrences of each word in a large text corpus using PySpark.
- Most-viewed Wikipedia Pages PySpark
- PySpark program that finds the most-visited page and the number of times it was visited each hour.
- Reddit Average PySpark
- Calculate the average score for each subreddit by parsing JSON input using PySpark.
- Word Count Improved
- Improving the performance of the word count program.
- Reddit ETL
- Performing extract, transform, load (ETL) operations on Reddit comments for further processing.
- Reddit relative score
- To find the best comment on Reddit by calculating relative subreddit scores.
- Reddit relative score using Broadcast
- To find the best comment on Reddit using a broadcast join.
- Weather ETL
- To perform ETL operations on a weather dataset for further processing.
- Hourly popular Wikipedia Pages
- To find the most-viewed page on Wikipedia every hour, with its view count, using broadcast.
- Temperature Range
- To find the temperature range using the Python API.
- Temperature range Spark SQL
- To find the temperature range using Spark SQL.
- Logs correlation using RDD
- Calculating correlation using RDD functions.
- Dijkstra's Algorithm
- Finding the shortest path between nodes using Dijkstra's Algorithm.
- Load logs to Cassandra
- Inserting data into a Cassandra table using batch statements.
- Load logs to Cassandra using Spark
- Inserting data into a Cassandra table using the Spark-Cassandra connector.
- Logs Correlation Cassandra
- Finding correlation on data read from Cassandra.
- Kafka Stream read
- Reading data from a Kafka stream to learn weights using simple linear regression.
- Colour Prediction MLlib
- Classification of colours using a Multilayer Perceptron Classifier.
- Weather prediction MLlib
- Predicting the maximum temperature of a future date using a Gradient Boosting Regressor.
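
As a rough illustration of the idea behind the Reddit average assignments listed above, the sketch below computes a per-subreddit average in plain Python by accumulating (count, score sum) pairs, which is the same shape of data a MapReduce or PySpark job would combine. It is not the code from this repo: the function name `subreddit_averages`, the sample records, and the JSON field names `subreddit` and `score` are all assumptions made for the example.

```python
import json
from collections import defaultdict


def subreddit_averages(lines):
    """Return {subreddit: average score} for an iterable of JSON strings.

    Assumes each line is a JSON object with "subreddit" and "score" keys
    (hypothetical field names chosen for this sketch).
    """
    # Per-subreddit running (count, score_sum) pair. Pairs like this can be
    # added together in any order, which is what makes the approach suit
    # a distributed reduce step.
    totals = defaultdict(lambda: (0, 0))
    for line in lines:
        record = json.loads(line)
        count, score_sum = totals[record["subreddit"]]
        totals[record["subreddit"]] = (count + 1, score_sum + record["score"])
    # Divide only at the end, once all pairs have been combined.
    return {name: score_sum / count for name, (count, score_sum) in totals.items()}


# Made-up sample input, one JSON object per line.
sample = [
    '{"subreddit": "a", "score": 2}',
    '{"subreddit": "a", "score": 4}',
    '{"subreddit": "b", "score": 5}',
]
averages = subreddit_averages(sample)
print(averages)
```

Keeping the count and the sum separate until the final step avoids averaging averages, which would give wrong results when partitions hold different numbers of comments.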