Big-Data-Programming-1

This repo contains assignment solutions for the Big Data Programming 1 course at Simon Fraser University.

Primary language: Python

Assignment List

  1. Word Count MapReduce
  • MapReduce program to count the occurrences of each word in a large text corpus.
  2. Reddit Average MapReduce
  • MapReduce program that parses JSON input and calculates the average score for each subreddit.
  3. Most-viewed Wikipedia pages MapReduce
  • MapReduce program that finds the number of times the most-visited page was visited each hour.
  4. Word Count PySpark
  • Count the occurrences of each word in a large text corpus using PySpark.
  5. Most-viewed Wikipedia Pages PySpark
  • PySpark program that finds the most-visited page and the number of times it was visited each hour.
  6. Reddit Average PySpark
  • Calculate the average score for each subreddit by parsing JSON input.
  7. Word Count Improved
  • Improving the performance of the word count program.
  8. Reddit ETL
  • Performing extract, transform, load (ETL) operations on Reddit comments for further processing.
  9. Reddit relative score
  • Finding the best comment on Reddit by calculating relative subreddit scores.
  10. Reddit relative score using Broadcast
  • Finding the best comment on Reddit using a broadcast join.
  11. Weather ETL
  • Performing ETL operations on a weather dataset for further processing.
  12. Hourly popular Wikipedia Pages
  • Finding the most-viewed page on Wikipedia for every hour, with its view count, using broadcast.
  13. Temperature Range
  • Finding the temperature range using the Python API.
  14. Temperature range Spark SQL
  • Finding the temperature range using Spark SQL.
  15. Logs correlation using RDD
  • Calculating correlation using RDD functions.
  16. Dijkstra's Algorithm
  • Finding the shortest path between nodes using Dijkstra's algorithm.
  17. Load logs to Cassandra
  • Inserting data into a Cassandra table using batch statements.
  18. Load logs to Cassandra using Spark
  • Inserting data into a Cassandra table using the Spark-Cassandra connector.
  19. Logs Correlation Cassandra
  • Finding correlation on data read from Cassandra.
  20. Kafka Stream read
  • Reading data from a Kafka stream to learn weights using simple linear regression.
  21. Colour Prediction MLlib
  • Classification of colours using a multilayer perceptron classifier.
  22. Weather prediction MLlib
  • Predicting the maximum temperature on a future date using a gradient boosting regressor.
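
As a rough illustration of the idea behind the Reddit Average assignments listed above, the sketch below computes a per-subreddit average score from JSON lines using a map step and a pairwise reduce step. It is plain Python rather than the repo's actual MapReduce or PySpark code. The function names, the `subreddit` and `score` field names, and the sample lines are assumptions made up for this example.

```python
import json
from collections import defaultdict


def parse_comment(line):
    """Map step: turn one JSON line into (subreddit, (count, score_sum))."""
    comment = json.loads(line)
    # Field names "subreddit" and "score" are assumed for illustration.
    return comment["subreddit"], (1, comment["score"])


def add_pairs(a, b):
    """Reduce step: add two (count, score_sum) pairs element-wise."""
    return a[0] + b[0], a[1] + b[1]


def subreddit_averages(lines):
    """Group pairs by subreddit, sum them, then divide sum by count."""
    totals = defaultdict(lambda: (0, 0))
    for line in lines:
        subreddit, pair = parse_comment(line)
        totals[subreddit] = add_pairs(totals[subreddit], pair)
    return {
        name: score_sum / count
        for name, (count, score_sum) in totals.items()
    }


# Made-up sample input: one JSON object per line.
sample_lines = [
    '{"subreddit": "xkcd", "score": 4}',
    '{"subreddit": "xkcd", "score": 8}',
    '{"subreddit": "canada", "score": 3}',
]
averages = subreddit_averages(sample_lines)
print(averages)
```

Carrying a (count, sum) pair instead of a running average keeps the reduce step associative, so partial results can be combined in any order. In PySpark, functions shaped like `parse_comment` and `add_pairs` could plausibly be passed to `RDD.map` and `RDD.reduceByKey`, though the solutions in this repo may be structured differently.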