Projects as part of coursework on Big Data with Scala and Spark

Big Data Analysis with Scala and Spark

This repository contains my submissions for the Coursera MOOC "Big Data Analysis with Scala and Spark," offered by EPFL and taught by Prof. Heather C. Miller.

Assignments

Assignment 1: Wikipedia

  • Topic: Basics of Spark's RDDs

  • Objective: Analyze Wikipedia data to build a simple metric of programming language popularity, then compare the resulting ranking with the well-known RedMonk rankings.
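
As a rough illustration of the popularity metric, the sketch below counts how many articles mention each language and sorts the languages by that count. It runs on plain Scala collections rather than Spark so that it stays self-contained; `Article`, `occurrences`, and `rankLangs` are hypothetical names chosen for this example and are not taken from the submission. On an RDD, the same counting could be expressed with `filter` followed by `count`.

```scala
// A local-collections sketch of the ranking idea; all names here are assumptions.
object LanguageRankingSketch {
  // An article reduced to the two fields the metric needs.
  final case class Article(title: String, text: String) {
    // True when the language name appears as a space-separated word.
    def mentions(lang: String): Boolean = text.split(' ').contains(lang)
  }

  // Number of articles that mention `lang` at least once.
  def occurrences(lang: String, articles: Seq[Article]): Int =
    articles.count(_.mentions(lang))

  // Languages paired with their article counts, most-mentioned first.
  def rankLangs(langs: Seq[String], articles: Seq[Article]): Seq[(String, Int)] =
    langs.map(l => l -> occurrences(l, articles)).sortBy { case (_, n) => -n }
}
```

Matching on whole words means an article that only mentions "JavaScript" is not counted for "Java"; a real data set would likely need more careful tokenization.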

Assignment 2: StackOverflow

  • Topic: Reduction Operations & Distributed Key-Value Pairs

  • Objective: Implement a distributed k-means algorithm that clusters StackOverflow posts by score across different programming languages, then compare the resulting clusters.
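
The k-means loop itself can be sketched on local collections as follows. This is a minimal version under stated assumptions: each post is represented as a 2D point (for instance a scaled language index and a score), and `Point`, `step`, `kmeans`, `maxIter`, and `eps` are names invented for this example. In a Spark version, the assignment and averaging steps would typically run on a pair RDD, for example with `map`, `groupByKey`, and `mapValues`.

```scala
// A local sketch of k-means; names and the point encoding are assumptions.
object KMeansSketch {
  type Point = (Double, Double)

  def squaredDistance(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // Index of the mean closest to point `p`.
  def closest(p: Point, means: Vector[Point]): Int =
    means.indices.minBy(i => squaredDistance(p, means(i)))

  // Component-wise average of a non-empty group of points.
  def average(ps: Seq[Point]): Point = {
    val n = ps.size.toDouble
    (ps.map(_._1).sum / n, ps.map(_._2).sum / n)
  }

  // One iteration: assign points to the nearest mean, then recompute each mean.
  // A mean that attracts no points keeps its previous position.
  def step(points: Seq[Point], means: Vector[Point]): Vector[Point] = {
    val clusters = points.groupBy(p => closest(p, means))
    means.indices.map(i => clusters.get(i).map(average).getOrElse(means(i))).toVector
  }

  // Repeat until the means move less than `eps` in total or `maxIter` runs out.
  @annotation.tailrec
  def kmeans(points: Seq[Point], means: Vector[Point],
             maxIter: Int, eps: Double = 1e-9): Vector[Point] = {
    val next = step(points, means)
    val moved = means.zip(next).map { case (a, b) => squaredDistance(a, b) }.sum
    if (maxIter <= 1 || moved < eps) next else kmeans(points, next, maxIter - 1, eps)
  }
}
```

For example, `KMeansSketch.kmeans(points, initialMeans, maxIter = 10)` returns the final cluster centers; how the initial means are chosen is left open here.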

Assignment 3: Time Usage

  • Topic: SQL, Dataframes, and Datasets

  • Objective: Identify three activity groups (primary needs, work, and other/leisure) and observe how people allocate their time among these groups. Analyze differences between demographic groups such as gender, employment status, and age.
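
The grouping step can be pictured with the local sketch below, which averages the time spent in each activity group per demographic combination. The `Summary` row layout and the `grouped` function are assumptions made for this example, and the values are taken to be hours per day. With Spark DataFrames, the equivalent aggregation would usually be written with `groupBy` and `agg` using `avg`.

```scala
// A local-collections sketch of the aggregation; the row layout is an assumption.
object TimeUsageSketch {
  // One respondent: demographic labels plus time per activity group.
  final case class Summary(employment: String, gender: String, ageGroup: String,
                           primaryNeeds: Double, work: Double, other: Double)

  // Average time per activity group for each demographic combination,
  // sorted by the demographic labels for a stable output order.
  def grouped(rows: Seq[Summary]): Seq[Summary] =
    rows.groupBy(r => (r.employment, r.gender, r.ageGroup)).toSeq.map {
      case ((employment, gender, ageGroup), rs) =>
        val n = rs.size.toDouble
        Summary(employment, gender, ageGroup,
          rs.map(_.primaryNeeds).sum / n,
          rs.map(_.work).sum / n,
          rs.map(_.other).sum / n)
    }.sortBy(s => (s.employment, s.gender, s.ageGroup))
}
```

Each output row then describes how one demographic group splits its time among primary needs, work, and other/leisure activities.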