
Map-Reduce and SparkSQL queries, Advanced Topics in Database Systems, ECE-NTUA 2020-2021

Primary language: Python. License: MIT.

Mhlangana

Advanced Topics in Database Systems, ECE-NTUA 2020-2021
Implementation of Map-Reduce (Spark RDD API) and SparkSQL queries

Included

  • atds_project_report.pdf: The project report, containing pseudocode for the Map-Reduce implementations.
  • code: Query implementations in PySpark, plus Python scripts for plotting the results (plot_queries.py, barplot.py).
  • output: Output of the queries.

Plots

Query Execution Times

Broadcast Join vs Repartition Join
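
For readers unfamiliar with the two strategies being compared, here is a minimal sketch in plain Python (no Spark required). It is not the code from this repository: the function names `broadcast_join` and `repartition_join` and the sample tables are hypothetical, and the in-memory dictionaries merely stand in for Spark's broadcast variables and shuffle stage.

```python
from collections import defaultdict


def broadcast_join(small, large):
    """Map-side join: the small table is loaded into a hash map
    (standing in for a broadcast variable) and the large table is
    streamed past it, so no shuffle by key is needed."""
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    return [(key, (s_val, l_val))
            for key, l_val in large
            for s_val in lookup.get(key, [])]


def repartition_join(left, right):
    """Reduce-side join: records are tagged with their source table,
    grouped by key (the shuffle), and each group emits the cross
    product of its left and right records."""
    # Map phase: tag every record with the table it came from.
    tagged = [(k, ("L", v)) for k, v in left]
    tagged += [(k, ("R", v)) for k, v in right]

    # Shuffle phase: group the tagged records by join key.
    groups = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)

    # Reduce phase: pair left and right records that share a key.
    joined = []
    for key, records in groups.items():
        lefts = [v for tag, v in records if tag == "L"]
        rights = [v for tag, v in records if tag == "R"]
        joined.extend((key, (l, r)) for l in lefts for r in rights)
    return joined


# Hypothetical sample data: (key, value) pairs.
small = [(1, "a"), (2, "b")]
large = [(1, 10), (1, 11), (3, 12)]

bj = broadcast_join(small, large)
rj = repartition_join(small, large)
```

Both functions return the same inner-join rows. The difference lies in data movement: the broadcast variant copies only the small table to every worker, while the repartition variant moves both tables across the network by key, which is typically why the former tends to be faster when one side fits in memory.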

SparkSQL Join Optimizer
