A repository for the semester project of the Advanced Databases class (9th semester, ECE NTUA).
Instructions on how to set up the working environment for this project can be found here. We used Hadoop 3.3.6 and Spark 3.5.0.
The project report is in the file adv_db_report.pdf, which also contains instructions on how to set up the repo and run the scripts. The project specifications and requirements are in the file advanced_db_project.pdf.
The directory data contains the datasets used for the project. The directory scripts contains the Spark scripts for each query. The directory join-plans contains the logical plans (in graph and text format) of the joins in queries 3 and 4, for each of the join strategies used.
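
As a rough illustration of how plans like those in join-plans can be produced, here is a minimal sketch assuming PySpark and Spark 3.x join hints. The names `JOIN_HINTS`, `plans_per_strategy`, and `demo`, and the toy data, are hypothetical and not taken from the project's scripts; the strategies actually compared in the project are described in the report.

```python
# Join strategy hints accepted by DataFrame.hint in Spark 3.x.
JOIN_HINTS = ["broadcast", "merge", "shuffle_hash", "shuffle_replicate_nl"]


def plans_per_strategy(left, right, key):
    """Return {hint: joined DataFrame}, applying each hint to the right side."""
    return {h: left.join(right.hint(h), on=key) for h in JOIN_HINTS}


def demo():
    """Print the plans for each hint on toy data (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("join-plans-sketch").getOrCreate()
    # Toy data for illustration only, not the project's datasets.
    left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "x"])
    right = spark.createDataFrame([(1, "c"), (2, "d")], ["id", "y"])
    for hint, joined in plans_per_strategy(left, right, "id").items():
        print(f"== {hint} ==")
        # "extended" mode prints the logical plans and the physical plan.
        joined.explain(mode="extended")
    spark.stop()
```

Calling `demo()` in an environment with PySpark installed prints one set of plans per hint. Spark treats a hint as a request, so the plan it chooses may still differ from the hinted strategy.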