TLC-NYC-Big-Data-Analytics

Semester project for the Advanced Database Systems Course @ NTUA ECE 2022-2023

Primary language: Python

Advanced Topics in Database Systems ~ ECE NTUA 2022-2023

Name                                  Email                 AM (student ID)
Andreas Chrysovalantis-Konstantinos   el18102@mail.ntua.gr  031 18 102
Papanikolaou Ioannis                  el18064@mail.ntua.gr  031 18 064

  • The Python scripts that run all the queries are in the scripts folder.

  • Installation instructions are in the report. They explain how we set up a cluster of 2 VMs on Okeanos-Knosos to run the experiments required for the project.

  • Query results are in the output.txt file.

  • Query execution times are in the query_exec_times.txt file.
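The execution times in query_exec_times.txt could be collected with a small timing wrapper like the one below. This is a minimal sketch, not the project's actual script: the helper name time_query and the "name: seconds s" line format are hypothetical, since the real file layout is not described here.

```python
import time
from pathlib import Path


def time_query(name, run_query, log_path="query_exec_times.txt"):
    """Run run_query() once, append its wall-clock time to log_path,
    and return (result, elapsed_seconds).

    Hypothetical helper: the log line format is an assumption.
    """
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    # Append so that several queries can be logged to the same file.
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(f"{name}: {elapsed:.3f} s\n")
    return result, elapsed
```

With Spark, run_query should trigger an action (for example lambda: df.collect()), because transformations are evaluated lazily and timing them alone would measure almost nothing.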


This project uses New York City taxi trip data published by the Taxi and Limousine Commission (TLC) for January to June 2022. The data is publicly available here.
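The study period can be expressed as a half-open date range, together with the list of monthly files it covers. This is a sketch under assumptions: the yellow_tripdata_YYYY-MM.parquet naming follows the TLC's usual monthly file convention, but which trip type the project used is not stated here, so the prefix is a hypothetical default. Monthly files may contain a few records with timestamps outside their month, so a range filter like in_study_period is a common safeguard.

```python
from datetime import datetime

# Study period stated in this README: January 2022 through June 2022.
PERIOD_START = datetime(2022, 1, 1)
PERIOD_END = datetime(2022, 7, 1)  # exclusive upper bound


def in_study_period(pickup):
    """True if a pickup timestamp falls within January to June 2022."""
    return PERIOD_START <= pickup < PERIOD_END


def monthly_files(prefix="yellow_tripdata", first=(2022, 1), last=(2022, 6)):
    """List monthly file names from first to last (inclusive, as (year, month)).

    The prefix and naming pattern are assumptions, not taken from the project.
    """
    year, month = first
    names = []
    while (year, month) <= last:
        names.append(f"{prefix}_{year:04d}-{month:02d}.parquet")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names
```

The upper bound is exclusive so that trips late on 30 June are kept without having to reason about the last second of the day.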


Useful Information - Versions

  1. Hadoop 2.7.7
  2. Python 3.8.0
  3. Spark 3.1.3
  4. openjdk 1.8.0_292
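Before rerunning the queries, it can help to check that the local Python and PySpark versions match the ones listed above. This is a minimal sketch that assumes the queries run through PySpark, whose version tracks the Spark release; the function names are hypothetical.

```python
import importlib
import platform

# Versions listed in this README; PySpark is assumed to match Spark 3.1.3.
EXPECTED = {
    "python": "3.8.0",
    "pyspark": "3.1.3",
}


def installed_version(name):
    """Return the installed version string for name, or None if unavailable."""
    if name == "python":
        return platform.python_version()
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_mismatches(expected=EXPECTED):
    """Map each component whose installed version differs from the expected
    one to an (expected, installed) pair; installed is None if missing."""
    mismatches = {}
    for name, want in expected.items():
        have = installed_version(name)
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches().items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

Hadoop and Java live outside the Python environment, so they are easier to check from the shell with hadoop version and java -version.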