Big data CS5830 Course, IITM, Spring 2024.
Setup: clone the repo, create a venv, and install the required packages from requirements.txt. Set the year and dataset name in the params.yaml file. The results are stored in the Results directory.
Here is a short description of the project:
Data: Has the downloaded datasets
Ground_Truth_Dats: Has the monthly ground truth data and a pickle file of the extracted field names
Computed_Averages: Has the .csv files of monthly averages computed from the daily data
Results: Has the final results
Src: Has all the source files
The data and the pipelines are tracked with DVC.
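
To illustrate what the Computed_Averages step does, here is a minimal sketch of averaging daily values into monthly values. It is not the code in Src. The function name monthly_averages, the column names DATE and DailyTemp, and the ISO date format (YYYY-MM-DD) are all assumptions made for this example; the real field names come from the dataset and the extracted field names pickle file.

```python
import csv
import io
from collections import defaultdict


def monthly_averages(rows, date_field, value_fields):
    """Average each value field per calendar month from daily rows.

    Hypothetical helper: assumes dates look like "YYYY-MM-DD" and that
    missing readings are empty strings, which are skipped.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        month = int(row[date_field][5:7])  # characters 5-6 hold the month
        for field in value_fields:
            raw = (row.get(field) or "").strip()
            if not raw:
                continue  # skip missing daily readings
            sums[month][field] += float(raw)
            counts[month][field] += 1
    return {
        month: {f: sums[month][f] / counts[month][f] for f in sums[month]}
        for month in sorted(sums)
    }


# Tiny in-memory stand-in for a daily .csv file (made-up values).
sample = io.StringIO(
    "DATE,DailyTemp\n"
    "2024-01-01,10\n"
    "2024-01-02,14\n"
    "2024-02-01,\n"
    "2024-02-02,20\n"
)
averages = monthly_averages(csv.DictReader(sample), "DATE", ["DailyTemp"])
print(averages)
```

The result maps each month number to its per-field average, which can then be compared against the monthly values in Ground_Truth_Dats.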