/Task_pipeline

Assignment 4 for the Big Data Lab offered at IITM.

Primary language: Jupyter Notebook

Big data CS5830 Course, IITM, Spring 2024.

Setup: Clone the repo, create a venv, and install the required packages from requirements.txt. Set the year and dataset name in the params.yaml file. The results are stored in the Results directory.

Here is a short description of the project directories:
Data: has the downloaded datasets
Ground_Truth_Dats: has the monthly ground truth data and a pickle file of the extracted field names
Computed_Averages: has the .csv files of monthly values computed from the daily data
Results: has the final results
Src: has all the source files
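
As a rough illustration of what the Computed_Averages step does, the sketch below groups daily readings by month and averages them. It is not the code from Src: the function name `monthly_averages`, the column names `date` and `value`, and the `YYYY-MM-DD` date format are all assumptions made for this example, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def monthly_averages(rows, date_col="date", value_col="value"):
    """Group daily rows by month number and average the value column.

    The column names and the "YYYY-MM-DD" date format are hypothetical;
    adapt them to the fields in the actual dataset.
    """
    by_month = defaultdict(list)
    for row in rows:
        raw = (row.get(value_col) or "").strip()
        if not raw:
            continue  # skip days with a missing reading
        month = int(row[date_col][5:7])  # "YYYY-MM-DD" -> month number
        by_month[month].append(float(raw))
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up daily data standing in for one downloaded file.
sample = io.StringIO(
    "date,value\n"
    "2024-01-01,10.0\n"
    "2024-01-02,14.0\n"
    "2024-02-01,\n"
    "2024-02-02,8.0\n"
)
averages = monthly_averages(csv.DictReader(sample))
print(averages)  # {1: 12.0, 2: 8.0}
```

Days with an empty value are skipped rather than treated as zero, so a missing reading does not pull the monthly average down.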

The data and the pipelines are tracked by DVC.