ONEDOT DATA ENGINEERING TASK

Required Libraries and Setting Up the Environment

Install the required libraries with pip.

Then run the pipeline from the folder that contains supplier_car.json:

python main.py

Data Files :
- supplier_car.json : initial supplier data
- Target Data.xlsx: target data
- pre-processing.csv : pre-processed version of supplier data
- normalisation.csv : normalised version of pre-processed data
- extraction.csv : extracted version of normalised data
- integration.csv : integrated version of extracted data (This is the final form of the data)
Python files :
- main.py : entry point; run this file to execute the whole pipeline
- Classes:
  - preprocessor.py : Class with functions for preprocessing
  - normalizer.py : Class with functions for normalizing
  - extractor.py : Class with functions for extraction
  - integrator.py : Class with functions for integration
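The files listed above suggest a four-stage flow in which each stage reads the previous stage's output and writes its own CSV. The following is a minimal sketch of that chaining only, not the actual contents of main.py: `run_pipeline`, `write_csv`, `passthrough` and the sample rows are hypothetical names and data introduced for illustration, and the real stage logic lives in the classes listed above.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Stage = Callable[[List[Row]], List[Row]]


def write_csv(rows: List[Row], path: Path) -> None:
    """Write rows to a CSV file, using the union of all keys as the header."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(
    rows: List[Row], stages: List[Tuple[str, Stage]], out_dir: Path
) -> List[Path]:
    """Apply each stage in order and save its output under the given file name."""
    written: List[Path] = []
    for file_name, stage in stages:
        rows = stage(rows)  # output of one stage is the input of the next
        target = out_dir / file_name
        write_csv(rows, target)
        written.append(target)
    return written


def passthrough(rows: List[Row]) -> List[Row]:
    """Placeholder stage (assumption): stands in for the real class methods."""
    return list(rows)


# Output file names are taken from the list of data files in this README;
# the stage functions are placeholders, not the project's implementation.
STAGES: List[Tuple[str, Stage]] = [
    ("pre-processing.csv", passthrough),
    ("normalisation.csv", passthrough),
    ("extraction.csv", passthrough),
    ("integration.csv", passthrough),
]

if __name__ == "__main__":
    sample_rows = [{"id": "1", "value": "example"}]  # made-up input rows
    with tempfile.TemporaryDirectory() as tmp:
        for path in run_pipeline(sample_rows, STAGES, Path(tmp)):
            print(path.name)
```

Writing every intermediate result to disk, as the file list implies, makes it possible to inspect each stage's output on its own when something looks wrong in integration.csv.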
Presentation files :
- data_flow_presentation.pdf : data flow presentation and explanations. The answer to part 5 is in the last two slides.