- Claudia Chianella (@clauchian)
- Yannick Giovanakis (@yangvnks)
- Flavio Primo (@flaprimo)
- Francesco Zinnari (@FrancescoZinnari)
The steps followed for this project are listed below. Each step and each classifier has its own notebook.
- Load Dataframes: load the CSV data into pandas DataFrames and save them for quick reuse later
- Data visualization: analyze the data to understand missing values, relations between features and the usefulness of each feature
- Preprocessing: using the knowledge acquired in the preceding step, preprocess the data, including handling missing values and building new features
- Ensemble: build the model that predicts NumberOfSales on the test set
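The four steps can be sketched in plain pandas as below. This is a minimal illustration under stated assumptions, not the code in the project's notebooks: apart from `NumberOfSales`, any column names (such as `Date`) are hypothetical, median imputation and a month feature stand in for whatever preprocessing the notebooks actually apply, and the ensemble is a simple average over any objects exposing a scikit-learn-style `predict` method.

```python
import numpy as np
import pandas as pd


def load_dataframe(csv_path: str, pickle_path: str) -> pd.DataFrame:
    """Load step: read a CSV file once and cache it as a pickle for quick reuse."""
    df = pd.read_csv(csv_path)
    df.to_pickle(pickle_path)  # later steps can call pd.read_pickle(pickle_path)
    return df


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Preprocessing step: impute missing numeric values and add a new feature."""
    out = df.copy()
    # Impute numeric features with their median, leaving the target untouched
    features = out.select_dtypes(include="number").columns.drop(
        "NumberOfSales", errors="ignore"
    )
    out[features] = out[features].fillna(out[features].median())
    # Hypothetical engineered feature: month taken from an assumed "Date" column
    if "Date" in out.columns:
        out["Month"] = pd.to_datetime(out["Date"]).dt.month
    return out


def ensemble_predict(models, X) -> np.ndarray:
    """Ensemble step: average the predictions of several already-fitted models."""
    return np.mean([model.predict(X) for model in models], axis=0)
```

Averaging is only one way to combine models; weighted averages or stacking are common alternatives, and the README does not say which one the Ensemble notebook uses.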
- `\`: contains all of the Jupyter notebooks, including models, preprocessing and data visualization
- `\Data\input`: contains the original dataset provided by Bip
- `\Data\output`: stores the outputs of the intermediate steps (load-dataframes, preprocessing) as well as the model's final prediction for the test set (`submission.csv`)
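A hypothetical helper for the output step, assuming the folders above sit at the repository root and assuming a two-column `Id,NumberOfSales` layout for `submission.csv` (the real submission format is not specified in this README). `pathlib` is used so the same code works whether the paths are written with `\` or `/`.

```python
from pathlib import Path

import pandas as pd

# Assumed layout relative to the repository root
REPO_ROOT = Path(".")
INPUT_DIR = REPO_ROOT / "Data" / "input"
OUTPUT_DIR = REPO_ROOT / "Data" / "output"


def save_submission(predictions, ids, output_dir: Path = OUTPUT_DIR) -> Path:
    """Write predictions to <output_dir>/submission.csv (hypothetical column layout)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / "submission.csv"
    # "Id" is an assumed identifier column; NumberOfSales is the predicted target
    pd.DataFrame({"Id": ids, "NumberOfSales": predictions}).to_csv(path, index=False)
    return path
```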
- Install Python and clone this repository
- Install the required Python modules with `pip install -r requirements.txt`
- Start Jupyter with `jupyter notebook` to open and run the notebooks
To run the notebooks, follow these steps in order:
- Execute the load-dataframes notebook
- Execute the data-visualization notebook
- Execute the preprocessing notebook
- Execute the Ensemble notebook
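The same sequence can also be run non-interactively with `jupyter nbconvert --to notebook --execute --inplace`. In the sketch below the `.ipynb` file names are assumptions derived from the notebook names listed above; adjust them to the actual files and call `run_all()` from the folder that contains the notebooks.

```python
import subprocess
from typing import List

# Hypothetical file names: the README names the notebooks, not their exact files
NOTEBOOKS = [
    "load-dataframes.ipynb",
    "data-visualization.ipynb",
    "preprocessing.ipynb",
    "Ensemble.ipynb",
]


def nbconvert_command(notebook: str) -> List[str]:
    """Build the command that executes one notebook in place with nbconvert."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_all(notebooks: List[str] = NOTEBOOKS) -> None:
    """Execute the notebooks in order, raising CalledProcessError on the first failure."""
    for notebook in notebooks:
        subprocess.run(nbconvert_command(notebook), check=True)
```

`check=True` stops the loop at the first notebook that fails, which matters here because each notebook relies on the outputs saved by the previous one.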
Intermediate results produced by the notebooks are not saved in this repository. To try the code, you have to run the notebooks from the beginning, in the order above.