market_share_predictor

Summary of work

To follow the work, review the notebooks in the following order:

1. Dats_Analysis.ipynb :

Analyses the data and decides which unnecessary features to drop.

2. Train_Model.ipynb :

Preprocesses the training data and evaluates different models, using 70% of the training data for fitting and the remaining 30% for testing. The models are compared by their R2 score and MAE to choose the best one. As a result, the ensemble model RandomForestRegressor is the best model for us.

Result table:

Model	R2 Score	MAE	CV R2-Score
RandomForestRegressor	0.9006	0.9263	0.8946 (+/-) 0.0015
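
The evaluation described in this step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data from make_regression stands in for the preprocessed training data, and the hyperparameters, random seeds, and the choice of 5 cross-validation folds are all assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the preprocessed training data (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Hold out 30% of the training data for testing the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the model on the 70% split and score it on the held-out 30%.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

# Cross-validated R2; the number of folds (5) is an assumption.
cv_scores = cross_val_score(
    RandomForestRegressor(n_estimators=50, random_state=0),
    X, y, cv=5, scoring="r2",
)

print(f"R2: {r2:.4f}  MAE: {mae:.4f}  "
      f"CV R2: {cv_scores.mean():.4f} (+/-) {cv_scores.std():.4f}")
```

Running the same loop over several candidate regressors and collecting r2, mae, and cv_scores for each would give a comparison table of the kind shown above.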

3. Test_Prediction.ipynb :

Preprocesses the train and test data, trains the RandomForestRegressor model on the train data, and then predicts on the test data.

The final prediction is saved in the test_prediction.csv file.
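
A rough sketch of this final step, under stated assumptions: the helper fit_and_predict, the column names f1, f2, f3 and market_share, and the synthetic data frames are all hypothetical and only illustrate the train-then-predict flow. The real notebook also applies preprocessing that is not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def fit_and_predict(train_df, test_df, target):
    """Train on all of train_df and return predictions for test_df.

    Hypothetical helper: every column except `target` is used as a feature.
    """
    features = [c for c in train_df.columns if c != target]
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(train_df[features], train_df[target])
    return pd.DataFrame(
        {"prediction": model.predict(test_df[features])},
        index=test_df.index,
    )


# Tiny synthetic frames standing in for the real train and test data.
rng = np.random.default_rng(0)
train_df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
train_df["market_share"] = (
    2.0 * train_df["f1"] - train_df["f2"] + rng.normal(scale=0.1, size=100)
)
test_df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["f1", "f2", "f3"])

predictions = fit_and_predict(train_df, test_df, target="market_share")
print(predictions.head())
```

Calling predictions.to_csv("test_prediction.csv", index=False) afterwards would write the result to a CSV file with the name used in this project.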

Requirements:

scikit-learn (imported as sklearn)
pandas
numpy
matplotlib

parastooAflaki/market_share_predictor