This document describes a methodology for importing data from a SQL database into a Python environment, analysing it, building machine learning predictive models, and running optimization on those models to improve the energy consumption of the systems under study.
Further documentation on the method can be found in this blog post.
To download the code, run:
git clone git://github.com/diogoncalves/NJOULES.git
The training and test data can be downloaded from the Kaggle competition webpage at this link. The two extracted files, train.csv and test.csv, should be placed in the data folder.
Note: the benchmark sample solution and code provided by Kaggle are not required.
To download the relevant data for a specified time window from the MySQL database to your environment, divide it into smaller datasets, handle the NaN entries, and remove the outliers (training data only), run:
python data_preprocessing.py
This will also create three additional folders: train, valid, and test.
The size of the validation subset, the time window, the relevant variables, and the outlier threshold values can be changed in the above Python script.
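For orientation, the preprocessing steps can be sketched roughly as follows. This is not the code in data_preprocessing.py: the setting names, the power_kw column, dropping NaN rows rather than imputing them, the fixed min/max outlier bounds, and the chronological split are all assumptions made for illustration. It uses only the standard library so it runs without a database connection.

```python
import math
from datetime import datetime

# Hypothetical settings standing in for the tunable values in the script.
TIME_WINDOW = (datetime(2020, 1, 1), datetime(2020, 1, 11))  # [start, end)
VARIABLES = ["power_kw"]                      # hypothetical column name
OUTLIER_BOUNDS = {"power_kw": (0.0, 100.0)}   # allowed (min, max) per variable
VALID_FRACTION = 0.2                          # share of rows held out for validation
TEST_FRACTION = 0.2                           # share of rows held out for testing


def in_window(row):
    start, end = TIME_WINDOW
    return start <= row["timestamp"] < end


def has_nan(row):
    return any(row[v] is None or math.isnan(row[v]) for v in VARIABLES)


def is_outlier(row):
    return any(not (low <= row[v] <= high)
               for v, (low, high) in OUTLIER_BOUNDS.items())


def preprocess(rows):
    """Return (train, valid, test) lists, split in chronological order."""
    # Keep only rows inside the time window, oldest first.
    rows = sorted((r for r in rows if in_window(r)),
                  key=lambda r: r["timestamp"])
    # Assumption: rows with missing values are dropped, not imputed.
    rows = [r for r in rows if not has_nan(r)]
    n_valid = int(len(rows) * VALID_FRACTION)
    n_test = int(len(rows) * TEST_FRACTION)
    n_train = len(rows) - n_valid - n_test
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]
    # Outliers are removed from the training subset only.
    train = [r for r in train if not is_outlier(r)]
    return train, valid, test


# Made-up readings: one NaN, two out-of-range values, one row outside the window.
values = [10.0, 20.0, float("nan"), 30.0, 500.0, 40.0, 50.0, 60.0, 70.0, 900.0]
rows = [{"timestamp": datetime(2020, 1, day + 1), "power_kw": value}
        for day, value in enumerate(values)]
rows.append({"timestamp": datetime(2020, 2, 1), "power_kw": 15.0})

train, valid, test = preprocess(rows)
print(len(train), len(valid), len(test))  # prints: 6 1 1
```

In this toy run the out-of-range reading of 900.0 stays in the test subset, because only the training data is filtered for outliers.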