ALDI++: Automatic and parameter-less discord detection for daily load energy profiles

License: MIT

Initial codebase: https://github.com/intelligent-environments-lab/ALDI

This repository is the official implementation of ALDI++: Automatic and parameter-less discord detection for daily load energy profiles.

Requirements

Local

To run locally, create the conda environment from the file that matches your operating system:

conda env create --file env/environment_<OS>.yaml # replace <OS> with either `macos` or `ubuntu`

AWS

For the forecasting portion of this project (training and prediction), we recommend using the following EC2 instance which was used in our experiments:

  • Instance Type: g4dn.4xlarge (16 vCPUs, 64 GB RAM, and 600 GB disk)
  • AMI: Deep Learning AMI (Ubuntu 18.04)
  • Conda environment: tensorflow2_p36

Data

We use a publicly available dataset, specifically the subset used for the Great Energy Predictor III (GEPIII) machine learning competition.

Download the datasets from the competition's data tab into data/.

The manually labeled outliers are taken from resources provided by the top winning teams.

Then, run the notebook bad_meter_preprocessing.ipynb to create the labeled train set.

Benchmarking models

  • Statistical model (2-Standard deviation)
  • ALDI
  • Variational Auto-encoder (VAE)
  • ALDI++ (our method)
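
As a rough illustration of the simplest baseline in the list above, the sketch below flags days whose load lies more than two standard deviations from the mean. It is a minimal sketch, not the repository's implementation: the function name `flag_discords_2sd`, the use of daily totals as the feature, and the sample values are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_discords_2sd(daily_totals, n_std=2.0):
    """Return indices of days whose total load lies more than
    n_std sample standard deviations away from the mean."""
    if len(daily_totals) < 2:
        return []  # a standard deviation needs at least two points
    mu = mean(daily_totals)
    sigma = stdev(daily_totals)
    if sigma == 0:
        return []  # all days identical, nothing to flag
    return [i for i, x in enumerate(daily_totals) if abs(x - mu) > n_std * sigma]


# Hypothetical daily energy totals (kWh) for a single meter
daily_totals = [100, 102, 98, 101, 99, 100, 103, 97, 100, 300]
print(flag_discords_2sd(daily_totals))  # -> [9]
```

A rule like this needs no training, which is consistent with its low computation time, but a single large outlier inflates the standard deviation and can mask smaller discords.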

Evaluation

Discord classification

Confusion matrices and ROC-AUC metrics are evaluated using the following notebooks:

classification_<model>.ipynb

where <model> is one of the benchmarked models: 2sd, vae, aldi, aldipp
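
For readers unfamiliar with the two metrics, the following sketch computes a binary confusion matrix and ROC-AUC in plain Python. It is only an illustration under assumptions: the notebooks may rely on a library implementation instead, and the function names, labels, and scores below are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 marks a discord."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive scores higher
    than a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and discord scores for six daily profiles
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(confusion_matrix(y_true, y_pred))  # -> (3, 0, 1, 2)
print(round(roc_auc(y_true, scores), 3))  # -> 0.889
```

The pairwise formulation is quadratic in the number of samples, so it suits small checks rather than full-dataset evaluation.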

Energy Forecasting

To change settings and parameters for data pre-processing, training, and evaluation, edit the YAML files inside the configs/ folder. The energy forecasting pipeline is based on the Rank-1 team's solution.

The pipeline assumes that at least the following folder structure exists:

.
├── configs
│   ├── ..
├── data
│   ├── outliers
│   │   ├── ...
│   ├── preprocessed
│       ├── ...
...
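
Before launching the pipeline, it can help to confirm that the folders shown above are in place. The check below is a small sketch; the helper name `missing_dirs` is hypothetical and the list covers only the directories named in the tree.

```python
from pathlib import Path

# Directories named in the folder structure above
REQUIRED_DIRS = ("configs", "data/outliers", "data/preprocessed")


def missing_dirs(root="."):
    """Return the required directories that do not exist under root."""
    root = Path(root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


missing = missing_dirs(".")
if missing:
    print("Missing directories:", ", ".join(missing))
else:
    print("Folder structure looks complete.")
```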

Training pipeline

Each YAML file inside configs/ holds the configuration for a different discord detection algorithm. To execute a stripped-down version of the Rank-1 team's solution, run:

./rank1-solution-simplified.sh configs/{your_config}.yaml
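
To run the pipeline once per discord detection configuration, the command above can be generated for every YAML file in configs/. The following is a sketch under assumptions: the helper `build_commands` is hypothetical, and it only prints the commands; the commented `subprocess.run` line shows where execution would go.

```python
from pathlib import Path

SCRIPT = "./rank1-solution-simplified.sh"


def build_commands(config_dir="configs"):
    """Return one [script, config_path] command per YAML file, sorted by name."""
    configs = sorted(Path(config_dir).glob("*.yaml"))
    return [[SCRIPT, cfg.as_posix()] for cfg in configs]


for cmd in build_commands():
    print(" ".join(cmd))
    # To actually launch a run, one could use:
    # subprocess.run(cmd, check=True)  # requires `import subprocess`
```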

Results

Dictionaries with the computed results can be found in results/. The table below reports forecasting performance (RMSLE) and computation time (min) on the GEPIII dataset for our model (ALDI++), the original competition winning team, a simple statistical approach, a commonly used deep learning approach, and the original ALDI:

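| Discords labeled by  | RMSLE | Computation time (min) |
|----------------------|-------|------------------------|
| Kaggle winning team  | 2.841 | 480                    |
| 2-Standard deviation | 2.835 | 1                      |
| ALDI                 | 2.834 | 40                     |
| VAE                  | 2.829 | 32                     |
| ALDI++               | 2.665 | 8                      |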
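
For reference, RMSLE (root mean squared logarithmic error) is the square root of the mean squared difference between log(1 + prediction) and log(1 + actual). The sketch below implements that standard definition; the function name and sample values are illustrative and are not taken from the repository.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical meter readings and forecasts (kWh)
actual = [120.0, 80.0, 0.0, 45.5]
forecast = [110.0, 95.0, 2.0, 45.5]
print(rmsle(actual, forecast))
```

Because the error is computed on a log scale, relative deviations matter more than absolute ones, so lower values in the table indicate forecasts that are closer in ratio to the measured load.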