Multivariate Normal Distribution based Oversampling

MNDO

Python implementation of MNDO (Multivariate Normal Distribution based Oversampling).

Article about this implementation

Requirements

  • Anaconda / Python 3.6
  • tqdm 4.31.1
  • imbalanced-learn 0.4.3

Usage

Preprocessing Keel-datasets

If you use KEEL datasets, preprocess them with the following command:

python pre_dataset.py dataset_directory
  • Preprocesses every file in the given directory.
  • Removes unnecessary lines and replaces class labels (positive class -> 1, negative class -> 0).
  • Saves the preprocessed data to MNDO/Predataset/xxx.csv
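
The cleanup described above can be sketched as follows. This is a minimal illustration, not the code in pre_dataset.py: it assumes the standard KEEL layout in which header lines start with `@`, the class label is the last comma-separated field, and the minority class is named `positive`. The function names `preprocess_keel` and `to_csv` are hypothetical.

```python
import csv
import io


def preprocess_keel(text, positive_label="positive"):
    """Return data rows from KEEL-format text with a 0/1 label in the last column."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and KEEL header lines (@relation, @attribute, @data, ...).
        if not line or line.startswith("@"):
            continue
        fields = [field.strip() for field in line.split(",")]
        # Assumption: the class label is the last field of each data row.
        fields[-1] = "1" if fields[-1] == positive_label else "0"
        rows.append(fields)
    return rows


def to_csv(rows):
    """Serialize rows as CSV text with one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Toy input standing in for a KEEL .dat file.
sample = """@relation toy
@attribute a real [0.0, 1.0]
@attribute b real [0.0, 1.0]
@attribute class {positive, negative}
@data
0.1, 0.2, negative
0.9, 0.8, positive
"""

rows = preprocess_keel(sample)
print(to_csv(rows))
```

If a dataset uses different label names, pass the minority label through `positive_label`.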

Over-sampling

Run the following command. The resampled (generated) data is stored in ./pos_data

python over-sampling.py data_path
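
As the name suggests, MNDO generates synthetic minority samples from a multivariate normal distribution. The sketch below shows one plausible reading of that idea: estimate the mean vector and covariance matrix of the positive class, then draw new points from the fitted distribution. It is an assumption-based illustration, not the logic of over-sampling.py, which may differ; the function name `mndo_sample` and its parameters are hypothetical.

```python
import numpy as np


def mndo_sample(X_pos, n_samples, seed=0):
    """Draw synthetic samples from a normal distribution fitted to X_pos.

    X_pos: array of shape (n_pos, n_features) holding the positive class only.
    """
    rng = np.random.default_rng(seed)
    mean = X_pos.mean(axis=0)
    # rowvar=False treats columns as features; atleast_2d covers the 1-feature case.
    cov = np.atleast_2d(np.cov(X_pos, rowvar=False))
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Toy positive-class data (values made up for illustration).
X_pos = np.array([[1.0, 2.0], [1.5, 1.8], [0.9, 2.2], [1.2, 2.1]])
synthetic = mndo_sample(X_pos, n_samples=6)
print(synthetic.shape)
```

The generated rows would then be appended to the training set with label 1 until the classes are balanced.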

Training

python train.py data_path

train.py steps:

  1. Load data
  2. Over-sampling (MNDO, SMOTE, Borderline-SMOTE, ADASYN, SMOTE-ENN and SMOTE-Tomek Links)
  3. Scaling (Normalization or Standardization)
  4. Learning (SVM, Decision Tree and k-NN)
  5. Predict (Results are saved in MNDO/output/xxx.csv)

To train on all files, run this script:

./run.sh

ToDo

  • Provide as a Python library

Related works

Author

Kotaro Ambai (baibai25)