LoRAS

Localized Random Affine Shadowsampling

This repo provides a python implementation of an imbalanced dataset oversampling technique known as Localized Random Affine Shadowsampling (LoRAS). This implementation piggybacks off the package imbalanced-learn and thus aims to be as compatible as possible with it.

Dependencies

Python >= 3.6
numpy >= 1.17.0
imbalanced-learn

Installation

Using pip:

$ pip install -U pyloras

Installing from source requires an installation of poetry and the following shell commands:

$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ poetry install
# add package to python's path
$ export PYTHONPATH=$PWD:$PYTHONPATH

Usage

from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)

lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]

# one can also use any custom 2d manifold learner via the ``manifold_learner` parameter
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)

Visualization

Below is a comparision of imbalanced-learn's SMOTE implementation with LORAS on the dummy data used in this doc page using the default parameters.

The plots can be reproduced by running:

$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>

References