
Implementation of the Geometric SMOTE over-sampling algorithm.

Primary language: Python. License: MIT.

Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

The project has been moved to imbalanced-learn-extra.

geometric-smote


Category        Tools
Development     black, ruff, mypy, docformatter
Package         version, Python version, downloads
Documentation   mkdocs
Communication   gitter, discussions

Introduction

The package geometric-smote implements the Geometric SMOTE algorithm, a geometrically enhanced drop-in replacement for SMOTE. It is compatible with scikit-learn and imbalanced-learn. The Geometric SMOTE algorithm can handle numerical as well as categorical features.
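To make "geometrically enhanced" more concrete: SMOTE places a synthetic sample on the line segment between a minority sample and a neighbour, whereas Geometric SMOTE draws it from a region around the minority sample, a hypersphere whose radius is the distance to a selected surface point, which can be truncated and deformed. The pure-Python sketch below illustrates one such draw. It is not the package's implementation: the function name make_geometric_sample is hypothetical, and truncation_factor (in [-1, 1]) and deformation_factor (in [0, 1]) are assumed to play the role of the hyperparameters of the same names.

```python
import math
import random


def make_geometric_sample(center, surface_point, truncation_factor=0.0,
                          deformation_factor=0.0, rng=None):
    """Draw one synthetic point (hypothetical sketch, not geometric-smote's code)."""
    rng = rng or random.Random()
    dim = len(center)
    diff = [s - c for c, s in zip(center, surface_point)]
    radius = math.sqrt(sum(d * d for d in diff))
    if radius == 0.0:
        # Degenerate region: the center is the only possible sample.
        return list(center)

    # Uniform point inside the unit hypersphere: Gaussian direction,
    # norm scaled by U ** (1 / dim).
    gauss = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = rng.random() ** (1.0 / dim) / math.sqrt(sum(g * g for g in gauss))
    point = [g * scale for g in gauss]

    # Unit vector from the center towards the surface point, and the
    # projection of the point onto it.
    unit = [d / radius for d in diff]
    proj = sum(p * u for p, u in zip(point, unit))

    # Truncation: reflect points lying in the cut-off cap of the sphere.
    if (truncation_factor > 0 and proj < truncation_factor - 1) or (
        truncation_factor < 0 and proj > truncation_factor + 1
    ):
        point = [p - 2 * proj * u for p, u in zip(point, unit)]
        proj = -proj

    # Deformation: shrink the component perpendicular to the unit vector.
    parallel = [proj * u for u in unit]
    point = [a + (1 - deformation_factor) * (p - a) for p, a in zip(point, parallel)]

    # Scale by the radius and translate back to the center.
    return [c + radius * p for c, p in zip(center, point)]


rng = random.Random(0)
print(make_geometric_sample([0.0, 0.0], [2.0, 0.0], truncation_factor=1.0, rng=rng))
```

With truncation_factor=1.0 and deformation_factor=1.0 the region collapses to the segment between the center and the surface point, which corresponds to the SMOTE case.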

Installation

For a user installation, geometric-smote is available on PyPI and can be installed via pip:

pip install geometric-smote
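To confirm which release pip installed, the standard library's importlib.metadata can query the distribution by name. A minimal sketch, where the helper name installed_version is hypothetical; it returns None instead of raising when the distribution is not installed:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="geometric-smote"):
    """Return the installed version string of dist_name, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```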

Development installation requires cloning the repository and then using PDM to install the project as well as the main and development dependencies:

git clone https://github.com/georgedouzas/geometric-smote.git
cd geometric-smote
pdm install

Usage

All classes in geometric-smote follow the imbalanced-learn API and build on its base oversampler. Following the scikit-learn convention, the data are represented as follows:

  • Input data X: 2D array-like or sparse matrix.
  • Targets y: 1D array-like.
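As an illustration of this convention, the toy dataset below uses nested lists as the 2D X (one row per sample, one column per feature) and a flat list as the 1D y; in practice array-likes such as NumPy arrays would normally be passed. The helper describe is hypothetical: it only checks the shapes and reports the class distribution with collections.Counter.

```python
from collections import Counter

# Toy imbalanced dataset: 6 samples, 2 features, class 0 is the majority.
X = [[0.0, 1.0], [0.5, 1.5], [1.0, 0.5], [1.5, 1.0], [4.0, 4.5], [4.5, 4.0]]
y = [0, 0, 0, 0, 1, 1]


def describe(X, y):
    """Check the 2D X / 1D y convention and return the class counts."""
    n_features = len(X[0])
    if any(len(row) != n_features for row in X):
        raise ValueError("X must be 2D: every row needs the same number of features")
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    return Counter(y)


print(describe(X, y))  # Counter({0: 4, 1: 2})
```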

The oversamplers implement a fit method to learn from X and y:

gsmote.fit(X, y)

They also implement a fit_resample method to resample X and y:

X_resampled, y_resampled = gsmote.fit_resample(X, y)
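The split between fit and fit_resample can be illustrated without the package itself. The class below is a hypothetical, self-contained stand-in, not the package's oversampler: fit only learns how many samples each class needs, assuming imbalanced-learn's default 'auto' strategy, which for over-samplers resamples every class except the majority up to the majority count; fit_resample then appends the synthetic samples. For brevity it generates them by plain SMOTE-style interpolation between a sample and one of its k nearest same-class neighbours, not by the geometric procedure of Geometric SMOTE.

```python
import math
import random
from collections import Counter


class ToyOversampler:
    """Hypothetical stand-in that mimics the fit / fit_resample API shape."""

    def __init__(self, k_neighbors=2, random_state=None):
        self.k_neighbors = k_neighbors
        self.random_state = random_state

    def fit(self, X, y):
        # Learn how many samples each non-majority class needs to reach
        # the majority count (the assumed 'auto' behaviour).
        counts = Counter(y)
        n_majority = max(counts.values())
        self.sampling_strategy_ = {
            label: n_majority - n for label, n in counts.items() if n < n_majority
        }
        return self

    def fit_resample(self, X, y):
        self.fit(X, y)
        rng = random.Random(self.random_state)
        X_res, y_res = [list(row) for row in X], list(y)
        for label, n_samples in self.sampling_strategy_.items():
            idx = [i for i, target in enumerate(y) if target == label]
            if len(idx) < 2:
                raise ValueError(f"class {label!r} needs at least 2 samples")
            for _ in range(n_samples):
                i = rng.choice(idx)
                # k nearest same-class neighbours of the chosen sample.
                others = sorted((j for j in idx if j != i),
                                key=lambda j: math.dist(X[i], X[j]))
                j = rng.choice(others[: self.k_neighbors])
                # SMOTE-style linear interpolation (not the geometric step).
                gap = rng.random()
                X_res.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
                y_res.append(label)
        return X_res, y_res


X = [[0.0, 1.0], [0.5, 1.5], [1.0, 0.5], [1.5, 1.0], [4.0, 4.5], [4.5, 4.0]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = ToyOversampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
```

The original samples are kept unchanged at the start of X_res, and the synthetic ones are appended after them, so the minority class ends up with as many samples as the majority class.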

Citing geometric-smote

If you use geometric-smote in a scientific publication, we would appreciate citations to the following paper:

Publications using Geometric-SMOTE:

  • Fonseca, J., Douzas, G., Bacao, F. (2021). Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing, 13(13), 2619. https://doi.org/10.3390/rs13132619

  • Douzas, G., Bacao, F., Fonseca, J., Khudinyan, M. (2019). Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm. Remote Sensing, 11(24), 3040. https://doi.org/10.3390/rs11243040