Generalized Sequence Pattern (GSP) algorithm in Python
This package provides an implementation of the GSP algorithm in Python.
Install Python 3 (for example, on Debian or Ubuntu):
sudo apt install python3
To install GSP-Py from source, clone the Git repository hosted on GitHub and run its setup script from inside the cloned folder:
git clone https://github.com/jacksonpradolima/gsp-py.git
cd gsp-py
python3 setup.py install
Alternatively, you can install it with pip:
pip install gsppy
Examples of configuring and running GSP-Py can be found in the test folder of the gsppy package.
To use it in a project, import it and use the GSP class.
from gsppy.gsp import GSP
GSP-Py assumes that your transactions are a sequence of sequences, where each inner sequence lists the items in one basket:
transactions = [
['Bread', 'Milk'],
['Bread', 'Diaper', 'Beer', 'Eggs'],
['Milk', 'Diaper', 'Beer', 'Coke'],
['Bread', 'Milk', 'Diaper', 'Beer'],
['Bread', 'Milk', 'Diaper', 'Coke']
]
Initialize the class with the transactions and call search with a minimum support threshold, given as a fraction of all transactions, to find the patterns that occur in at least that share of baskets:
result = GSP(transactions).search(0.3)
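To make the search step more concrete, here is a minimal, self-contained sketch of a level-wise (Apriori-style) sequential pattern search. It is not GSP-Py's implementation, and its output format is not necessarily that of GSP-Py's search method. The function names contains and gsp_sketch are hypothetical. "Contain" is assumed to mean that the pattern's items appear in the same order with other items allowed in between. The fractional threshold is converted to a minimum count by rounding min_support times the number of transactions up, so with 5 transactions and 0.3 a pattern must appear in at least 2 baskets.

```python
from collections import Counter
from itertools import product
from math import ceil


def contains(transaction, pattern):
    # Assumption: "contain" = pattern items appear in the same order, gaps allowed.
    # `in` advances the shared iterator, so each match must follow the previous one.
    remaining = iter(transaction)
    return all(item in remaining for item in pattern)


def gsp_sketch(transactions, min_support):
    """Hypothetical level-wise search; returns one {pattern: count} dict per length."""
    # Convert the fractional threshold into a minimum number of transactions.
    min_count = ceil(min_support * len(transactions))

    # Length-1 patterns: count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {(item,): c for item, c in counts.items() if c >= min_count}
    levels = []

    while frequent:
        levels.append(frequent)
        k = len(next(iter(frequent))) + 1
        items = sorted({item for pattern in frequent for item in pattern})
        # Candidate generation with pruning: a k-pattern can only be frequent if its
        # (k-1)-prefix and (k-1)-suffix are frequent too (every subsequence of a
        # frequent sequence is itself frequent).
        candidates = [c for c in product(items, repeat=k)
                      if c[:-1] in frequent and c[1:] in frequent]
        # Support counting: keep candidates contained in enough transactions.
        frequent = {}
        for cand in candidates:
            count = sum(contains(t, cand) for t in transactions)
            if count >= min_count:
                frequent[cand] = count
    return levels


transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diaper', 'Beer'],
    ['Bread', 'Milk', 'Diaper', 'Coke'],
]

for level in gsp_sketch(transactions, 0.3):
    print(level)
```

With this data and a threshold of 0.3, the sketch finds patterns up to length 3, such as ('Bread', 'Milk', 'Diaper'). It drops ('Eggs',), which appears in only one basket, and ('Beer', 'Diaper'), because Beer never comes before Diaper in any basket.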
The support of a sequence is defined as the fraction of all data-sequences that "contain" this sequence. (Although the word "contains" is not strictly accurate once taxonomies are incorporated, it captures the spirit of when a data-sequence contributes to the support of a sequential pattern.)
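The definition of support above can be sketched directly in Python. This is an illustrative helper, not part of GSP-Py's API, and the names contains and support are hypothetical. It ignores taxonomies. It also assumes that a data-sequence contains a pattern when the pattern's items appear in it in the same order, with other items allowed in between. GSP-Py's own containment test may differ.

```python
def contains(data_sequence, pattern):
    """Return True if the items of `pattern` occur in `data_sequence` in the same
    relative order, possibly with other items in between (assumed semantics)."""
    remaining = iter(data_sequence)
    # `in` consumes the iterator, so each item must be found after the previous one.
    return all(item in remaining for item in pattern)


def support(transactions, pattern):
    """Fraction of data-sequences that contain `pattern`."""
    hits = sum(1 for t in transactions if contains(t, pattern))
    return hits / len(transactions)


transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diaper', 'Beer'],
    ['Bread', 'Milk', 'Diaper', 'Coke'],
]

print(support(transactions, ['Diaper', 'Beer']))  # 0.6 (3 of 5 baskets)
print(support(transactions, ['Beer', 'Diaper']))  # 0.0 (order matters)
```

Because order matters under this definition, ['Diaper', 'Beer'] clears the 0.3 threshold used above, while ['Beer', 'Diaper'] does not.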
This project is licensed under the terms of the MIT License; see the LICENSE file for details.
If this package contributes to a project which leads to a scientific publication, I would appreciate a citation.
@misc{pradolima_gsppy,
author = {Prado Lima, Jackson Antonio do},
title = {{GSP-Py - Generalized Sequence Pattern algorithm in Python}},
month = May,
year = 2020,
doi = {10.5281/zenodo.3333987},
url = {https://doi.org/10.5281/zenodo.3333987}
}