tommyod/Efficient-Apriori

Using iterator or generator to provide transactions

Closed this issue · 2 comments

I use a generator to pass the transactions.
This works just fine if it actually returns any items.
However, if the generator or iterator is "empty", the application will crash:

File "...\.venv\lib\site-packages\efficient_apriori\itemsets.py", line 36, in __init__
    self._transactions = i + 1
UnboundLocalError: local variable 'i' referenced before assignment

A check for empty transactions is already done here but it does not work with generators or iterators:

if not transactions:

My suggestion:
Initialize i in TransactionManager.__init__ as follows

    def __init__(self, transactions: typing.Iterable[typing.Iterable[typing.Hashable]]):

        # A lookup that returns indices of transactions for each item
        self._indices_by_item = collections.defaultdict(set)

        # Populate
        i = -1 # NEW
        for i, transaction in enumerate(transactions):
            for item in transaction:
                self._indices_by_item[item].add(i)

        # Total number of transactions
        self._transactions = i + 1

and add (or move the) check after instantiating the class like this:

def itemsets_from_transactions(...
    ...
    # Store in transaction manager
    manager = TransactionManager(transactions)

    # NEW: moved check below manager init and replaced the check by using no. of transactions
    if len(manager) < 1:
        return dict(), 0  # large_itemsets, num_transactions
    ...

I understand that the expected input is a list, set or tuple but wrt. duck-typing and the (hopefully) little effort required to fix, maybe one can add it ;)

Thanks for considering and thanks as well for providing the library. 👍

Thanks for the well written issue. I think your proposed solution is nice. Maybe add a regression test as well. Do you want to create a PR?

Thanks for the well written issue. I think your proposed solution is nice. Maybe add a regression test as well. Do you want to create a PR?

I recently added my solution and a minimal test and opened a PR.
I also did the regression test locally (used virtualenv to keep dependencies clean).