Using iterator or generator to provide transactions
Closed this issue · 2 comments
I use a generator to pass the transactions.
This works just fine if it actually returns any items.
However, if the generator or iterator is "empty", the application will crash:
File "...\.venv\lib\site-packages\efficient_apriori\itemsets.py", line 36, in __init__
self._transactions = i + 1
UnboundLocalError: local variable 'i' referenced before assignment
A check for empty transactions is already done here but it does not work with generators or iterators:
My suggestion:
Initialize i
in TransactionManager.__init__
as follows
def __init__(self, transactions: typing.Iterable[typing.Iterable[typing.Hashable]]):
# A lookup that returns indices of transactions for each item
self._indices_by_item = collections.defaultdict(set)
# Populate
i = -1 # NEW
for i, transaction in enumerate(transactions):
for item in transaction:
self._indices_by_item[item].add(i)
# Total number of transactions
self._transactions = i + 1
and add (or move the) check after instantiating the class like this:
def itemsets_from_transactions(...
...
# Store in transaction manager
manager = TransactionManager(transactions)
# NEW: moved check below manager init and replaced the check by using no. of transactions
if len(manager) < 1:
return dict(), 0 # large_itemsets, num_transactions
...
I understand that the expected input is a list, set or tuple but wrt. duck-typing and the (hopefully) little effort required to fix, maybe one can add it ;)
Thanks for considering and thanks as well for providing the library. 👍
Thanks for the well written issue. I think your proposed solution is nice. Maybe add a regression test as well. Do you want to create a PR?
Thanks for the well written issue. I think your proposed solution is nice. Maybe add a regression test as well. Do you want to create a PR?
I recently added my solution and a minimal test and opened a PR.
I also did the regression test locally (used virtualenv to keep dependencies clean).