tommyod/Efficient-Apriori

KeyError while generating rules

Closed this issue · 6 comments

aayux commented

The apriori algorithm throws a KeyError when generating rules from itemsets.

Generating itemsets.
 Counting itemsets of length 1.
  Found 457 candidate itemsets of length 1.
  Found 190 large itemsets of length 1.
 Counting itemsets of length 2.
  Found 17955 candidate itemsets of length 2.
  Found 16611 large itemsets of length 2.
 Counting itemsets of length 3.
  Found 67 candidate itemsets of length 3.
  Found 63 large itemsets of length 3.
 Counting itemsets of length 4.
  Found 0 candidate itemsets of length 4.
Itemset generation terminated.

Generating rules from itemsets.
 Generating rules of size 2.
 Generating rules of size 3.
Traceback (most recent call last):
  File "basket.py", line 26, in <module>
    itemsets, rules = apriori(transactions, min_support=.6,  min_confidence=.5, verbosity=1)
  File "/home/.../lib/python3.5/site-packages/efficient_apriori/apriori.py", line 56, in apriori
    return itemsets, list(rules)
  File "/home/.../lib/python3.5/site-packages/efficient_apriori/rules.py", line 329, in generate_rules_apriori
    conf = count(itemset) / count(lhs)
  File "/home/.../lib/python3.5/site-packages/efficient_apriori/rules.py", line 303, in count
    return itemsets[len(itemset)][itemset]
KeyError: ('Item 3', 'Item 5')
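
For readers following the traceback, here is a minimal sketch of the lookup that fails. It does not diagnose the bug. It only assumes, from the line `itemsets[len(itemset)][itemset]`, that itemsets are stored as a mapping from itemset length to a mapping from itemset tuple to count. The counts and the name 'Item 7' are made up for illustration.

```python
# Hypothetical miniature of the nested mapping implied by the traceback:
# {itemset length: {itemset tuple: count}}. All counts are invented.
itemsets = {
    1: {("Item 3",): 8, ("Item 5",): 7, ("Item 7",): 9},
    # ("Item 3", "Item 5") is deliberately missing at length 2.
    2: {("Item 3", "Item 7"): 6, ("Item 5", "Item 7"): 6},
    3: {("Item 3", "Item 5", "Item 7"): 5},
}


def count(itemset):
    """Look up an itemset's count, mirroring the line shown in the traceback."""
    return itemsets[len(itemset)][itemset]


def confidence(itemset, lhs):
    """Confidence of a rule with left-hand side `lhs`: count(itemset) / count(lhs)."""
    return count(itemset) / count(lhs)


# A rule whose left-hand side is present works as expected.
print(confidence(("Item 3", "Item 7"), ("Item 3",)))

# A rule whose left-hand side is absent from the length-2 mapping raises
# the same kind of KeyError as in the report.
try:
    confidence(("Item 3", "Item 5", "Item 7"), ("Item 3", "Item 5"))
except KeyError as exc:
    print("KeyError:", exc)
```

In this sketch the error appears because a size-3 itemset is present while one of its size-2 subsets is not, which matches the point in the log where rules of size 3 are being generated.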

Is this a Python version issue?

efficient_apriori has not been tested on Python 3.5, so the version might indeed be the issue.

  • Can you try on Python 3.6 or 3.7?

If that does not work, I will need an example input on which the program fails so that I can debug it. But try Python 3.6 or 3.7 first and see if that works.

aayux commented

It is indeed related to the Python version. It works on versions 3.6 and 3.7.
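
Since the failure on 3.5 surfaces as an unrelated-looking KeyError, a script could check the interpreter version up front. The following is a small sketch under the assumption, taken only from this thread, that 3.6 is the lowest working version. The names MIN_VERSION and python_is_supported are hypothetical and not part of efficient_apriori.

```python
import sys

# Assumption from this thread: 3.6 and 3.7 work, 3.5 does not.
MIN_VERSION = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if not python_is_supported():
    # Warn early instead of failing later during rule generation.
    print(
        "Warning: running Python %d.%d, expected %d.%d or newer."
        % (sys.version_info[0], sys.version_info[1], MIN_VERSION[0], MIN_VERSION[1])
    )
```

Placing such a check at the top of a script like basket.py, before calling apriori, makes the version requirement visible right away.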

Is it possible to enable Python 3.5 support?

Hi @dwy904. Unfortunately the program uses features that are not compatible with Python 3.5, so it would need to be backported. That would require some work and would make the code base less clean.

  • I suggest you upgrade your Python installation if possible.
  • If you cannot upgrade due to other packages, consider using conda to create a separate environment for using Efficient-Apriori.

@dwy904 It should not take long to get some decent results. Remember to:

  • Get the data into memory, i.e. do not use the data_generator solution.
  • Start with a large value for min_support, let the algorithm converge, then decrease it.
  • Use the verbosity argument (e.g. verbosity=1, as in the call above) to see progress output.
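
The three tips above can be sketched as a small helper. This is only an illustration: sweep_min_support is a hypothetical name, and the helper takes the mining function as an argument, assuming it has the call shape seen in the traceback, i.e. mine(transactions, min_support=..., min_confidence=..., verbosity=...) returning (itemsets, rules).

```python
def sweep_min_support(mine, transactions, supports, min_confidence=0.5):
    """Try decreasing min_support values until `mine` returns some rules.

    `mine` is assumed to be callable as
    mine(transactions, min_support=..., min_confidence=..., verbosity=...)
    and to return a pair (itemsets, rules).
    Returns (support, itemsets, rules), or (None, {}, []) if no value
    in `supports` produced any rules.
    """
    # Tip 1: materialize the data once so every run reads from memory.
    transactions = [tuple(t) for t in transactions]

    # Tip 2: start with the largest min_support and work downwards.
    for support in sorted(supports, reverse=True):
        itemsets, rules = mine(
            transactions,
            min_support=support,
            min_confidence=min_confidence,
            verbosity=1,  # Tip 3: print progress while running.
        )
        if rules:
            return support, itemsets, rules

    return None, {}, []
```

With efficient_apriori installed on Python 3.6 or newer, this could be called as sweep_min_support(apriori, transactions, [0.9, 0.7, 0.5]) after importing apriori from efficient_apriori. Starting high keeps the first runs fast, because few candidate itemsets survive a large min_support.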