data generator sample not working

Question

nirvitarka opened this issue 6 years ago · 4 comments

Any help about using the data generator as mentioned in the examples?

Here is what I am trying

`from efficient_apriori import apriori

def data_generator(filename):
    """
    Data generator, needs to return a generator to be called several times.
    """
    def data_gen():
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(','))

    return data_gen

transactions = data_generator('data.csv')

itemsets, rules = apriori(transactions, min_support=0.2, min_confidence=1)

rules_rhs = filter(lambda rule: len(rule.lhs) == 2 and len(rule.rhs) == 1, rules)
for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
    print(rule)  # Prints the rule and its confidence, support, lift, ...
`

The CSV file was downloaded from:

https://drive.google.com/file/d/1y5DYn0dGoSbC22xowBq2d4po6h1JxcTQ/view?usp=sharing

Answer 1 · 2019-02-06T15:42:58.000Z

Hello @nirvitarka. Thank you for raising the issue.

You should change the filename. The min_support argument is set too high, and so is min_confidence. The right values depend on the data set, and there is a trade-off between speed and output: lower thresholds return more itemsets and rules, but take longer to compute.
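To make the effect of min_support concrete, here is a minimal sketch that counts support by brute force on a toy data set. The transactions and the helper names `itemset_support` and `frequent` are invented for illustration. This is not how efficient_apriori works internally; it only shows why a high threshold can leave no larger itemsets at all.

```python
from collections import Counter
from itertools import combinations

# Toy transactions, invented for illustration (not taken from store_data.csv).
transactions = [
    ("milk", "bread", "eggs"),
    ("milk", "bread"),
    ("milk", "eggs"),
    ("bread", "butter"),
    ("milk", "bread", "butter"),
]


def itemset_support(transactions, size):
    """Map each itemset of the given size to the fraction of transactions containing it."""
    counts = Counter()
    for transaction in transactions:
        # Sort so that the same itemset always produces the same tuple key.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return {itemset: count / len(transactions) for itemset, count in counts.items()}


def frequent(transactions, size, min_support):
    """Keep only the itemsets whose support reaches min_support."""
    supports = itemset_support(transactions, size)
    return {itemset: s for itemset, s in supports.items() if s >= min_support}


for min_support in (0.5, 0.1):
    sizes = {size: len(frequent(transactions, size, min_support)) for size in (1, 2, 3)}
    print(f"min_support={min_support}: itemsets per size {sizes}")
```

On this toy data the higher threshold leaves no itemsets of size 3, while the lower one keeps two. The same pattern is likely what happens on a real data set when the thresholds are too strict.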

The following works for me:

from efficient_apriori import apriori

def data_generator(filename):
    def data_gen():
        with open(filename) as file:
            for line in file:
                yield tuple(k.strip() for k in line.split(','))
    
    return data_gen

transactions = data_generator('store_data.csv')

itemsets, rules = apriori(transactions, 
                          min_support=0.05, 
                          min_confidence=0.01, 
                          verbosity=2)

for rule in sorted(rules, key=lambda rule: rule.lift):
    print(rule)
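The reason data_generator returns a function rather than a generator is that a generator can only be consumed once, while the algorithm needs to read the data several times. The sketch below illustrates this with an in-memory string standing in for the file. `CSV_TEXT` and `rows` are hypothetical names, and `data_generator` takes text here instead of a filename so that the example is self-contained.

```python
import io

# Stand-in for the contents of a CSV file (invented data).
CSV_TEXT = "milk, bread\nbread, butter\n"


def rows(stream):
    """Yield one tuple of stripped items per line."""
    for line in stream:
        yield tuple(k.strip() for k in line.split(','))


def data_generator(text):
    """Return a function that creates a fresh generator on every call."""
    def data_gen():
        yield from rows(io.StringIO(text))

    return data_gen


# A plain generator is exhausted after the first pass.
one_shot = rows(io.StringIO(CSV_TEXT))
first_pass = list(one_shot)
second_pass = list(one_shot)  # nothing left to read

# Calling the returned function starts over from the beginning each time.
factory = data_generator(CSV_TEXT)
pass_a = list(factory())
pass_b = list(factory())

print(first_pass, second_pass)
print(pass_a == pass_b)
```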

Please note that the data generator approach is not needed when data fits into memory.

The following approach is better, since it loads data into memory once.

from efficient_apriori import apriori

# Read data into memory from the file
transactions = []
with open('store_data.csv') as file:
    for line in file:
        transactions.append(set(tuple(k.strip() for k in line.split(','))))

# Run the Apriori algorithm, print output with `verbosity`
itemsets, rules = apriori(transactions, 
                          min_support=0.05, 
                          min_confidence=0.01, 
                          verbosity=2)

for rule in sorted(rules, key=lambda rule: rule.lift):
    print(rule)

I hope this helps. Let me know if it works for you.

Answer 2 · 2019-02-06T16:02:24.000Z

Thank you for the prompt response.

I changed the filename because I was using a subset of that data in "data.csv".
Changing min_support and min_confidence helped; it worked on a smaller subset of the data.

However, it is still not working on the original "store_data.csv". It says "Rule generation terminated." and ends.

It finds some itemsets of length 1 and 2, but no itemsets of length 3.

Am I understanding correctly that it should display results for the length 1 and 2 itemsets?

Answer 3 · 2019-02-06T17:05:12.000Z

That depends on whether you filter the rules or not. print(rules) will show all rules with no filtering. Depending on the data and the values of min_support and min_confidence, you might not have any rules of length 3 at all.
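One way to check what came back is to group the rules by size before applying any filter. The sketch below is only an illustration: `Rule` is a hypothetical stand-in that models just the lhs and rhs attributes used earlier in this thread, the rules themselves are invented, and "length" is assumed to mean the number of items on both sides combined.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's rule objects; only lhs and rhs are modelled.
Rule = namedtuple("Rule", ["lhs", "rhs"])

# Invented rules for illustration.
rules = [
    Rule(("milk",), ("bread",)),
    Rule(("bread",), ("milk",)),
    Rule(("milk", "bread"), ("eggs",)),
]


def rules_by_size(rules):
    """Group rules by the total number of items on both sides."""
    grouped = {}
    for rule in rules:
        size = len(rule.lhs) + len(rule.rhs)
        grouped.setdefault(size, []).append(rule)
    return grouped


grouped = rules_by_size(rules)
for size in sorted(grouped):
    print(f"rules with {size} items: {len(grouped[size])}")
```

If a size is missing from the grouping, no rule of that size passed the thresholds, so a filter such as len(rule.lhs) == 2 would print nothing.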

Answer 4 · 2019-02-06T17:09:06.000Z

Got it, thank you for helping