Hey y'all!
I originally kicked this off as an open challenge to analyze the datasets included here. (Read the original challenge here)
Since then, I've decided to make the challenge a bit more structured. I wrote a blog post on the intuition behind the Apriori Algorithm, where I challenged readers to implement the algorithm themselves.
Here, I'm providing datasets that may be useful as you implement the algorithm. I've also included dataset_builder.py, which you can use to generate very large datasets and test your implementation "at scale". (You'll be running these on your own machine, most likely - so you won't be doing anything at petabyte scale. But hey - it's still fun.)
The hand-drawn mini dataset from the blog post is blog-baskets.txt; baskets.txt and other-baskets.txt are two other datasets that I encourage you to use.
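If you'd like a baseline to sanity-check your own implementation against, here is a minimal sketch of the Apriori idea in Python. It's one possible approach, not a reference solution. The function name apriori, the min_support parameter, and the toy baskets are all made up for illustration, and the function works on in-memory sets, so it assumes nothing about how the .txt files here are formatted.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: count} for every itemset found in >= min_support baskets.

    Hypothetical helper for illustration; min_support is an absolute basket count.
    """
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count single items and keep the frequent ones.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)

        # Build size-k candidates from pairs of frequent (k-1)-itemsets,
        # keeping only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count how many baskets contain each candidate.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Made-up toy baskets, not taken from any of the dataset files.
toy = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
result = apriori(toy, min_support=2)
```

The pruning step is the heart of it: a size-k itemset only gets counted if every one of its size-(k-1) subsets was already frequent. This version is written for readability, not speed, so expect it to crawl on the large generated datasets.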
Personally, I think this kind of thing is more fun when you engage with others. If you'd like to share your implementation with me - or give me any feedback you have on this - please reach out! My contact info is below.
And if you want to engage with others with an interest in Data Science - as well as hear about these projects as I'm creating them - join my mailing list. I send out a weekly email with Data Science related content.
Send your findings to me at: dan [at] isaza [dot] dev