/a-priori

Implementation of A-Priori algorithm in Pharo

Primary LanguageSmalltalkMIT LicenseMIT

APriori

Build status Coverage Status Pharo version Pharo version Pharo version Pharo version License

Implementation of A-Priori algorithm in Pharo

Installation

To install APriori, go to the Playground (Ctrl+OW) in your Pharo image and execute the following Metacello script (select it and press Do-it button or Ctrl+D):

Metacello new
  repository: 'github://pharo-ai/a-priori:v1.0.0';
  baseline: 'AIAPriori';
  load.

How to use it?

Create a list of transactions. Each transaction is an itemset (for example, a list of products that were purchased together by one customer).

transactions := #( 
  (eggs milk butter)
  (milk cereal)
  (eggs bacon)
  (bread butter)
  (bread bacon eggs)
  (bread avocado butter bananas)).
  
transactionsSource := APrioriTransactionsArray from: transactions.

Initialize an APriori algorithm with transactions:

apriori := APriori forTransactions: transactionsSource.

Specify the support threshold:

apriori minSupport: 1/3.

Alternatively, you could specify a minimum count threshold:

apriori minCount: 2.

Now you can find frequent itemsets - sets of items that are likely to be purchased together:

apriori findFrequentItemsets.

The result will be stored in the frequentItemsets instance variable of apriori:

apriori frequentItemsets.

anArray(
  {bread, butter}
  {eggs, bacon}
)

You can generate association rules from those frequent itemsets in the form key => value where a set of items value will be recommended to a customer who purchases a set of items key:

apriori buildAssociationRules.

The result will be stored in the associationRules instance variable of apriori:

apriori associationRules.

" anArray(
  {bread} => {butter}
  {butter} => {bread}
  {eggs} => {bacon}
  {bacon} => {eggs}
)"

Every itemset knows its count and support:

itemset := itemsets first. "{bread, butter}"
itemset count. "2"
itemset support. "1/3"

Similarly, every rule knows its count, support, confidence, and lift:

rule := rules first. "{bread} => {butter}"
rule count. "2"
rule support. "1/3"
rule confidence. "2/3"
rule lift. "4/3"

Real-world example

Download the Groceries dataset. It contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The dataset contains 9835 transactions and the items are aggregated to 169 categories. We load the contents of groceries.csv into Pharo, split it by lines and then by commas to get a collection of transactions:

file := '/path/to/groceries.csv' asFileReference.
lines := Character lf split: file contents.
groceries := lines collect: [ :line | $, split: line ].

Now we initialize an A-Priori algorithm with support threshold of 1% and confidence threshold of 50%:

apriori := APriori
  transactions: groceries
  supportThreshold: 0.01
  confidenceThreshold: 0.5.

We generate association rules (this can take about a minute):

rules := apriori associationRules.

And sort them by lift in descending order:

rules sort: [ :a :b | a lift > b lift ].

This will produce the following 15 rules:

{citrus fruit, root vegetables} => {other vegetables}
{root vegetables, tropical fruit} => {other vegetables}
{rolls/buns, root vegetables} => {other vegetables}
{root vegetables, yogurt} => {other vegetables}
{curd, yogurt} => {whole milk}
{butter, other vegetables} => {whole milk}
{root vegetables, tropical fruit} => {whole milk}
{root vegetables, yogurt} => {whole milk}
{domestic eggs, other vegetables} => {whole milk}
{whipped/sour cream, yogurt} => {whole milk}
{rolls/buns, root vegetables} => {whole milk}
{other vegetables, pip fruit} => {whole milk}
{tropical fruit, yogurt} => {whole milk}
{other vegetables, yogurt} => {whole milk}
{other vegetables, whipped/sour cream} => {whole milk}

Which means that we will recommend other vegetables to the customers who purchased citrus fruit and root vegetables. And for those who took curd and yogurt we will recommend the whole milk.