/Apriori-Algorithm-Frequent-Item-in-Grocery-Store

This assignment will give you basic insight into using Apriori algorithm. Apriori is use for finding the frequent item set in transaction.

Primary LanguageC++MIT LicenseMIT

Apriori Algorithm Frequent Item in Grocery Store

This was our Data Structures assignment in which we had to implement the Apriori Algorithm to find out the frequency of each item in a grocery store.

Assignment #01

Finding the Frequent itemset in Grocery Store

Assignment Description:

This assignment will give you basic insight into using Apriori algorithm. Apriori is use for finding the frequent item set in transaction. For example, in Supermarket store where customers can buy different categories of items. There is always a pattern for what a customer buy. This pattern changes according to the buyer. For example, if a buyer is a Player, and he buys products such as Bat, Ball then, its most probable that he will purchase a tape too. But if the buyer is a mother having babies, she will buy baby products such as milk and diapers. In short, every buyer’s transaction involves a pattern. Our goal is to find those patterns in these transactions. Profit is automatically generated if the relationship is found between the items purchased in different transactions. Apriori algorithm was the first algorithm that was proposed for frequent itemset mining. Let’s understand Apriori algorithm using an example step by step:

You are given the grocery store dataset. Text file contains:

  1. The first row of the file contains Support threshold e.g. 0.5.
  2. The second row contains total number of transactions.
  3. Followed by transactions. Each transaction contains different set of items separated by comma. E.g. Bread, Cheese, Egg, Juice etc

Example:

0.5 6 Bread,Cheese,Egg,Juice Bread,Chesse,Juice Bread,Milk,Yougrt Bread,Juice,Milk Cheese,Juice,Milk Bread,Egg,Cheese,Juice

Support threshold is 0.5 and there are total no of 6 transaction and the first transaction is Bread,Cheese,Egg,Juice in the above example.

Support threshold determines, is the item popular or not.

Step 01:

Read the file and store all the transactions in 2D char Array. And evaluate Support threshold=0.5*No of transaction. So, you get threshold 0.5 * 6 = 3. Note: File should be read once.

Step 02:

Now you have to find the frequency of all the items using 2D char Array.

Step 03:

The set of 1st – itemset whose occurrence is satisfying the Support threshold are determined. Only those items which count more than or equal to Support threshold are taken ahead for the next iteration and the others are pruned (deleted). E.g. Egg and Yogurt have frequencies less than 3. Remove Items that does not meet the minimum support threshold and sort them. The output should look like this.

Step 04:

Next, 2-item pair candidate frequencies are discovered. The 2-item pairs are generated by forming a group of 2 items using 1st – ItemSet (Table 02) and 2D char Array.

Step 05:

The 2-itemset candidates are pruned (deleted) using Support threshold value and then sort them. Now the table will have 2 –item sets with Support threshold.

Step 06:

The 3-item pair candidates with frequencies are generated by grouping pair of 3 items using 1st – ItemSet (Table 02) and 2D char Array.

Step 07:

The 3-itemset candidates are pruned (deleted) using Support threshold value.

You are only required to generate 1st , 2nd & 3rd ItemSets. Output files & Marks distribution:

  1. Generate 1st – ItemSet (as showed in Table 02) and write on 1-ItemSet.txt (Marks 60)
  2. Generate 2nd – ItemSet (as showed in Table 04) and write on 2-ItemSet.txt (Marks 20)
  3. Generate 3rd – ItemSet (as showed in Table 06) and write on 3-ItemSet.txt (Marks 20)

Note: Path of the file must be same as the Project path.