/amazon-product-graph-analysis

Using graph algorithms and traversals to analyze Amazon's product recommendation engine.

Primary LanguageC++

CS 225 Final Project

Daniel Moon
Daniel Moon

Joseph Kuang
Joseph Kuang

Aditya Jain
Aditya Jain

Amit Sawhney
Amit Sawhney

Background and Leading QuestionRepository BreakdownTechnologiesRunning The ProjectTest Suite

Background and Leading Question

In our project, we are hoping to find some pattern between buying habits and how Amazon presents items that are frequently bought together. Specifically, we would like to explore the extent of importance of certain items, to see which items Amazon recommends together and if we can find any vertices that act as "hubs" for many items, meaning that it is more popular and bought with more other items. The final deliverable should rank these items by the number of items that are recommended with these "hub" nodes which reveals how likely it is to get from one place to another just through frequently bought together items.

The difference in the output we get from Betweenness Centrality and PageRank is that PageRank shows where all the edges lead to, while the Betweenness Centrality shows where all the shortests paths will go through as opposed to where they will end up.

Full Proposal
Presentation Video

Repository Breakdown

Configuring Data

Due to the size the data, please download the Amazon dataset and follow the instructions below.

  1. Download the dataset.
  2. Extract the data into a file called data.txt.
  3. Drop the data into the data/ directory.

Technologies

C++
C++
Makefile
Makefile
Catch2
Catch2

Running the Project

Prerequisites

  • Please make sure you have configured the data as mentioned above if you would like to run main code.
  • Ensure that you are in the root directory.

Main

make all
make

Choosing Input File, Output Location, and algorithm

  • make
  • ./main data_path number_of_nodes name_output algorithm
    • i.e. ./main data/input.txt 1200 output.txt dfs

List of all possible algorithms to run

  • dfs - runs dfs algorithm
  • pagerank - runs pagerank algorithm
  • betweenness - runs betweenness centrality algorithm
  • all - runs all algorithms

Cleaning Directory

make clean


Test Suite

Runnings Tests

make test

Argument Example

make test [parse]

Available Arguments

  • parse
  • dfs
  • pagerank
  • betwenness

Graph Parsing

Ensured reliable graph parsing from input files by testing the following conditions:

  • Test connected directed graph
  • Test connected undirected graph
  • Test unconnected directed graph
  • Test connected directed graph
  • Edge cases
    • Test extraneous info at top of file is not read

DFS Traversal

Ensured correct DFS traversals by testing the following conditions:

  • Test connected directed graph
  • Test connected undirected graph
  • Test unconnected directed graph
  • Test unconnected undirected graph

Page Rank Matrix

  • Validated Google Page Rank matrix creation by calculating the values of the matrix
  • Validated that the the values stored by the Sparse Matrix are accurate

Matrix-Vector Multiplication

  • Validated Matrix-vector multiplication by calculating the values of the product

Matrix-Sparce Vector Multiplication

Validated the values of Matrix-sparce vector calculations by testing the following conditions:

  • Test Sparce Product with zero as sprace values
  • Test Sparce Product with non-zero sparce values

Vector Normalization

  • Validated the norm of a vector by calculating the values of the normal vector

Page Rank

Ensured accurate page rank calculations by testing the following conditions

  • Test sparce connected graph
  • Test sparce unconnected graph

Betweenness Centrality

  • Test connected directed graph
  • Test unconnected directed graph
All tests

Parse Nodes - Connected Directed Graph
Parse Nodes - Don't Read Complete File
Parse Nodes - Connected Undirected Graph
Parse Nodes - Multiple Components Directed Graph
Parse Nodes - Multiple Components Undirected Graph
DFS Traversal - Connected Directed Graph
DFS Traversals - Connected Undirected Graph
DFS Traversal - Multiple Components Directed Graph
DFS Traversal - Multiple Components Undirected Graph
Create Google Page Rank Matrix
Matrix Vector Multiplication
Sparse Matrix Vector Multiplication
Sparse Matrix Vector Multiplication with NonZero Sparse Values
2-Norm of Vector
Page Rank - Connected Graph
Page Rank - Multiple Components Graph
Betweenness Centrality - Directed One Component
Betweenness Centrality - Directed Multiple Components Graph