Daniel Moon |
Joseph Kuang |
Aditya Jain |
Amit Sawhney |
Background and Leading Question • Repository Breakdown • Technologies • Running The Project • Test Suite
In our project, we are hoping to find some pattern between buying habits and how Amazon presents items that are frequently bought together. Specifically, we would like to explore the extent of importance of certain items, to see which items Amazon recommends together and if we can find any vertices that act as "hubs" for many items, meaning that it is more popular and bought with more other items. The final deliverable should rank these items by the number of items that are recommended with these "hub" nodes which reveals how likely it is to get from one place to another just through frequently bought together items.
The difference in the output we get from Betweenness Centrality and PageRank is that PageRank shows where all the edges lead to, while the Betweenness Centrality shows where all the shortests paths will go through as opposed to where they will end up.
Full Proposal
Presentation Video
Due to the size the data, please download the Amazon dataset and follow the instructions below.
- Download the dataset.
- Extract the data into a file called
data.txt
. - Drop the data into the
data/
directory.
C++ |
Makefile |
Catch2 |
- Please make sure you have configured the data as mentioned above if you would like to run main code.
- Ensure that you are in the root directory.
make all
make
make
./main data_path number_of_nodes name_output algorithm
- i.e.
./main data/input.txt 1200 output.txt dfs
- i.e.
dfs
- runs dfs algorithmpagerank
- runs pagerank algorithmbetweenness
- runs betweenness centrality algorithmall
- runs all algorithms
make clean
make test
make test [parse]
parse
dfs
pagerank
betwenness
Ensured reliable graph parsing from input files by testing the following conditions:
- Test connected directed graph
- Test connected undirected graph
- Test unconnected directed graph
- Test connected directed graph
- Edge cases
- Test extraneous info at top of file is not read
Ensured correct DFS traversals by testing the following conditions:
- Test connected directed graph
- Test connected undirected graph
- Test unconnected directed graph
- Test unconnected undirected graph
- Validated Google Page Rank matrix creation by calculating the values of the matrix
- Validated that the the values stored by the Sparse Matrix are accurate
- Validated Matrix-vector multiplication by calculating the values of the product
Validated the values of Matrix-sparce vector calculations by testing the following conditions:
- Test Sparce Product with zero as sprace values
- Test Sparce Product with non-zero sparce values
- Validated the norm of a vector by calculating the values of the normal vector
Ensured accurate page rank calculations by testing the following conditions
- Test sparce connected graph
- Test sparce unconnected graph
- Test connected directed graph
- Test unconnected directed graph
All tests
Parse Nodes - Connected Directed Graph
Parse Nodes - Don't Read Complete File
Parse Nodes - Connected Undirected Graph
Parse Nodes - Multiple Components Directed Graph
Parse Nodes - Multiple Components Undirected Graph
DFS Traversal - Connected Directed Graph
DFS Traversals - Connected Undirected Graph
DFS Traversal - Multiple Components Directed Graph
DFS Traversal - Multiple Components Undirected Graph
Create Google Page Rank Matrix
Matrix Vector Multiplication
Sparse Matrix Vector Multiplication
Sparse Matrix Vector Multiplication with NonZero Sparse Values
2-Norm of Vector
Page Rank - Connected Graph
Page Rank - Multiple Components Graph
Betweenness Centrality - Directed One Component
Betweenness Centrality - Directed Multiple Components Graph