Info:
-> The Dataset folder has a file named links.txt; that's our data.
-> /src has different files, out of which Graph.py is responsible for converting the raw data to a graph. Skim through this file once to see all the functions that the program uses over the course of its execution.
-> HITS is working perfectly, and SimRank isn't necessary, I guess. For PageRank, you guys should go through it once and crosscheck whether the code is in accordance with the algorithm or not.
-> The /results folder would have a folder for each dataset, which would in turn contain text files with the PageRank, authority, and hub scores for that dataset.
-> /img is unnecessary. main.py is the program's main entry point.
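For the PageRank crosscheck, it may help to compare the repo's code against a minimal power-iteration sketch like the one below. This is purely illustrative (the function and variable names are not from the repo's code); note that, per the defaults documented further down, this project appears to treat damping_factor as the teleport probability, which is the opposite of the more common convention.

```python
def pagerank(edges, damping_factor=0.15, iterations=100):
    """Minimal PageRank sketch for crosschecking.

    edges: iterable of (source, target) pairs.
    damping_factor: teleport probability (an assumption about this
    repo's convention, inferred from the documented default of 0.15).
    """
    nodes = sorted({n for e in edges for n in e})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from the uniform distribution.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Rank flowing into n from every node m that links to it.
            incoming = sum(rank[m] / len(out_links[m])
                           for m in nodes if n in out_links[m])
            new_rank[n] = (damping_factor / len(nodes)
                           + (1 - damping_factor) * incoming)
        rank = new_rank
    return rank
```

On a graph with no dangling nodes, the scores stay a probability distribution (they sum to 1), which is one easy sanity check to run against the repo's output.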
-> As of now, the things left to do are:
> Understanding how PageRank works
> Implementing query processing for the link analysis algorithms and incorporating it
> Implementing VSM (the vector space model)
> Comparing the link analysis algorithms and VSM on standard metrics
https://amitness.com/2020/08/information-retrieval-evaluation/
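For the VSM item, a minimal TF-IDF + cosine-similarity sketch is below. Everything here (tokenised input, function names) is an assumption for illustration, not part of the repo:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a list of tokenised documents."""
    n = len(docs)
    # Document frequency: in how many docs each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A query can be vectorised the same way (using the corpus IDF) and ranked against all documents by cosine score, which is the standard VSM retrieval setup.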
🎏 Python implementation of famous link analysis algorithms.
I have written three articles to explain the concept and implementation of these three algorithms. Feel free to click on the link!
Get a copy of this repo using git clone
git clone https://github.com/chonyy/PageRank-HITS-SimRank.git
Run the program with the provided dataset and the default values damping_factor = 0.15, decay_factor = 0.9, and iteration = 100
python main.py -f 'dataset/graph_1.txt'
Run the program with a dataset and custom parameters
python main.py --input_file 'dataset/graph_1.txt' --damping_factor 0.15 --decay_factor 0.9 --iteration 500
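The flags in the commands above suggest main.py parses them roughly as follows. This is a guess sketched with argparse, using only the flags and defaults documented here; the actual parsing lives in the repo's main.py:

```python
import argparse

def build_parser():
    # Flags and defaults mirror the commands documented above.
    parser = argparse.ArgumentParser(
        description="Run PageRank / HITS / SimRank on a graph file.")
    parser.add_argument("-f", "--input_file",
                        default="dataset/graph_1.txt",
                        help="edge-list file to analyse")
    parser.add_argument("--damping_factor", type=float, default=0.15)
    parser.add_argument("--decay_factor", type=float, default=0.9)
    parser.add_argument("--iteration", type=int, default=100)
    return parser
```

With this setup, `python main.py -f 'dataset/graph_1.txt'` runs with the defaults, and any subset of the long flags overrides them.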
The order of the results follows the order of the node values. For example: 1, 2, 5, 7.
A new folder with the same name as the graph will be created in the result directory. Four txt files will be created there to store the values printed on the console.
- graph_HITS_authority.txt
- graph_HITS_hub.txt
- graph_PageRank.txt
- graph_SimRank.txt