Info:
-> The Dataset folder has a file named links.txt; that's our data.
-> /src has different files, out of which Graph.py is responsible for converting the raw data to a graph. Skim through this file once to see all the functions that the program uses over the course of its execution.
-> HITS is working perfectly, and SimRank isn't necessary, I guess. For PageRank, you guys should go through it once and crosscheck whether the code is in accordance with the algorithm or not.
-> The /results folder would have a folder for each dataset, which would in turn contain text files with the PageRank, authority, and hub scores for that dataset.
-> /img is unnecessary. main.py is the program's main entry point.
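For the PageRank crosscheck, it may help to compare the repo's code against a minimal power-iteration sketch like the one below. This is purely illustrative (the function and variable names are not from the repo's code); note that, per the defaults documented further down, this project appears to treat damping_factor as the teleport probability, which is the opposite of the more common convention.

```python
def pagerank(edges, damping_factor=0.15, iterations=100):
    """Minimal PageRank sketch for crosschecking.

    edges: iterable of (source, target) pairs.
    damping_factor: teleport probability (an assumption about this
    repo's convention, inferred from the documented default of 0.15).
    """
    nodes = sorted({n for e in edges for n in e})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from the uniform distribution.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Rank flowing into n from every node m that links to it.
            incoming = sum(rank[m] / len(out_links[m])
                           for m in nodes if n in out_links[m])
            new_rank[n] = (damping_factor / len(nodes)
                           + (1 - damping_factor) * incoming)
        rank = new_rank
    return rank
```

On a graph with no dangling nodes, the scores stay a probability distribution (they sum to 1), which is one easy sanity check to run against the repo's output.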
-> As of now, the things left to do are:
> Understanding how PageRank works
> Implementing query processing for the link analysis algorithms and incorporating it
> Implementing VSM (the vector space model)
> Comparing the link analysis algorithms and VSM on standard metrics
https://amitness.com/2020/08/information-retrieval-evaluation/
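For the VSM item, a minimal TF-IDF + cosine-similarity sketch is below. Everything here (tokenised input, function names) is an assumption for illustration, not part of the repo:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a list of tokenised documents."""
    n = len(docs)
    # Document frequency: in how many docs each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A query can be vectorised the same way (using the corpus IDF) and ranked against all documents by cosine score, which is the standard VSM retrieval setup.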
🎏 Python implementation of famous link analysis algorithms.
I have written three articles to explain the concept and implementation of these three algorithms. Feel free to click on the link!
Get a copy of this repo using git clone
git clone https://github.com/chonyy/PageRank-HITS-SimRank.git
Run the program with the provided dataset and the default values damping_factor = 0.15, decay_factor = 0.9, and iteration = 100
python main.py -f 'dataset/graph_1.txt'
Run the program with a dataset and custom parameters
python main.py --input_file 'dataset/graph_1.txt' --damping_factor 0.15 --decay_factor 0.9 --iteration 500
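The flags in the commands above suggest main.py parses them roughly as follows. This is a guess sketched with argparse, using only the flags and defaults documented here; the actual parsing lives in the repo's main.py:

```python
import argparse

def build_parser():
    # Flags and defaults mirror the commands documented above.
    parser = argparse.ArgumentParser(
        description="Run PageRank / HITS / SimRank on a graph file.")
    parser.add_argument("-f", "--input_file",
                        default="dataset/graph_1.txt",
                        help="edge-list file to analyse")
    parser.add_argument("--damping_factor", type=float, default=0.15)
    parser.add_argument("--decay_factor", type=float, default=0.9)
    parser.add_argument("--iteration", type=int, default=100)
    return parser
```

With this setup, `python main.py -f 'dataset/graph_1.txt'` runs with the defaults, and any subset of the long flags overrides them.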
The order of the results follows the order of the node values. For example: 1, 2, 5, 7.
A new folder with the same name as the graph will be created in the result directory. Four txt files will be created there to store the values printed on the console.
- graph_HITS_authority.txt
- graph_HITS_hub.txt
- graph_PageRank.txt
- graph_SimRank.txt