/Text_Rank

Primary LanguagePython

#Author: Mobashir Sadat
#msadat3
#668038298



The functinality of the functions are self-explanatory from their names:

Document.py: Class to represent every document
1. Preprocess: preprocesses, removes unnecessary words, POS tags, creates the vocabulary
2. word_to_idx: returns the index of a word in the vocabulary
3. Create_word_graph: creating the adjacency matrix
4. get_adjacent_indices: return the indices adjacent to a word in the adjacency matrix
5. update_scores: one ieteration of page rank
6. get_page_rank_scores: gets page rank scores for every word in vocab by running until convergence to specified number of iterations(10)
7. get_phrases: creates unigram, bigram, trigram phrases
8. get_top_k_phrases: calls get_page_rank_scores to get all word level scores, calculates the scores of phrases by summing and returns top k phrases
9. get_gold_labels: reads gold label files and preprcocess gold labels
10. get_reciprocal_rank: returns reciprocal rank of one document

Main.py: 
1. getStopWordsList: get the stop words from a file
2. calculate_MRR: gets MRR by extracting keyphrases from all documents given the directory locations of abstracts and gold labels, value of k, alpha, highest number of iterations and window size




###############################HOW To RUM####################################
1. Run a command similar to:

python Main.py "D:/CS 582/HW4/Page_Rank_Words/stopwords.txt" "D:/CS 582/HW4/www/abstracts" "D:/CS 582/HW4/www/gold/" 6 

After python, first argument: python file location, second argument: stopwords file location, third argument: the folder where the abstracts are, fourth argument: the folder where the gold labels are, fifth argument: the value for window.



##############################Results#####################################

Window = 6 K = 1 MRR = 0.07067669172932331
Window = 6 K = 2 MRR = 0.10676691729323308
Window = 6 K = 3 MRR = 0.13834586466165433
Window = 6 K = 4 MRR = 0.1593984962406018
Window = 6 K = 5 MRR = 0.1730827067669172
Window = 6 K = 6 MRR = 0.18360902255639067
Window = 6 K = 7 MRR = 0.19026852846401676
Window = 6 K = 8 MRR = 0.19628356605800173
Window = 6 K = 9 MRR = 0.1999594223654371
Window = 6 K = 10 MRR = 0.203117317102279