This is our NLP course final project. It classifies ACM articles.
Link to the original dataset: http://www.psantos.com.pt/files/trabalhos-academicos/2007-2008-tmei/?fbclid=IwAR3Nen9EuF937td55XEYRAeRL7bb8x__KGs-E3RwC63vh1r_z3-KTSJFgCg
Instructions for running the code:
- To see the results of the Jaccard distance, modify the paths of the data files in the code to match your environment.
- To see the results of the various classifiers with their various feature sets, including word2vec as a feature set:
  - Download the pretrained word2vec model from this link: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
  - Modify the paths of the data files in the code to match your environment.
  - Uncomment the following lines to see the results using feature sets other than word2vec: 82, 97, 108
  - Run the code and enter a PDF link for an ACM document; the program will then show the classification results.
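
For readers unfamiliar with the Jaccard distance mentioned above, here is a minimal sketch of the idea in Python. It is not the project's code: the function names `jaccard_distance` and `nearest_label` and the toy documents are hypothetical, and it assumes documents are compared as sets of tokens and labeled by their nearest neighbor.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two iterables of tokens."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty documents count as identical
    return 1.0 - len(set_a & set_b) / len(union)


def nearest_label(tokens, labeled_docs):
    """Return the label of the labeled document closest to `tokens`."""
    best = min(labeled_docs, key=lambda item: jaccard_distance(tokens, item[1]))
    return best[0]


# Hypothetical toy data; the real documents come from the dataset folder.
labeled_docs = [
    ("software", "compiler testing debugging code".split()),
    ("networks", "routing packet protocol latency".split()),
]
query = "unit testing of compiler code".split()

print(jaccard_distance(query, labeled_docs[0][1]))  # 3 shared tokens out of 6 distinct
print(nearest_label(query, labeled_docs))
```

A distance of 0 means the two token sets are identical and 1 means they share nothing, so the smallest distance picks the most similar labeled document.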
Note: all the necessary data files are in the dataset folder.
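
The instructions do not say how word2vec vectors are turned into classifier features. One common approach, shown here only as an assumption, is to average the vectors of the words in a document. The helper `document_vector` and the toy vectors are hypothetical; with the downloaded model, the word-to-vector mapping could be loaded with gensim's `KeyedVectors.load_word2vec_format(path, binary=True)`, where `path` points to your local copy of the model.

```python
def document_vector(tokens, vectors, dim):
    """Average the vectors of the tokens that appear in `vectors`.

    `vectors` is any mapping from word to a sequence of `dim` floats.
    Tokens missing from the mapping are skipped.
    """
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(column) / len(found) for column in zip(*found)]


# Toy 2-dimensional vectors for illustration only (not real word2vec values).
toy_vectors = {"neural": [1.0, 0.0], "network": [0.0, 1.0]}

features = document_vector(["neural", "network", "unseen"], toy_vectors, dim=2)
print(features)
```

The resulting fixed-length list can be passed to a classifier in the same way as any other numeric feature set.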