This repository contains the files needed to extract text from LinkedIn profile PDFs and then identify the frequent and essential words in that text.
To use this repository, follow the steps below.
******************** EXECUTION ***********************
1) The profiles of 50 random people were downloaded manually and are stored in the "Linkedin_Profiles" folder.
2) Running "text_extraction.py" generates "output.txt" and "output1.csv".
   output.txt : Text file containing the text extracted from one PDF profile.
   output1.csv : CSV file containing the data of all 50 profiles under the "LinkedIn Profiles" label.
3) Running "frequent_words.py" generates "output2.csv".
   output2.csv : CSV file containing the data and the frequent words in two separate columns.
4) Running "essential_words.py" generates "output3.csv".
   output3.csv : CSV file containing the data, the frequent words, and the essential words in three separate columns.
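The scripts themselves are not reproduced here, so the following is only a minimal sketch of how steps 2 to 4 might work, not the repository's actual code. It assumes the third-party pypdf package for PDF text extraction, treats "frequent words" as the most common non-stopword tokens, and treats "essential words" as the frequent words that also appear in a domain vocabulary. Every function name, the stopword list, TOP_N, ESSENTIAL_VOCAB, and the "Frequent Words"/"Essential Words" column names are hypothetical. For brevity it collapses the three scripts into one pass and writes only a final CSV shaped like output3.csv.

```python
"""Hypothetical sketch of steps 2-4; the real scripts may differ."""
import csv
import re
from collections import Counter
from pathlib import Path

# Hypothetical parameters, not taken from the repository.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "are", "was"}
TOP_N = 10
ESSENTIAL_VOCAB = {"python", "sql", "java", "machine", "learning", "data"}


def extract_pdf_text(pdf_path):
    """Step 2 (assumed): join the text of every page in one PDF."""
    from pypdf import PdfReader  # third-party; imported lazily on purpose
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def frequent_words(text, top_n=TOP_N):
    """Step 3 (assumed): most common lowercase words, skipping stopwords
    and tokens of two letters or fewer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def essential_words(freq, vocab=ESSENTIAL_VOCAB):
    """Step 4 (assumed): keep only frequent words found in the vocabulary."""
    return [w for w in freq if w in vocab]


def build_rows(texts):
    """One row per profile: raw text, frequent words, essential words."""
    rows = []
    for text in texts:
        freq = frequent_words(text)
        rows.append([text, " ".join(freq), " ".join(essential_words(freq))])
    return rows


def write_csv(rows, out_path):
    """Write the three-column table with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["LinkedIn Profiles", "Frequent Words", "Essential Words"])
        writer.writerows(rows)


if __name__ == "__main__":
    folder = Path("Linkedin_Profiles")
    if folder.is_dir():
        texts = [extract_pdf_text(p) for p in sorted(folder.glob("*.pdf"))]
        write_csv(build_rows(texts), "output3.csv")
```

Importing pypdf inside extract_pdf_text keeps the word-counting functions usable on plain strings without that package installed, which makes them easy to check in isolation, for example frequent_words("Python python SQL") returns ["python", "sql"].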