This repository contains the files required to extract words from PDF profiles downloaded from LinkedIn and to filter the frequent and essential words out of them.

To use this repository, follow the steps below.

******************** EXECUTION ***********************

  1. The profiles of 50 random people were downloaded manually and are stored in the "Linkedin_Profiles" folder.

  2. Execution of "text_extraction.py" generates "output.txt" and "output1.csv".

output.txt : Text file containing the text extracted from one PDF profile.

output1.csv : CSV file containing the data of all 50 profiles under the "LinkedIn Profiles" label.

  3. Execution of "frequent_words.py" generates "output2.csv".

output2.csv : CSV file containing the data and the frequent words in two columns, respectively.
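
For orientation, here is a minimal sketch of how frequent words could be picked from one profile's text. It is not the code in "frequent_words.py"; the function name `frequent_words`, the `top_n` cutoff, and the letters-only tokenisation are assumptions made for illustration.

```python
import re
from collections import Counter


def frequent_words(text, top_n=5):
    """Return the top_n most common words in text, most frequent first."""
    # Assumed tokenisation: lowercase the text and keep runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Counter.most_common returns (word, count) pairs sorted by count.
    return [word for word, _ in Counter(tokens).most_common(top_n)]


sample = "Python developer. Python and SQL. Data, data, data."
print(frequent_words(sample, top_n=2))  # ['data', 'python']
```

Applied to each row of the profile data, a function like this would yield the values for the second column.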

  4. Execution of "essential_words.py" generates "output3.csv".

output3.csv : CSV file containing the data, the frequent words, and the essential words in three columns, respectively.
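
This README does not define how essential words are chosen, so the sketch below shows only one possible reading: keep the frequent words that are not common filler words. The `essential_words` function and the small `STOPWORDS` set are hypothetical and may differ from what "essential_words.py" actually does.

```python
# Hypothetical stopword list (an assumption); a real run would likely need a fuller one.
STOPWORDS = {"a", "and", "at", "for", "in", "of", "the", "to", "with"}


def essential_words(words, stopwords=STOPWORDS):
    """Keep the words that are not stopwords, preserving their order."""
    return [word for word in words if word not in stopwords]


print(essential_words(["and", "python", "the", "data"]))  # ['python', 'data']
```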