Vector-Representation-of-Indeed-Job-Listings-NLP

Used the spaCy tokenizer to process the text and BeautifulSoup to strip HTML tags from the job descriptions. Built a tokenizer and used scikit-learn's CountVectorizer to get the word counts for each listing. Created a document-term matrix (DTM) and a TF-IDF feature matrix. Built a search engine that queries the job listings and finds the listings most similar to a description of the desired job.
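The pipeline above can be sketched without any third-party dependencies, assuming Python 3.9+. This is an illustrative stand-in, not the notebook's code: `html.parser` replaces BeautifulSoup, a regex tokenizer replaces spaCy, and hand-rolled count and TF-IDF vectors replace CountVectorizer. The helper names (`strip_html`, `tokenize`, `build_tfidf`, `search`) and the sample listings are hypothetical. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is scikit-learn's `TfidfVectorizer` default. Rows are L2-normalised so that ranking by dot product is ranking by cosine similarity. Cosine similarity is an assumption here, since the similarity measure the notebook uses is not stated.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (stand-in for BeautifulSoup's get_text())."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML job description."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def tokenize(text: str) -> list[str]:
    """Lowercased alphanumeric tokens; a crude stand-in for the spaCy tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _weight(counts: Counter, idf: dict) -> dict:
    """Turn a row of raw term counts (one DTM row) into an L2-normalised TF-IDF vector."""
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}  # unknown terms dropped
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def build_tfidf(docs: list[list[str]]):
    """Return (idf, rows): smoothed IDF per term and one sparse TF-IDF dict per document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    rows = [_weight(Counter(doc), idf) for doc in docs]
    return idf, rows


def search(query: str, idf: dict, rows: list[dict], top_k: int = 3):
    """Return [(doc_index, score)] for the top_k listings most similar to the query."""
    q = _weight(Counter(tokenize(query)), idf)
    # Query and rows are unit vectors, so the dot product is their cosine similarity.
    scores = [sum(q.get(t, 0.0) * w for t, w in row.items()) for row in rows]
    ranked = sorted(range(len(rows)), key=lambda i: -scores[i])
    return [(i, scores[i]) for i in ranked[:top_k]]


if __name__ == "__main__":
    listings = [  # hypothetical sample listings
        "<p>Data scientist with <b>Python</b> and machine learning</p>",
        "<div>Registered nurse, night shift</div>",
        "<ul><li>Machine learning engineer</li><li>Python, SQL</li></ul>",
    ]
    docs = [tokenize(strip_html(html)) for html in listings]
    idf, rows = build_tfidf(docs)
    print(search("python machine learning", idf, rows, top_k=2))
```

With these three sample listings, the query ranks the machine-learning engineer listing first and the data-scientist listing second. The shorter listing concentrates more of its weight on the matching terms. A real notebook would more likely use scikit-learn's vectorizers and a nearest-neighbour model over the sparse matrix. The hand-rolled version only shows what those steps compute.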

Primary language: Jupyter Notebook
