9941 Project (Guandong Kou)

This project automatically classifies job categories, job types, and experience levels from raw HTML data collected from job board websites. The Google Colab notebook for training the model can be found here

Currently, it supports only the 5 most frequent job categories and the 10 most frequent job types. (The rest need more data for the models to learn well.)

Main work in this repo:

  • tokenization of raw HTML using BeautifulSoup and NLTK
  • feature engineering with the TF-IDF approach
  • model training with logistic regression, support vector classifier, and neural network
  • model performance evaluations
  • model serialization, API and front-end webpage development
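
The first two steps above (tokenization and TF-IDF features) can be illustrated with a minimal sketch. This is not the code used in this repo: to stay dependency-free it substitutes the standard library's `html.parser` and a regular expression for BeautifulSoup and NLTK, and it uses a plain unsmoothed TF-IDF formula. The names `TextExtractor`, `tokenize_html`, and `tfidf` are hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def tokenize_html(html):
    """Strip markup and return lowercase alphabetic tokens (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.chunks).lower())


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / len(doc) and idf = log(N / df), with no smoothing
    or normalization; library implementations typically differ in detail.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


# Made-up sample postings, for illustration only.
pages = [
    "<h1>Senior Python Developer</h1><script>var x = 1;</script>",
    "<p>Junior Java Developer</p>",
]
features = tfidf([tokenize_html(p) for p in pages])
```

In this sketch a term that appears in every document (here "developer") gets a weight of zero, which is the property that makes TF-IDF useful for separating job categories: shared boilerplate words contribute nothing, while distinguishing words such as "python" or "java" carry the signal passed on to the classifiers.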


Python 3.8.5

Install Packages

pip install --user -r requirements.txt
python setup.py

Run Server

python server.py

Then, open localhost:5000 to test the API.


  • manual_rules: the configuration files for manually added matching rules
  • models: serialized trained models for predictions
  • static: static resources
  • templates: front-end templates (used for API test etc.)
  • train: used for training models in Google Colab
  • utils: utility functions used for predictions and model training