Get recommendations of similar jobs when job hunting.
The system uses unsupervised learning to generate the recommendations. More specifically, the pipeline includes:
- Text preprocessing (HTML, URL, and stopword removal; lowercasing; stemming)
- Integer Encoding
- TF-IDF text vectorization
- SVD dimensionality reduction
- Cosine similarity metric
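
The steps above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the function names, the tiny stopword list, the naive suffix-stripping stemmer, and the sample job texts are all assumptions made for illustration. The integer encoding and SVD steps are left out to keep the example short, so similarity is computed directly on the TF-IDF vectors.

```python
import html
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {"a", "an", "and", "at", "the", "of", "for", "in", "to", "with", "is"}


def stem(token):
    """Naive suffix stripping, standing in for a real stemmer such as Porter."""
    for suffix in ("ing", "ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Remove HTML tags and URLs, lowercase, drop stopwords, then stem."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)        # HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so weights stay positive.
        vectors.append(
            {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(query_index, vectors, top_k=2):
    """Indices of the top_k documents most similar to the query document."""
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]


# Hypothetical job postings used only to exercise the sketch.
jobs = [
    "<p>Senior Python developer for data pipelines. Apply at https://example.com/jobs/1</p>",
    "Python developers wanted: building data pipelines and APIs",
    "Registered nurse for a pediatric ward, night shifts",
]
vectors = tfidf_vectors([preprocess(job) for job in jobs])
print(most_similar(0, vectors, top_k=1))  # [1]
```

Because no labels are involved, the recommendation is simply a nearest-neighbour lookup in the vector space. In a fuller pipeline, the SVD step would project the TF-IDF vectors to a lower dimension before the cosine comparison.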
- Before starting, it is recommended to switch to a virtual environment via `conda`, using Python 3.8.
- Install dependencies in your virtual environment via `pip install -r requirements.txt`.
- To train the model or run inference, run `python run.py --dataset_path <DATASET_PATH> --mode <MODE>`. The `<MODE>` parameter must be one of `train`, `dev`, or `test`.
- To run the demo, run `streamlit run demo.py`.
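
For readers unfamiliar with this style of command line, here is a hedged sketch of how the two flags could be parsed with `argparse`. It is not the contents of the project's `run.py`: the `build_parser` function, the help strings, and the `jobs.csv` file name are hypothetical. Only the flag names and the three mode values come from the usage above.

```python
import argparse


def build_parser():
    """Build a parser that accepts the two flags described in the usage above."""
    parser = argparse.ArgumentParser(description="Train or evaluate the job recommender.")
    parser.add_argument("--dataset_path", required=True, help="Path to the dataset.")
    # Restricting choices makes argparse reject any mode other than these three.
    parser.add_argument("--mode", required=True, choices=["train", "dev", "test"],
                        help="Which stage to run.")
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that it reads sys.argv instead.
args = build_parser().parse_args(["--dataset_path", "jobs.csv", "--mode", "train"])
print(args.dataset_path, args.mode)
```

With `choices` set, passing an unsupported value such as `--mode prod` makes the parser exit with a usage error rather than continuing silently.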