The final project consists of a Streamlit UI, a FastAPI backend with PostgreSQL and Faiss indexes, and DistilUsev1 (trained with ContrastiveCE) served from a separate embedder module. The embedder module reaches 25 RPS at peak on a 13th Gen Intel(R) Core(TM) i9-13900HX CPU.
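A minimal sketch of what the embedder module could look like, assuming a single `/encode` route wrapping the SentenceTransformer checkpoint (the route name, payload schema, and weights path are illustrative, not the module's actual code):

```python
# Minimal embedder service sketch; route name, payload, and path are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("embedder/weights/ce_model")  # path is an assumption

class EncodeRequest(BaseModel):
    texts: list[str]

@app.post("/encode")
def encode(req: EncodeRequest):
    # encode() returns a (len(texts), dim) float array; convert for JSON
    embeddings = model.encode(req.texts)
    return {"embeddings": embeddings.tolist()}
```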
We tried four popular sentence transformers raw (without fine-tuning on our data):
- DistilUsev1
- DistilUsev2
- mpnet
- MiniLM

We concluded that DistilUsev1, even though it was not trained on our data, matched Doc2Vec in quality, so it was chosen as the base model.
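A rough sketch of how such a comparison can be run with the stock Hugging Face checkpoints; the resume/vacancy pair below is made up for illustration:

```python
# Compare raw (not fine-tuned) sentence transformers by the cosine similarity
# they assign to a matched resume/vacancy pair.
from sentence_transformers import SentenceTransformer, util

MODELS = [
    "distiluse-base-multilingual-cased-v1",   # DistilUsev1
    "distiluse-base-multilingual-cased-v2",   # DistilUsev2
    "paraphrase-multilingual-mpnet-base-v2",  # mpnet
    "paraphrase-multilingual-MiniLM-L12-v2",  # MiniLM
]

resume = "Python developer, 5 years of backend experience with FastAPI and PostgreSQL"
vacancy = "We are looking for a backend Python engineer (FastAPI, PostgreSQL)"

for name in MODELS:
    model = SentenceTransformer(name)
    emb = model.encode([resume, vacancy], convert_to_tensor=True)
    print(f"{name}: cos_sim = {util.cos_sim(emb[0], emb[1]).item():.3f}")
```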
The service API is also available for searching matching vacancies and resumes, using Faiss to store the essential embeddings and PostgreSQL for everything else.
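A sketch of that split, assuming an inner-product Faiss index over L2-normalized vectors and integer ids shared with the PostgreSQL rows (the index type and helper names here are assumptions):

```python
# Vectors live in a Faiss index keyed by the same integer id as the
# PostgreSQL row that holds the full record.
import faiss
import numpy as np

DIM = 512  # DistilUse output dimension
index = faiss.IndexIDMap(faiss.IndexFlatIP(DIM))  # inner product == cosine on normalized vectors

def add_vacancy(vac_id: int, embedding: np.ndarray) -> None:
    vec = embedding.astype("float32").reshape(1, -1)
    faiss.normalize_L2(vec)
    index.add_with_ids(vec, np.array([vac_id], dtype="int64"))
    # the textual fields for vac_id are stored in PostgreSQL

def search_vacancies(query_emb: np.ndarray, k: int = 5) -> list[int]:
    vec = query_emb.astype("float32").reshape(1, -1)
    faiss.normalize_L2(vec)
    _, ids = index.search(vec, k)
    return ids[0].tolist()  # fetch these ids from PostgreSQL for full records
```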
We ran experiments with two Doc2Vec configurations:

| Model | vector_size | epochs | Positive similarity | Negative similarity | Difference | METEOR | ROUGE |
|-------|-------------|--------|---------------------|---------------------|------------|--------|-------|
| Doc2Vec v1 | 35 | 50 | 0.414 | 0.298 | 0.116 | 0.342 | 0.28 |
| Doc2Vec v2 | 15 | 50 | 0.158 | 0.106 | 0.0515 | 0.126 | 0.103 |
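For reference, a toy gensim sketch of the v1 configuration and the positive/negative similarity metric, with made-up texts standing in for the real corpus:

```python
# Doc2Vec v1 configuration (vector_size=35, epochs=50): positive similarity is
# for matched resume/vacancy pairs, negative for mismatched ones.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

resumes = ["python backend developer fastapi postgres", "frontend react engineer"]
vacancies = ["looking for python fastapi developer", "react developer wanted"]

docs = [TaggedDocument(text.split(), [i]) for i, text in enumerate(resumes + vacancies)]
model = Doc2Vec(docs, vector_size=35, epochs=50, min_count=1)

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v_resume = model.infer_vector(resumes[0].split())
pos = cos(v_resume, model.infer_vector(vacancies[0].split()))  # matched pair
neg = cos(v_resume, model.infer_vector(vacancies[1].split()))  # mismatched pair
print(f"positive = {pos:.3f}, negative = {neg:.3f}, difference = {pos - neg:.3f}")
```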
For more info, please visit our Notion page.
We had to build the data ourselves, as there were no ready-made datasets for our project. The vacancy dataset was matched with the resume data manually, using two different approaches:
- By calculating similarities between the full texts of resumes and vacancies using Word2Vec, Doc2Vec, and TF-IDF vectorization (file `resume_matching_data.ipynb`). The results here were unsatisfying.
- By matching on keywords and setting strict filters on the data. This approach turned out to be effective; see the sketch below.
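An illustrative sketch of the keyword idea; the column names, keyword list, and the two-keyword threshold are assumptions, not the notebook's actual logic:

```python
# Keyword matching with a strict overlap filter over toy dataframes.
import pandas as pd

resumes = pd.DataFrame({
    "id": [1, 2],
    "text": ["python fastapi postgres developer", "java spring engineer"],
})
vacancies = pd.DataFrame({
    "id": [10, 20],
    "text": ["senior python developer (fastapi)", "c++ systems programmer"],
})

KEYWORDS = ["python", "fastapi", "postgres", "java", "spring", "c++"]

def keywords_of(text: str) -> set[str]:
    return {kw for kw in KEYWORDS if kw in text.lower()}

pairs = []
for _, r in resumes.iterrows():
    r_kw = keywords_of(r["text"])
    for _, v in vacancies.iterrows():
        shared = r_kw & keywords_of(v["text"])
        if len(shared) >= 2:  # strict filter: require at least two shared keywords
            pairs.append((r["id"], v["id"], sorted(shared)))

print(pairs)  # [(1, 10, ['fastapi', 'python'])]
```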
In this project we provide a highly efficient and accurate service for matching CVs with available vacancies using the Distiluse sentence transformer. We use FastAPI with PostgreSQL and Faiss for storing, adding, and searching similar resumes and vacancies, Sentence_Transformers for training and inference, and Streamlit for a cool and minimalistic frontend.
- Clone the repo:

  ```
  git clone -b randv_main https://github.com/pavviaz/itmo_pdl.git
  ```

- Place the SentenceTransformer checkpoint folder into the `embedder/weights` directory, and the example resume and vacancy CSVs into `api/init_data` (our weights and data: `ce_model.zip` is the model folder, `resume_train_no_index.csv` and `vac_train_no_index.csv` contain the resume and vacancy data respectively). Change the model and data paths in the config files if needed.

- Create a `.env` file in the root directory with the following keys:

  ```
  DB_NAME=<EXAMPLE_DB_NAME>
  DB_USER=<EXAMPLE_DB_USER>
  DB_PASSWORD=<EXAMPLE_DB_PASSWD>
  DB_HOST=<EXAMPLE_DB_HOST>
  DB_PORT=5044
  EMBEDDER_URL=http://embedder:5043
  ```

- Build & run the containers:

  ```
  sudo docker-compose build
  sudo docker-compose up
  ```
Congratulations! Streamlit is now available at http://localhost:8501/ and the API endpoints are at http://localhost:5041/docs.
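As a quick smoke test you could hit the API from Python; the `/search` route and its payload below are hypothetical, so check the docs page for the real endpoints and schemas:

```python
# Hypothetical smoke test; route and payload are illustrative only.
import requests

resp = requests.post(
    "http://localhost:5041/search",
    json={"text": "python backend developer", "top_k": 5},
)
resp.raise_for_status()
print(resp.json())
```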
- [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)

  ```
  @inproceedings{reimers-2019-sentence-bert,
      title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
      author = "Reimers, Nils and Gurevych, Iryna",
      booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
      month = "11",
      year = "2019",
      publisher = "Association for Computational Linguistics",
      url = "https://arxiv.org/abs/1908.10084",
  }
  ```
Fyodorova Inessa
Kudryashov Georgy
Vyaznikov Pavel