Matchif-AI is a web application designed to assist users in finding job descriptions that closely match their resumes. Users can upload their resume in PDF format, which is processed and embedded into vector representations. These resume vectors are compared against a pre-existing vector database of job descriptions to find the best matches.
Note: This is a demo / proof-of-concept; the job descriptions are out-of-date. So, if you apply and don’t hear back, it’s not you—it’s them ;P
If you prefer not to run the application locally, you can access the deployed version on Google Cloud Run: Matchif-AI - Live Demo The infrastructure for this deployment is managed using Infrastructure as Code (IaC) with Pulumi.
Initial exploratory data analysis of the job listing dataset can be found in backend/notebook
, which provides insight into the structure and contents of the data. The dataset can be found here: LinkedIn Job Postings Dataset.
- Preprocess the job listing data to:
- Remove unnecessary columns.
- Handle missing data via removal or imputation.
- Remove duplicates.
- Remove HTML tags in descriptions.
- Normalise spacing and newlines in descriptions.
- Create a
JobPosting
collection on Weaviate to store embedding vectors of the job descriptions. - Vectorise the data and store it in the
JobPosting
collection using Weaviate. These will be used for generative search later.
Once the Weaviate JobPosting
collection is populated, the application is ready to be served.
Run the Application: Follow the setup instructions provided in the "Local Setup" section to build and run the application using Docker Compose.
- Resume Processing:
- When a user uploads their resume, the text is extracted.
- This text is then vectorised and added to a Weaviate collection
UserProfile
.
- Finding Job Matches:
- Using the UUID of the user's resume, the corresponding vector is extracted from Weaviate.
- Generative Search using Weaviate is then performed on the
JobPosting
collection with the user's resume vector to find job postings that are similar to the user's resume vector. - The application retrieves these similar job postings.
- Summarising Job Descriptions:
- For each retrieved job posting, Weaviate's generative search summarises a brief overview of the company and the role, followed by main responsibilities, required skills, and qualifications.
- The summarised job descriptions are presented to the user for easier readability.
Follow these steps to set up the Matchif-AI application locally.
- Docker and Docker Compose installed
- Download the LinkedIn Job Postings dataset and place the
postings.csv
inside thedata
directory in thebackend
directory. LinkedIn Job Postings Dataset
Create a .env
file in the backend
directory and add the following environment variables:
WCS_URL=<your-weaviate-instance-url>
WCS_API_KEY=<your-weaviate-api-key>
OPENAI_APIKEY=<your-openai-api-key>
Replace <your-weaviate-instance-url>
, <your-weaviate-api-key>
, and <your-openai-api-key>
with your actual Weaviate and OpenAI API details.
Prerequisites: Ensure that poetry
is installed. You can install it using the following command:
pip install poetry
Navigate to the backend
directory and run the following command to preprocess the job listing dataset. This step removes duplicates, handles missing values by imputing them, removes HTML tags, normalises spacing, and replaces new lines with full stops for better tokenisation:
cd backend
poetry install
poetry run python preprocess.py
This will preprocess the data and populate the Weaviate collections.
Navigate back to the root directory of the project and run the following command to build and start the Docker containers for both the frontend and backend:
docker-compose up --build
Once the Docker container is up and running, you can access the web application on http://localhost:3000
.
- Integrate live scraping of job descriptions instead of using a static dataset.
- Enable filtering options for location, experience type, etc.
- Implement a more complex vector search that can take into account users' preferences in skills, industries, etc.
- Enhance the user interface to provide a more interactive and intuitive experience.