job-advert-analysis

Analysing data from the Kaggle Job Salary Prediction competition

Primary language: Jupyter Notebook. License: MIT.

Enriching and processing data from the Job Salary Prediction Kaggle Competition.

This repository looks at methods of aggregating information from the job ad data. See the related articles for more information about the techniques used.

Setup

Requires Python 3.6+. Install the packages listed in requirements.txt in an appropriate virtual environment:

# Set up a new virtual environment
python -m venv .venv
# Activate it (on Windows: .venv\Scripts\activate)
source .venv/bin/activate
# Install requirements
python -m pip install -r requirements.txt
# Download SpaCy model
python -m spacy download en_core_web_lg

To download the Kaggle data you will need to set up Kaggle API credentials and accept the competition rules. Alternatively, you can manually download and unzip the data from Kaggle directly.
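
Before running the pipeline, it can help to confirm that the Kaggle API will find your credentials. The sketch below is not part of this repository; the function name kaggle_credentials_available is hypothetical. It assumes the locations the Kaggle API client conventionally reads: the KAGGLE_USERNAME and KAGGLE_KEY environment variables, or a kaggle.json file in ~/.kaggle (or in the directory named by KAGGLE_CONFIG_DIR).

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None, home=None):
    """Return True if Kaggle API credentials appear to be configured.

    Hypothetical helper, not part of this repository. It only checks that
    credentials exist in the usual places; it does not validate them.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: credentials supplied through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json token file in the config directory.
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR") or home / ".kaggle")
    return (config_dir / "kaggle.json").is_file()


if __name__ == "__main__":
    if kaggle_credentials_available():
        print("Kaggle credentials found.")
    else:
        print("No Kaggle credentials found; download the data manually instead.")
```

If the check fails, either create an API token from your Kaggle account settings or fall back to the manual download described above.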

Running

You can run the whole pipeline in the src folder by running ./run.sh, or run each of the numbered steps independently.
