Introduction

Our main project idea was to predict the salaries of LinkedIn job postings from the features and descriptions available in those postings. We chose to focus on studying and predicting salaries for three primary reasons. Firstly, a lack of salary transparency can place candidates at a significant disadvantage, particularly when they are weighing multiple job offers with different timelines. In many places, companies are not legally required to report salary ranges for positions, which makes it difficult for candidates to make informed decisions about which offers to accept.

Secondly, we want to give applicants a fair estimate of what they should be paid, using recent legislation as a guide. Consider, for example, California’s new pay transparency bill (“California Releases Guidance on Pay Transparency Law”, 2022), which requires companies to post a salary range with each job posting. While companies in some regions must now publish pay ranges, these ranges can be so wide that they give little insight into what a particular candidate can expect to earn. Our project aims to provide candidates with a more precise estimate of their expected pay by using machine learning and natural language processing techniques to analyze job postings and position details.
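To make the idea of analyzing postings with natural language processing more concrete, here is a minimal sketch of one common approach: converting description text into TF-IDF features and fitting a regression model on them. The six descriptions and salaries are made-up toy values rather than rows from jobs.csv, and ridge regression is a swapped-in linear model chosen for simplicity, not necessarily one of the algorithms used in this project.

```python
# A minimal sketch: predict salary from job-description text with TF-IDF + ridge regression.
# The postings and salaries below are made-up toy values, not rows from jobs.csv.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

descriptions = [
    "Senior machine learning engineer, Python, deep learning, 8+ years",
    "Junior data analyst, Excel and SQL, entry level",
    "Staff software engineer, distributed systems, Go and Kubernetes",
    "Data analyst intern, SQL reporting, dashboards",
    "Senior data scientist, NLP, Python, PhD preferred",
    "Junior software developer, JavaScript, entry level",
]
salaries = [185000, 65000, 210000, 45000, 175000, 70000]  # toy annual salaries in USD

# TF-IDF turns each description into a sparse vector of weighted word and bigram counts;
# ridge regression then fits a regularized linear model on those vectors.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
    Ridge(alpha=1.0),
)
model.fit(descriptions, salaries)

new_posting = ["Senior machine learning engineer, Python, NLP"]
predicted = model.predict(new_posting)[0]
print(f"Predicted salary: ${predicted:,.0f}")
```

Any regressor from the algorithm list could replace `Ridge` in the pipeline. A linear model is a convenient baseline because its coefficients show which words push the predicted salary up or down.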

Lastly, we believe that transparency around compensation is crucial for building trust between employers and employees and for creating a positive work environment. When employees understand how their pay is determined, they are more likely to feel appreciated and respected by their employer. Pay transparency can also help identify and address discrimination and bias in the workplace. By giving candidates more information about the salary range for a particular job, we hope to contribute to a more equitable and inclusive job market.

Data description

Dataset (jobs.csv):

We will be using a free, publicly available dataset of LinkedIn job postings. Its features can serve as input variables for supervised learning, from which we can derive insights and predict a salary for each posting. (The dataset is available in the repository and is also linked in the code.)

The dataset comes from a public GitHub repository in which a user scraped and compiled job posts and their metadata from various tech categories on LinkedIn.
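Before supervised learning, scraped salary text has to become a numeric target. The sketch below assumes, hypothetically, that jobs.csv has a free-text salary column with values such as "$140,000 - $180,000" or "$150K - $200K"; the column names (`title`, `location`, `salary`) and the formats are illustrative and are not taken from the actual dataset. Collapsing each range to its midpoint is likewise an assumption made for this example.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical rows: the real jobs.csv column names and salary formats may differ.
jobs = pd.DataFrame({
    "title": ["Data Scientist", "Software Engineer", "Data Analyst", "ML Engineer"],
    "location": ["San Francisco, CA", "Austin, TX", "Remote", "Seattle, WA"],
    "salary": ["$140,000 - $180,000", "$120,000/yr", None, "$150K - $200K"],
})

# Matches dollar amounts such as "$140,000" or "$150K" (group 1: digits, group 2: optional K).
DOLLAR_AMOUNT = re.compile(r"\$\s*([\d,]*\d(?:\.\d+)?)\s*([kK]?)")


def salary_midpoint(text):
    """Return the midpoint of the dollar amounts in a salary string, or NaN if there are none."""
    if not isinstance(text, str):
        return np.nan
    amounts = [
        float(digits.replace(",", "")) * (1000 if k_suffix else 1)
        for digits, k_suffix in DOLLAR_AMOUNT.findall(text)
    ]
    if not amounts:
        return np.nan
    # A single amount is its own midpoint; a range collapses to (low + high) / 2.
    return (min(amounts) + max(amounts)) / 2


jobs["salary_mid"] = jobs["salary"].map(salary_midpoint)
# Postings without a parseable salary cannot serve as supervised training targets.
labeled = jobs.dropna(subset=["salary_mid"])
print(labeled[["title", "salary_mid"]])
```

The midpoint is only one simple choice; the low and high ends of a range could also be modeled as separate targets. Hourly rates (e.g. "$50/hr") would need an extra conversion step that this sketch does not attempt.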

Algorithms Used

  1. Random Forest Regression
  2. Neural Networks
  3. Decision Trees
  4. GAMs - Generalized Additive Models
  5. K-Nearest Neighbors
  6. XGBoost - Extreme Gradient Boosting
  7. More...
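One way to compare the regression models listed above on equal footing is k-fold cross-validation. The sketch below is an illustration, not this project's actual experiment: it uses synthetic data from `make_regression` as a stand-in for the engineered job-posting features, fits scikit-learn's versions of four of the listed algorithms with arbitrarily chosen hyperparameters, and reports the 5-fold mean absolute error (MAE) for each. XGBoost and GAMs are left out because they need separate packages (such as xgboost or pygam) that are not in the libraries list.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the engineered job-posting features (not real salary data).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    # KNN is distance based and the MLP is gradient trained, so both get scaled features.
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
    ),
}

results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so MAE is reported negated; flip the sign back.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    results[name] = -scores.mean()
    print(f"{name:>20}: MAE = {results[name]:.1f}")
```

With a real salary target, the MAE would be in dollars, which makes the comparison easy to interpret: it is the average amount by which a model's prediction misses the posted pay.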

Libraries Used

  1. Pandas
  2. NumPy
  3. Matplotlib
  4. Seaborn
  5. scikit-learn