This repository contains the data and training scripts to fit the Salary Estimator for software engineers and data scientists. The app is available HERE.
Set up the environment and install the requirements.
conda create -n "project-salary" python=3.11 ipython
conda activate project-salary
conda install --file requirements.txt
After downloading the data and running the notebook to generate experimental_data.csv, run the experiments, e.g.:
python scripts/experiments.py scripts/example-config-exp.yml
The source data comes from several places. The salary survey data comes from levels.fyi, while the cost of living index 2022 and the cost of living index by state 2022 come from Kaggle and World Population Review (WPR), respectively. You can download the latter two after signing in.
variable | definition | source |
---|---|---|
timestamp | Survey time | levels |
company | Name of the company | levels |
level | Internal role level | levels |
title | Job title | levels |
tyc | Total yearly compensation (in thousands) | levels |
location | Place of work, country or state | levels |
yoe | Years of experience | levels |
yac | Years at company | levels |
base | Base salary (in thousands) | levels |
equity | Equity (in thousands) | levels |
bonus | Bonus (in thousands) | levels |
gender | Female/not-Female | levels |
coli | Cost of living index | Kaggle and WPR |
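As a rough illustration of how the `coli` column could be attached to the survey rows, the sketch below looks up each row's `location` in a cost of living table. The row values, the index values, and the helper name `attach_coli` are all made up for this example; the actual join is done in the notebook and may differ.

```python
# Hypothetical sketch: attach a cost of living index to survey rows by location.
# All values below are invented for illustration, not taken from the data sets.

def attach_coli(rows, coli_by_location):
    """Return copies of the rows with a 'coli' key (None if the location is unknown)."""
    return [{**row, "coli": coli_by_location.get(row["location"])} for row in rows]


survey_rows = [
    {"company": "A", "location": "California", "tyc": 250.0},
    {"company": "B", "location": "Germany", "tyc": 120.0},
    {"company": "C", "location": "Atlantis", "tyc": 90.0},  # no index available
]
coli_by_location = {"California": 140.0, "Germany": 65.0}

for row in attach_coli(survey_rows, coli_by_location):
    print(row["location"], row["coli"])
```

Rows whose location has no index entry keep `coli` as `None`, so they can be inspected or dropped explicitly instead of disappearing silently.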
The experimental section explores two problems: estimating the expectation and estimating the range.
MODEL | R2 | MSE | MAE | MPL | MAPE | MEDAE | NAME |
---|---|---|---|---|---|---|---|
LinearRegression | 0.402 | 0.27 | 0.403 | 0.202 | 0.083 | 0.331 | hot-busy-quetzal-of-sorcery |
Ridge | 0.402 | 0.27 | 0.403 | 0.202 | 0.083 | 0.331 | gigantic-mighty-ammonite-of-wholeness |
Lasso | 0.402 | 0.27 | 0.403 | 0.201 | 0.083 | 0.328 | curly-obedient-dingo-of-feminism |
ElasticNet | 0.402 | 0.27 | 0.403 | 0.202 | 0.083 | 0.33 | glistening-arrogant-mushroom-of-passion |
BayesianRidge | 0.402 | 0.27 | 0.403 | 0.202 | 0.083 | 0.331 | flawless-guppy-of-sudden-will |
DecisionTreeRegressor | 0.55 | 0.203 | 0.351 | 0.176 | 0.072 | 0.287 | placid-prophetic-seahorse-of-endeavor |
RandomForestRegressor | 0.558 | 0.2 | 0.349 | 0.174 | 0.071 | 0.286 | benevolent-eggplant-bloodhound-of-pizza |
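For reference, the metric columns can be reproduced with a few lines of plain Python. This is a sketch under assumptions: the README does not spell out the abbreviations, so MPL is read here as the mean pinball loss and MEDAE as the median absolute error. The MPL values in the table above are roughly half the MAE, which is what the pinball loss gives at `alpha=0.5`. The toy targets and predictions are invented.

```python
# Plain-Python versions of the metrics in the table columns.
# Assumption: MPL = mean pinball loss, MEDAE = median absolute error.
from statistics import mean, median


def r2(y_true, y_pred):
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def mae(y_true, y_pred):
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mpl(y_true, y_pred, alpha=0.5):
    # Under-predictions are weighted by alpha, over-predictions by 1 - alpha.
    return mean([alpha * max(t - p, 0.0) + (1 - alpha) * max(p - t, 0.0)
                 for t, p in zip(y_true, y_pred)])


def mape(y_true, y_pred):
    return mean([abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)])


def medae(y_true, y_pred):
    return median([abs(t - p) for t, p in zip(y_true, y_pred)])


# Toy data, invented for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
for metric in (r2, mse, mae, mpl, mape, medae):
    print(metric.__name__, round(metric(y_true, y_pred), 3))
```

scikit-learn ships equivalents in `sklearn.metrics` (`r2_score`, `mean_squared_error`, `mean_absolute_error`, `mean_pinball_loss`, `mean_absolute_percentage_error`, `median_absolute_error`); whether the training scripts use exactly these is not stated here.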
MODEL | R2 | MSE | MAE | MPL | MAPE | MEDAE | NAME |
---|---|---|---|---|---|---|---|
QuantileRegressor | -0.261 | 0.569 | 0.608 | 0.218 | 0.121 | 0.536 | fair-victorious-nuthatch-of-inquire |
GradientBoostingRegressor | 0.223 | 0.351 | 0.476 | 0.16 | 0.092 | 0.408 | greedy-cocky-bison-of-rain |
MODEL | R2 | MSE | MAE | MPL | MAPE | MEDAE | NAME |
---|---|---|---|---|---|---|---|
QuantileRegressor | -0.395 | 0.63 | 0.595 | 0.192 | 0.133 | 0.463 | hot-kickass-guppy-of-essence |
GradientBoostingRegressor | 0.126 | 0.395 | 0.486 | 0.154 | 0.107 | 0.4 | misty-warm-nightingale-of-lightning |
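One way to read the range problem: a model trained with the pinball loss at level `alpha` targets the `alpha`-quantile, so fitting two levels gives a lower and an upper bound. The sketch below shows this with the simplest possible "model", a constant prediction. The sample values and the levels 0.25 and 0.75 are assumptions for illustration; this README does not state which quantiles the experiments used.

```python
# Sketch: the constant that minimises the pinball loss at level alpha
# is an empirical alpha-quantile, so two levels bracket a range.
# Sample values and quantile levels are invented for illustration.

def pinball(y_true, pred, alpha):
    """Mean pinball loss of a single constant prediction."""
    total = sum(alpha * max(t - pred, 0.0) + (1 - alpha) * max(pred - t, 0.0)
                for t in y_true)
    return total / len(y_true)


def best_constant(y_true, alpha):
    # The loss is piecewise linear with kinks at the data points,
    # so a minimiser can always be found among the observed values.
    return min(sorted(set(y_true)), key=lambda c: pinball(y_true, c, alpha))


sample = [80.0, 95.0, 110.0, 120.0, 150.0, 200.0, 320.0]
lower = best_constant(sample, 0.25)
upper = best_constant(sample, 0.75)
print(lower, upper)  # 95.0 200.0
```

This also explains why R2 can be negative for a quantile model, as in the tables above: the model is aiming at a quantile rather than the mean, so it is penalised by a metric built around the mean.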