
Labeling job ads with standard occupational classification (SOC) codes

Primary language: Jupyter Notebook

job_ad_soc_codes

This is a personal project inspired by my previous job in workforce development. The objective is to label job ads with standard occupational classification (SOC) codes, ideally using an embedding of job ads that could also serve other tasks, such as identifying emerging occupations that the existing SOC system does not describe well.

My preliminary efforts have so far reached 63.7% accuracy. They combine latent semantic analysis (LSA), smooth inverse frequency (SIF) averaging of GloVe embeddings, metric learning with neighborhood components analysis (NCA), and error-correcting output codes (ECOC) arranged as a boosting ensemble of logistic regression classifiers. This is a substantial improvement over the roughly 0.1% accuracy that random guessing might achieve. It also exceeds the 39.3% accuracy that Nicholas Thiebaut achieved with a more complicated model trained on more samples, though on less detailed data (job titles only, rather than full descriptions/ads).
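The combination of techniques described above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the project's notebook code. The names `sif_embed` and `fit_predict` are hypothetical. `word_vectors` is assumed to be a dict of lowercase tokens mapped to GloVe vectors, and `word_prob` is assumed to map tokens to corpus unigram probabilities. ECOC is shown with scikit-learn's `OutputCodeClassifier`, which uses a random code book rather than the boosting arrangement mentioned above.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sif_embed(docs, word_vectors, word_prob, a=1e-3, pc=None):
    """SIF averaging: weight each word vector by a / (a + p(w)) and average.
    Then subtract the projection onto the first singular vector (the "common
    component"). Pass the training set's `pc` when embedding held-out docs.
    Assumption: word_vectors maps lowercase tokens to GloVe-style vectors."""
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((len(docs), dim))
    for i, doc in enumerate(docs):
        tokens = [t for t in doc.lower().split() if t in word_vectors]
        if tokens:  # documents with no known words stay at the zero vector
            weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in tokens])
            vectors = np.array([word_vectors[t] for t in tokens])
            emb[i] = weights @ vectors / len(tokens)
    if pc is None:
        # First right singular vector of the (uncentered) embedding matrix.
        pc = np.linalg.svd(emb, full_matrices=False)[2][0]
    return emb - np.outer(emb @ pc, pc), pc


def fit_predict(train_docs, y_train, test_docs, word_vectors, word_prob, n_lsa=2):
    """Concatenate LSA and SIF features, learn a metric with NCA, then classify
    with an ECOC ensemble of logistic regressions (hypothetical helper)."""
    # LSA: TF-IDF term weighting followed by truncated SVD.
    lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=n_lsa, random_state=0))
    sif_train, pc = sif_embed(train_docs, word_vectors, word_prob)
    sif_test, _ = sif_embed(test_docs, word_vectors, word_prob, pc=pc)
    X_train = np.hstack([lsa.fit_transform(train_docs), sif_train])
    X_test = np.hstack([lsa.transform(test_docs), sif_test])

    clf = make_pipeline(
        StandardScaler(),
        # NCA learns a linear map under which nearest neighbors tend to share labels.
        NeighborhoodComponentsAnalysis(random_state=0),
        # Assumption: random code book; each column is one binary logistic regression.
        OutputCodeClassifier(LogisticRegression(max_iter=1000), code_size=2.0, random_state=0),
    )
    clf.fit(X_train, y_train)
    return clf.predict(X_test)
```

For example, `fit_predict(train_ads, train_codes, new_ads, glove, unigram_prob)` would return one predicted SOC code per new ad, where all four inputs are hypothetical placeholders for the real data. In the sketch, the SIF common component is estimated once on the training ads and reused for new ads, so both sets are projected the same way.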