Occupation coding datasets

  1. GenEasy: A collection of 500 synthetic job listings linked to select ESCO occupation codes, crafted using GPT-4.
  2. GenHard: The same setup as GenEasy, but the job titles diverge from the textual descriptors of their respective ESCO codes.
  3. Real_indeed: A set of 100 genuine job listings sourced from Indeed, annotated manually.

Each dataset has columns for ID, job title, description, and label, and may include additional supplementary columns.
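As a rough illustration of the column layout described above, the following sketch checks whether a CSV header contains the four core columns. The CSV format, the exact column names (`id`, `title`, `description`, `label`), and the helper `missing_columns` are assumptions made for this example; the actual files may use a different format or different headers. The sample row is a made-up placeholder, not real data.

```python
import csv
import io

# Assumed column names; the real headers in the dataset files may differ.
REQUIRED_COLUMNS = {"id", "title", "description", "label"}


def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Normalise header names so that "Title " and "title" compare equal.
    header = {name.strip().lower() for name in (reader.fieldnames or [])}
    return sorted(REQUIRED_COLUMNS - header)


# Placeholder content used only to exercise the check (not taken from the datasets).
sample = (
    "id,title,description,label,source\n"
    "1,Example title,Example description,EXAMPLE_CODE,placeholder\n"
)

# Extra columns such as "source" are allowed; only missing core columns are reported.
print(missing_columns(sample))
```

To run the same check on a file, read its contents with `open(path, newline="", encoding="utf-8").read()` and pass the resulting text to `missing_columns`.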