
A Corpus of 475,000 Industrial Occupations


Industrial and Professional Occupations Dataset (IPOD)

License: CC BY 4.0

This repo includes:

  • A gazetteer of tokens and named-entity (NE) tags, annotated by three domain experts
  • A corpus of 475,085 job titles crawled from LinkedIn, with NE tags prefixed using the BIOES scheme
  • Title2Vec: pre-trained job-title embeddings fine-tuned from ELMo; the checkpoint is available for download
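In the BIOES scheme, each token's NE tag carries a positional prefix: B- begins a multi-token entity, I- marks a token inside it, E- ends it, S- marks a single-token entity, and O marks a token outside any entity. The Python snippet below is a minimal sketch of that prefixing step, assuming entity annotations are available as inclusive token-index spans. The function name `to_bioes`, the span format, and the labels `LVL` and `ROLE` are hypothetical illustrations, not IPOD's actual tag set or file format.

```python
from typing import List, Tuple

# Hypothetical span format: (start token index, end token index inclusive, label).
Span = Tuple[int, int, str]


def to_bioes(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans into per-token BIOES tags."""
    tags = ["O"] * num_tokens  # every token starts outside any entity
    for start, end, label in spans:
        if start == end:
            tags[start] = f"S-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token of a multi-token entity
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"  # tokens strictly inside the entity
            tags[end] = f"E-{label}"  # last token of the entity
    return tags


tokens = ["senior", "data", "platform", "engineer", "-", "remote"]
# Hypothetical labels for illustration only.
spans = [(0, 0, "LVL"), (1, 3, "ROLE")]
print(to_bioes(len(tokens), spans))
# → ['S-LVL', 'B-ROLE', 'I-ROLE', 'E-ROLE', 'O', 'O']
```

Compared with plain BIO tagging, the extra E- and S- prefixes make entity boundaries explicit, which lets a sequence labeller distinguish a one-word seniority term from the start of a longer phrase.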

Citing IPOD

Please cite the following paper when using IPOD:

@inproceedings{liu2020ipod,
  title={IPOD: A Large-scale Industrial and Professional Occupation Dataset},
  author={Liu, Junhua and Ng, Yung Chuen and Wood, Kristin L. and Lim, Kwan Hui},
  booktitle={Proceedings of the 2020 ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (CSCW'20)},
  pages={323--328},
  year={2020}
}