Resume Parser

A Python script to scan resumes for signals of elite class status.

About
Usage
Getting Started
Contributing
Acknowledgements

About

Anonymous screening seems to be a good way to reduce bias in hiring. However, it may not be possible to fully anonymize a resume, particularly with regard to class status (elite vs. non-elite), because class is signalled in many subtle ways. This script searches resumes for terms that signal elite class status, counts them, and outputs a CSV intended to be loaded into Stata for analysis.

Usage

```
python final.py
```

The script outputs a CSV in which each row is a resume and each column is a term. Each cell holds the number of occurrences of that term (and its synonyms) in that resume.
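To illustrate that layout, here is a minimal sketch, not the code in `final.py`. It assumes resume text has already been extracted to plain strings, that matching is case-insensitive on word boundaries, and that each column is named after the first synonym on its line. The functions `count_concept`, `build_rows`, and `write_csv` and the example terms are hypothetical.

```python
import csv
import io
import re


def count_concept(text: str, synonyms: list[str]) -> int:
    """Sum the occurrences of every synonym of one concept in `text`.

    Assumption: case-insensitive, whole-word matching. The real script
    may tokenize or normalize differently.
    """
    total = 0
    for term in synonyms:
        pattern = r"\b" + re.escape(term) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def build_rows(resumes: dict[str, str], concepts: list[list[str]],
               id_column: str = "resumeName") -> list[dict]:
    """One row per resume; one column per concept, named after its first synonym."""
    rows = []
    for name, text in resumes.items():
        row = {id_column: name}
        for synonyms in concepts:
            row[synonyms[0]] = count_concept(text, synonyms)
        rows.append(row)
    return rows


def write_csv(rows: list[dict], concepts: list[list[str]],
              id_column: str = "resumeName") -> str:
    """Serialize the rows as CSV text (to a string here, for demonstration)."""
    buffer = io.StringIO()
    fieldnames = [id_column] + [synonyms[0] for synonyms in concepts]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example terms and resume text.
concepts = [["lacrosse"], ["rowing", "crew"]]
resumes = {"resume_001.pdf": "Captain, crew team; varsity lacrosse. Rowing club member."}
print(write_csv(build_rows(resumes, concepts), concepts))
```

One caveat of this naive sum: if one synonym contains another (for example `squash` and `squash club`), the overlapping text is counted twice.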

Sample resumes can be found in this Drive folder, and a sample output is available in /sample_output/.

Getting Started

These instructions will get you a copy of the project up and running.

Prerequisites

Clone the repo:

```
git clone https://github.com/TheFirstQuestion/resume-parser.git
```

Install the dependencies with pip (or the equivalent conda / mamba command):
```
pip install textract nltk tqdm pandas
```
Run setup.py to download and generate the necessary files:
```
python setup.py
```

Running

Edit the terms lists (in /terms_of_interest) to suit your needs. Each line represents a concept, so each new term should be on a new line. Synonyms of the term should be comma-separated on the same line; their counts will be combined in the output. The terms are divided into different files for convenience only.
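Here is a minimal sketch of how that line format could be parsed. It is not taken from `final.py`. The function names `parse_concepts` and `load_concepts` are hypothetical. It assumes every `.csv` file in the terms directory is read, that files have no header row, and that blank lines are ignored.

```python
import csv
from pathlib import Path


def parse_concepts(lines) -> list[list[str]]:
    """Turn lines like 'rowing, crew' into [['rowing', 'crew'], ...].

    One concept per line; synonyms are comma-separated. Blank lines and
    empty cells are skipped (an assumption about the real files).
    """
    concepts = []
    for cells in csv.reader(lines):
        synonyms = [cell.strip() for cell in cells if cell.strip()]
        if synonyms:
            concepts.append(synonyms)
    return concepts


def load_concepts(directory: str) -> list[list[str]]:
    """Read every .csv file in `directory`; the split across files is cosmetic."""
    concepts = []
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            concepts.extend(parse_concepts(handle))
    return concepts


# Hypothetical file contents: three concepts, one of which has a synonym.
example = ["lacrosse\n", "rowing, crew\n", "\n", "squash\n"]
print(parse_concepts(example))
# [['lacrosse'], ['rowing', 'crew'], ['squash']]
```

Using `csv.reader` rather than a plain `split(",")` means a term that itself contains a comma can be quoted in the file.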

Edit the config section at the top of the script to suit your needs.

| Variable | Description | Suggested value |
|---|---|---|
| `RESUME_DIRECTORY` | Path (relative to the script's location) to the directory containing the resumes. | `"./sample_resumes/"` |
| `TERMS_LOCATION` | Path (relative to the script's location) to the directory containing the CSV file(s) that define the terms of interest. | `"./terms_of_interest/"` |
| `OUTPUT_DIRECTORY` | Path (relative to the script's location) to the directory where the script writes its output files. | `"./output/"` |
| `RESUME_ID_COLUMN_NAME` | Header of the CSV column that identifies each resume. | `"resumeName"` |
| `SKIP_GREEK` | Whether to skip searching for all of the Greek terms of interest. | `False` |
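For reference, a config section using the suggested values might look like the following. The variable names and values come from the table. The `resolve` helper and the use of `pathlib` to anchor paths at the script's own folder are assumptions about how "relative to script location" is implemented, not a copy of `final.py`.

```python
from pathlib import Path

# --- Config (suggested values from the README table) ---
RESUME_DIRECTORY = "./sample_resumes/"
TERMS_LOCATION = "./terms_of_interest/"
OUTPUT_DIRECTORY = "./output/"
RESUME_ID_COLUMN_NAME = "resumeName"
SKIP_GREEK = False

# Assumption: paths are resolved against the script's folder, so results do
# not depend on the current working directory. Falls back to the working
# directory when __file__ is unavailable (e.g. in an interactive session).
SCRIPT_DIR = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()


def resolve(relative: str) -> Path:
    """Hypothetical helper: turn a config path into an absolute path."""
    return (SCRIPT_DIR / relative).resolve()


print(resolve(TERMS_LOCATION))
```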

Run the script.
```
python final.py
```
On the sample set of 2538 resumes, the script finishes in ~7 minutes, with the main loop running at ~6 resumes per second.
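The two figures agree: at about 6 resumes per second, 2538 resumes take roughly 423 seconds, or about 7 minutes. The hypothetical helper below is plain arithmetic. It can give a rough runtime estimate for a different corpus, assuming throughput stays near the measured rate.

```python
def estimated_minutes(n_resumes: int, resumes_per_second: float) -> float:
    """Rough main-loop runtime; ignores setup and any work outside the loop."""
    return n_resumes / resumes_per_second / 60


# 2538 / 6 = 423 seconds, i.e. about 7 minutes
print(f"{estimated_minutes(2538, 6):.1f} minutes")
```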

Contributing

Collaboration is what makes the world such an amazing place to learn, inspire, and create. Any contributions or suggestions you make are greatly appreciated!

Feel free to do any of the following:

- send me an email
- open an issue
- fork the repo and create a pull request

Acknowledgements

Most of the sample resumes used in testing came from the Kaggle resume dataset, which was a super convenient resource.
