Data Literacy Research

In this research project, I collaborate with Professor Sandra Cannon to measure data literacy expectations from the way employers describe jobs and the people they are looking for. We measure the discriminatory power between how employers describe jobs and what the actual work on the job entails.

Current Pipeline

Files:

linkedin.py: Use this file to generate the DataFrame of LinkedIn postings. Results are stored in data/scraping_results (tagged with "linkedin").

indeed.py: Same as linkedin.py, but for Indeed postings. Results are stored in data/scraping_results (tagged with "indeed").

Notes

Data files:

merged_headings_df: Contains both the LinkedIn and Indeed postings in a single DataFrame
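As a rough illustration of how a combined DataFrame like merged_headings_df could be built, here is a minimal sketch. The column names `heading_text` and `source` and the sample rows are assumptions made for the example, not the repository's actual schema or data.

```python
import pandas as pd

# Hypothetical stand-ins for the two scraping results.
# The column name "heading_text" is an assumption for illustration.
linkedin_df = pd.DataFrame({"heading_text": ["Data Analyst", "Research Assistant"]})
indeed_df = pd.DataFrame({"heading_text": ["Business Analyst"]})

# Tag each row with its origin so the source survives the merge.
linkedin_df["source"] = "linkedin"
indeed_df["source"] = "indeed"

# Stack the two sets of postings into one DataFrame with a fresh index.
merged_headings_df = pd.concat([linkedin_df, indeed_df], ignore_index=True)
print(merged_headings_df)
```

Keeping a `source` column makes it possible to compare LinkedIn and Indeed postings later without reloading the separate files.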

Utility Functions (in `utilities.utils`)

to_wcdf: Applies sklearn CountVectorizer
preprocess_heading_text: Takes the heading text (originally intended for merged_headings_df) and applies a preprocessing pipeline to it
visualize_counts: Takes a pandas Series of strings and uses Seaborn to visualize the top n words in that corpus
visualize_seq_lengths: Visualizes the distribution of word lengths in a sequence
