TL;DR

This repository contains data and scripts pertaining to work done by Avijit Thawani while at Northeastern University (summer 2018) under the guidance of Dr. Byron C. Wallace (College of Computer and Information Science, Northeastern University, Boston, MA).

Link to paper

Thawani A., Paul M. J., Sarkar U., Wallace B. C.
Are Online Reviews of Physicians Biased Against Female Providers?
In Proceedings of Machine Learning Research, 106:1-17, 2019.

The paper was presented at the Machine Learning for Healthcare Conference (MLHC 2019) in Ann Arbor, Michigan. A poster summarizing our work, slides from the talk, and a video presentation are also available.

Please cite us and mail me at thawani@usc.edu for feedback, errors, ideas for future work, or just to say Hi!

This Repository

raw data: parsed HTML files from RateMDs.com
unclean.csv: id, review, physician specialty, physician gender, physician name, document label
processed_1.csv: review id, physician id, physician specialty, physician gender, rating staff, rating punctuality, rating helpfulness, rating knowledgeability, review text (tokenized)
all_Github.csv: physician_id.review_id, physician_id, physician name, physician specialty, physician gender, rating staff, rating punctuality, rating helpfulness, rating knowledgeability, review text
scripts: Jupyter Notebooks to reproduce our results (corresponding section from the paper in parentheses):

clean.ipynb: Data preprocessing (Section 2.1)
regression.ipynb: Rating Analysis (Section 2.2)
LR.ipynb: Lexical Regression (Section 2.3.1)
match.ipynb: Embeddings (Section 2.3.2)
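
As a rough illustration of the all_Github.csv layout described above, the following sketch reads rows in that column order and splits the composite physician_id.review_id field. It is not taken from the notebooks: the column identifiers, the helper names (split_composite_id, read_reviews, reviews_per_gender), the assumption that the file has no header row, and the sample rows (including the "F"/"M" gender codes) are all hypothetical and should be checked against the actual file.

```python
import csv
import io
from collections import Counter

# Assumed column order, following the all_Github.csv description in this README.
# These identifiers are hypothetical; the real file may use different header names.
COLUMNS = [
    "physician_review_id",   # composite "physician_id.review_id"
    "physician_id",
    "physician_name",
    "physician_specialty",
    "physician_gender",
    "rating_staff",
    "rating_punctuality",
    "rating_helpfulness",
    "rating_knowledgeability",
    "review_text",
]


def split_composite_id(composite_id):
    """Split 'physician_id.review_id' at the first dot into its two parts."""
    physician_id, _, review_id = composite_id.partition(".")
    return physician_id, review_id


def read_reviews(handle, has_header=False):
    """Read rows from an open CSV handle into dicts keyed by COLUMNS."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    rows = []
    for record in reader:
        row = dict(zip(COLUMNS, record))
        # Recover the review id from the composite first column.
        _, row["review_id"] = split_composite_id(row["physician_review_id"])
        rows.append(row)
    return rows


def reviews_per_gender(rows):
    """Count reviews by the physician gender field."""
    return Counter(row["physician_gender"] for row in rows)


# Made-up sample rows for illustration only; not real data from the repository.
SAMPLE = (
    '12.1,12,Dr. A,Family Doctor,F,5,4,5,5,"Great, caring doctor."\n'
    '12.2,12,Dr. A,Family Doctor,F,3,2,4,4,Long wait but helpful.\n'
    '40.1,40,Dr. B,Dentist,M,4,4,4,5,Very knowledgeable.\n'
)

rows = read_reviews(io.StringIO(SAMPLE))
print(reviews_per_gender(rows))
```

To run this against the real file, replace io.StringIO(SAMPLE) with an open file handle, for example open("all_Github.csv", newline="", encoding="utf-8"), and pass has_header=True if the first row turns out to hold column names.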

Contributors

Avijit Thawani, University of Southern California (work done when interning at Northeastern in Summer 2018).
Michael J. Paul, University of Colorado Boulder.
Urmimala Sarkar, University of California San Francisco.
Byron C. Wallace, Northeastern University.
