The dataset provides English-language patient reviews of specific drugs, along with the related conditions and a 10-star patient rating reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites.
The data contains 215063 reviews, split into a train (75%) and a test (25%) partition that are stored in two .tsv (tab-separated values) files. See the data/ folder. Each review has the following attributes:
- drugName (categorical): name of drug
- condition (categorical): name of condition
- review (text): patient review
- rating (numerical): 10-star patient rating
- date (date): date of review entry
- usefulCount (numerical): number of users who found the review useful
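
As a starting point, the files could be read with the standard library alone. The sketch below is only an illustration: it assumes that each .tsv file begins with a header row using the attribute names listed above, and the sample rows are made up for the example, not taken from the dataset. The date is left as a string because its format is not stated here.

```python
import csv
import io


def load_reviews(source):
    """Read tab-separated reviews from an open text stream into a list of dicts.

    Assumes a header row named after the attributes listed above.
    """
    reader = csv.DictReader(source, delimiter="\t")
    rows = []
    for row in reader:
        # Convert the numerical attributes; everything else stays a string.
        row["rating"] = float(row["rating"])
        row["usefulCount"] = int(row["usefulCount"])
        rows.append(row)
    return rows


# Made-up sample rows for illustration only (not real dataset entries).
sample = (
    "drugName\tcondition\treview\trating\tdate\tusefulCount\n"
    'DrugA\tHeadache\t"Worked well, no side effects."\t9\t2015-01-01\t12\n'
    'DrugB\tInsomnia\t"Did not help at all."\t2\t2016-02-03\t3\n'
)
reviews = load_reviews(io.StringIO(sample))
print(reviews[0]["drugName"], reviews[0]["rating"])
```

For a real file, the same function can be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to one of the files in the data/ folder. The file names themselves are not specified here.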
The task has two parts:
- Perform an exploratory analysis of the data
- Train a model to predict the rating of a review
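
Before training anything, it can help to look at the rating distribution and to set a trivial baseline that any trained model should beat. The sketch below is one possible first step, not a prescribed method: it predicts the mean training rating for every test review and scores that with mean absolute error. The function names and the ratings are made up for illustration, and the metric is an assumption, since the task does not name one.

```python
from collections import Counter
from statistics import mean


def rating_distribution(ratings):
    """Count how many reviews received each rating, ordered by rating."""
    return dict(sorted(Counter(ratings).items()))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ratings."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Made-up ratings for illustration only (not real dataset values).
train_ratings = [10, 8, 1, 10, 9, 2]
test_ratings = [9, 1, 10]

dist = rating_distribution(train_ratings)

# Baseline: always predict the mean rating seen in the training partition.
baseline = mean(train_ratings)
preds = [baseline] * len(test_ratings)
mae = mean_absolute_error(test_ratings, preds)

print(dist)
print(round(baseline, 2), round(mae, 2))
```

A model trained on the review text is only worth keeping if its error on the test partition is clearly lower than this baseline.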
The code must run with Python in a Jupyter notebook. The choice of modeling approach and of Python libraries is up to you.