Term frequency analysis of the Drug Review Dataset with Apache Spark, as part of the Big Data Parallel Programming course at Halmstad University.
NOTE: There is a weird bug when running the lemmatizing cell: you have to execute this cell 3 times.
Read my paper about this project here.
Dataset acquired from the UCI Machine Learning Repository
Surya Kallumadi
Kansas State University
Manhattan, Kansas, USA
surya '@' ksu.edu
Felix Gräßer
Institut für Biomedizinische Technik
Technische Universität Dresden
Dresden, Germany
felix.graesser '@' tu-dresden.de
Felix Gräßer, Surya Kallumadi, Hagen Malberg, and Sebastian Zaunseder. 2018. Aspect-Based Sentiment Analysis of Drug Reviews Applying Cross-Domain and Cross-Data Learning. In Proceedings of the 2018 International Conference on Digital Health (DH '18). ACM, New York, NY, USA, 121-125. DOI:Web Link
These libraries are available via pip:
- wordcloud
- nltk
- numpy
- pandas
- matplotlib
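
To check that the libraries listed above are installed before opening the notebook, something like the following can be run first. This is only a minimal sketch: the helper name `missing_packages` is made up for illustration and is not part of this project, and it assumes each pip package is imported under the same name as it is installed, which holds for these five.

```python
import importlib.util

# Package names as listed above; for these, the pip name matches the import name.
REQUIRED = ["wordcloud", "nltk", "numpy", "pandas", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for illustration only; not part of the notebook.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Suggest a single pip command covering everything that is absent.
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All listed libraries are available.")
```

This only checks that the packages can be located. It does not check versions, and it does not cover Apache Spark itself, which has to be set up separately.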