BDPP-Project

Term frequency analysis of the Drug Review Dataset with Apache Spark, as part of the Big Data Parallel Programming course at Halmstad University.
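As a rough illustration of what "term frequency analysis" means here, the following is a minimal sketch in plain Python rather than the notebook's actual Spark code. The function name `term_frequencies`, the regex tokenizer, and the sample reviews are all assumptions made for this example; they are not taken from the project or the dataset.

```python
import re
from collections import Counter


def term_frequencies(reviews):
    """Count how often each lowercased word occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        # Hypothetical tokenizer: runs of letters and apostrophes.
        # The notebook also lemmatizes, which this sketch leaves out.
        counts.update(re.findall(r"[a-z']+", review.lower()))
    return counts


# Made-up sample text, not rows from the Drug Review Dataset.
sample_reviews = [
    "This drug helped my headache",
    "No side effects, the drug worked",
]
top_terms = term_frequencies(sample_reviews).most_common(1)
print(top_terms)
```

On a Spark RDD the same idea would typically be expressed as a `flatMap` over tokens followed by `reduceByKey` to sum the counts, which lets the counting run in parallel across partitions.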

NOTE: There is a weird bug when running the lemmatizing cell; you have to execute this cell three times.

Read my paper about this project here.

Dataset

Dataset acquired from the UCI Machine Learning Repository.

Source

Surya Kallumadi

Kansas State University

Manhattan, Kansas, USA

surya '@' ksu.edu


Felix Gräßer

Institut für Biomedizinische Technik

Technische Universität Dresden

Dresden, Germany

felix.graesser '@' tu-dresden.de

Relevant paper

Felix Gräßer, Surya Kallumadi, Hagen Malberg, and Sebastian Zaunseder. 2018. Aspect-Based Sentiment Analysis of Drug Reviews Applying Cross-Domain and Cross-Data Learning. In Proceedings of the 2018 International Conference on Digital Health (DH '18). ACM, New York, NY, USA, 121-125. DOI:Web Link

Dependencies

These libraries are available via pip:

  • wordcloud
  • nltk
  • numpy
  • pandas
  • matplotlib
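
Before opening the notebook, it can help to check which of the listed libraries are already installed. The snippet below is a small sketch for that, assuming each package's import name matches its pip name (which holds for the five listed above); `REQUIRED` and `missing_packages` are names invented for this example.

```python
from importlib.util import find_spec

# Packages from the dependency list above.
REQUIRED = ["wordcloud", "nltk", "numpy", "pandas", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All listed dependencies are installed.")
```

The check only looks for the modules and does not import them, so it is quick and has no side effects. It does not cover Apache Spark itself, which has to be set up separately.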