This repository stores the code for the project for our school course 50.038 Computational Data Science.
Presentation can be found here.
Website preview
Topic Modelling
- Google Playstore Data.
  - There are 2 files: googleplaystore.csv and googleplaystore_user_reviews.csv.
- Named Entity Recognition Dataset
We carried out our analysis of the Google App Store reviews and general app data in Jupyter Notebooks. Most of the notebooks are well documented, so you may refer to them for a detailed explanation of what they do. There are many notebooks because our group members chose to work on various tasks individually.
- Basic cleaning for General App Data: cleaning.ipynb
- General visualization of General App Data: visualization_project.Rmd
- Preprocessing and model for Reviews Data, using the NLTK Naive Bayes Classifier to determine sentiment polarity: prelim_nlp_model.ipynb
- Using the preprocessed data from prelim_nlp_model.csv, we vectorize the data and cross-validate models for predicting sentiment polarity: cross_validation.ipynb
  - MultinomialNB
  - RandomForestClassifier
- FastText Classification: FastText_Classification.ipynb
- SVM: SVM.ipynb
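The cross-validation step listed above can be sketched roughly as follows. This is a minimal illustration, not the code from cross_validation.ipynb: the toy reviews and labels, the choice of TfidfVectorizer, the fold count, and the helper name cross_validate_models are all assumptions made for the example. Only MultinomialNB and RandomForestClassifier come from this README.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the preprocessed reviews.
reviews = [
    "love this app works great",
    "great app very useful",
    "really good and easy to use",
    "best app love it",
    "useful and fast great design",
    "good app works well",
    "terrible app keeps crashing",
    "bad update full of bugs",
    "worst app waste of time",
    "crashes all the time very bad",
    "slow and buggy terrible",
    "hate the ads waste of space",
]
labels = ["Positive"] * 6 + ["Negative"] * 6


def cross_validate_models(texts, targets, folds=3):
    """Return the mean cross-validated accuracy for each candidate model."""
    models = {
        "MultinomialNB": MultinomialNB(),
        "RandomForestClassifier": RandomForestClassifier(
            n_estimators=50, random_state=0
        ),
    }
    scores = {}
    for name, model in models.items():
        # The vectorizer sits inside the pipeline so that it is refit on
        # the training portion of every fold.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        fold_scores = cross_val_score(
            pipeline, texts, targets, cv=folds, scoring="accuracy"
        )
        scores[name] = float(fold_scores.mean())
    return scores


mean_accuracy = cross_validate_models(reviews, labels)
for name, score in mean_accuracy.items():
    print(f"{name}: mean accuracy {score:.2f}")
```

Keeping the vectorizer inside the pipeline means the vocabulary is learned only from the training folds, so the held-out fold does not leak into the features.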
Install the necessary packages:
pip install -r requirements.txt
To start the website locally:
streamlit run website/main.py
The Streamlit app is deployed to Heroku and is redeployed every time there is a new commit to the master branch.
- The necessary packages are listed in requirements.txt.
- setup.sh is necessary for the dyno to enable the web app server to run with the following settings:
  - enableCORS=false
  - headless=true
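For reference, enableCORS and headless are options in the `[server]` section of Streamlit's configuration file. The fragment below is a sketch of what the resulting configuration might look like, assuming setup.sh writes it to ~/.streamlit/config.toml when the dyno starts. The file location and layout are assumptions, and the actual contents of setup.sh are not reproduced here.

```toml
# ~/.streamlit/config.toml (assumed location)
[server]
headless = true      # run the server without trying to open a browser
enableCORS = false   # setting named in this README
```

On Heroku the listening port is assigned through the PORT environment variable, so a deployment script would typically pass that value to Streamlit as well.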