LARA-1: A Jupyter Notebook repository from jimmy-feng

This is a Python implementation of the following research paper by Wang et al.:

Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach
by Hongning Wang, Yue Lu, Chengxiang Zhai. The 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'2010), pp. 783-792, 2010.

This implementation builds on the earlier implementation provided by the author (http://www.cs.virginia.edu/~hw5x/Codes/LARA.zip).

Latent Aspect Rating Analysis (LARA) analyzes the opinions expressed about an entity in an online review at the level of topical aspects. It aims to discover each individual reviewer’s latent opinion on each aspect, as well as the relative emphasis the reviewer places on different aspects when forming the overall judgment of the entity. It uses probabilistic rating regression to solve this problem.
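
To make the rating regression idea concrete, here is a minimal sketch, assuming (per our reading of the paper) that each aspect rating is a weighted sum of the word frequencies in that aspect's text and that the overall rating is the aspect-weight-averaged sum of the aspect ratings. The function names `aspect_ratings` and `overall_rating` and the toy numbers are hypothetical and are not taken from this repository; the actual LRR class also estimates the weights, which this sketch does not attempt.

```python
def aspect_ratings(word_freq, beta):
    """Compute s_i = sum_j beta[i][j] * word_freq[i][j] for each aspect i.

    word_freq[i][j]: normalized frequency of word j in the text of aspect i.
    beta[i][j]: sentiment weight of word j under aspect i (assumed given).
    """
    return [
        sum(b * w for b, w in zip(beta_i, freq_i))
        for beta_i, freq_i in zip(beta, word_freq)
    ]


def overall_rating(aspect_scores, alpha):
    """Expected overall rating: the alpha-weighted sum of the aspect ratings.

    alpha holds the reviewer's aspect weights, which must be non-negative
    and sum to 1.
    """
    if min(alpha) < 0 or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("alpha must be non-negative and sum to 1")
    return sum(a * s for a, s in zip(alpha, aspect_scores))


# Toy example with two aspects and two words per aspect.
scores = aspect_ratings([[0.5, 0.5], [1.0, 0.0]], [[4.0, 2.0], [5.0, 1.0]])
print(scores)                                # [3.0, 5.0]
print(overall_rating(scores, [0.25, 0.75]))  # 4.5
```

In this toy example the reviewer puts three quarters of the emphasis on the second aspect, so the predicted overall rating sits closer to that aspect's rating.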

Contributors:
https://github.com/hemantverma1
https://github.com/kanishtha

Organization of the code:
There are two classes, Sentence and Review, each defined in its own Python file. They act as data containers for a sentence and a single review, respectively.
ReadData contains all the functions for processing the reviews. The BootStrap class contains the bootstrapping algorithm explained in the paper. The LRR class contains the implementation of the Rating Regression algorithm described in the paper.
We initially wrote the code as IPython notebooks, so the corresponding .ipynb files also exist.
* hotelReviews directory holds the review files (JSON encoded), both training and testing data
* settings directory contains the configuration files for the model
* modelData will contain the files generated by the model
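
The bootstrapping step handled by the BootStrap class can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the code in BootStrap.py: sentences are already tokenized, each aspect starts from a few seed keywords, every sentence is assigned to the aspect(s) with the most keyword hits, and the words most associated with each aspect by a chi-square statistic are added to its keyword list. The names `assign_aspects`, `chi_square`, and `bootstrap_keywords`, the parameters `top_p` and `max_iter`, and the use of sentence-level counts are all illustrative choices.

```python
def assign_aspects(tokens, aspect_keywords):
    """Return the aspect(s) whose keywords occur most often in the sentence."""
    hits = {a: sum(tok in kws for tok in tokens)
            for a, kws in aspect_keywords.items()}
    best = max(hits.values(), default=0)
    if best == 0:
        return []  # no keyword matched: leave the sentence unlabeled
    return [a for a, h in hits.items() if h == best]


def chi_square(c1, c2, c3, c4):
    """Chi-square statistic of a 2x2 word/aspect contingency table."""
    total = c1 + c2 + c3 + c4
    denom = (c1 + c3) * (c2 + c4) * (c1 + c2) * (c3 + c4)
    return total * (c1 * c4 - c2 * c3) ** 2 / denom if denom else 0.0


def bootstrap_keywords(sentences, seeds, top_p=1, max_iter=5):
    """Grow each aspect's keyword set from its seed words."""
    keywords = {a: set(kws) for a, kws in seeds.items()}
    for _ in range(max_iter):
        labels = [assign_aspects(s, keywords) for s in sentences]
        changed = False
        for aspect in keywords:
            inside = [set(s) for s, l in zip(sentences, labels) if aspect in l]
            outside = [set(s) for s, l in zip(sentences, labels) if aspect not in l]
            candidates = set().union(*inside) - keywords[aspect]
            scores = {}
            for w in candidates:
                c1 = sum(w in s for s in inside)    # aspect sentences with w
                c2 = sum(w in s for s in outside)   # other sentences with w
                c3 = len(inside) - c1               # aspect sentences without w
                c4 = len(outside) - c2              # other sentences without w
                if c1 * c4 > c2 * c3:               # keep positive associations only
                    scores[w] = chi_square(c1, c2, c3, c4)
            # Add the top_p best-scoring words; ties are broken alphabetically.
            for w in sorted(scores, key=lambda w: (-scores[w], w))[:top_p]:
                keywords[aspect].add(w)
                changed = True
        if not changed:
            break  # keyword lists are stable, so stop early
    return keywords
```

For example, with the seeds `{"room": ["room"], "value": ["price"]}` and a handful of tokenized hotel sentences, a single iteration would add the word that co-occurs most distinctively with each seed to the matching keyword set.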

How to run:
Install nltk and scipy. Also download the NLTK data for stop words and the Porter stemmer.
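
One possible way to fetch the stop-word data from Python is sketched below, assuming nltk is already installed. The helper name `ensure_nltk_stopwords` is hypothetical and not part of this repository. As far as we know, the Porter stemmer (`nltk.stem.PorterStemmer`) is implemented in NLTK itself and may not need a separate download.

```python
def ensure_nltk_stopwords():
    """Download the NLTK stop-word corpus if missing; return English stop words."""
    import nltk  # imported here so that loading this file does not require NLTK

    try:
        nltk.data.find("corpora/stopwords")   # already downloaded?
    except LookupError:
        nltk.download("stopwords")            # needs network access

    from nltk.corpus import stopwords
    return set(stopwords.words("english"))
```

Calling `ensure_nltk_stopwords()` once before running the preprocessing step should be enough; later calls reuse the local copy.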

For preprocessing reviews:
python3 BootStrap.py

For running the main model:
python3 LRR.py