Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding

This repository contains the code and data used for our EMNLP paper Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding.

Requirements

GCC compiler (used to compile the C source file): see the guide for installing GCC.

Datasets

We collect an in-domain corpus for embedding training. For evaluation, we use the Restaurant and Laptop datasets from SemEval 2015 and SemEval 2016. The preprocessed versions of these datasets are included in this repository.

Run the Code

Using the same datasets as ours

bash run_jasen.sh

This step runs the whole pipeline, from embedding training through neural network distillation to model evaluation. The --dataset argument in the script specifies which prepared dataset (restaurant or laptop) to use. The generated embedding file is stored under ${dataset}. Prediction results for each dataset are written to /datasets/${dataset}/prediction.txt.

Preparing your own dataset

Create a new folder under /datasets for your new dataset. The in-domain unlabeled training corpus train.txt, used for joint topic embedding training, contains one document per line. The test set test.txt, used for evaluation, is in the following format:

line_id	aspect_label_id	sentiment_label_id	text
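
As a rough illustration of the two file formats described above, the following Python sketch writes train.txt and test.txt into a dataset folder and reads test.txt back to check its shape. It is not part of this repository: the helper names write_dataset and read_test_file are hypothetical, the fields are assumed to be separated by single tab characters, and numbering line_id from 0 is an assumption, so compare the output with the provided restaurant or laptop files before relying on it.

```python
from pathlib import Path


def write_dataset(folder, train_docs, test_rows):
    """Write train.txt (one document per line) and test.txt (four tab-separated fields).

    test_rows is an iterable of (aspect_label_id, sentiment_label_id, text).
    Hypothetical helper, not part of the repository.
    """
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)

    # A document must occupy exactly one line, so collapse newlines and
    # repeated whitespace inside each document.
    with open(out / "train.txt", "w", encoding="utf-8") as f:
        for doc in train_docs:
            f.write(" ".join(doc.split()) + "\n")

    # Assumption: line_id is a running index starting at 0.
    with open(out / "test.txt", "w", encoding="utf-8") as f:
        for line_id, (aspect_id, sentiment_id, text) in enumerate(test_rows):
            # Tabs inside the text would break the four-column layout.
            clean = " ".join(text.split())
            f.write(f"{line_id}\t{aspect_id}\t{sentiment_id}\t{clean}\n")


def read_test_file(path):
    """Return a list of (line_id, aspect_label_id, sentiment_label_id, text) string tuples."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            parts = line.split("\t", 3)
            if len(parts) != 4:
                raise ValueError(f"line {number}: expected 4 tab-separated fields, got {len(parts)}")
            rows.append(tuple(parts))
    return rows
```

For example, write_dataset("datasets/my_dataset", docs, rows) followed by read_test_file("datasets/my_dataset/test.txt") would raise a ValueError on any malformed line; the folder name my_dataset is only a placeholder.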