recsys_justification

This is the code for our EMNLP '19 paper:

  • Justifying recommendations using distantly-labeled reviews and fine-grained aspects, Jianmo Ni, Jiacheng Li, Julian McAuley, Empirical Methods in Natural Language Processing (EMNLP) 2019.

This repo is organized as follows:


Newly released Amazon product review dataset.

We have released a new version of the Amazon review dataset, which includes more and newer reviews (i.e., reviews from 2014 to 2018). Feel free to use the dataset in your own research!

justification classifier

This is the BERT model that is fine-tuned on the labeled justification data. You can train the model and run inference over any unlabeled data after adapting the data loader in the Python file accordingly. We also provide a pre-trained model:

  • bert_config.json
  • pytorch_model.bin
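As a rough illustration of the inference step, the sketch below loads a directory containing bert_config.json and pytorch_model.bin with pytorch-pretrained-bert and labels a list of EDUs. It is not the repo's own script: the function names, the `bert-base-uncased` vocabulary, the 128-token limit, and the meaning of each label index are all assumptions made for the example.

```python
def load_classifier(model_dir, vocab_name="bert-base-uncased"):
    """Load a fine-tuned binary classifier from model_dir."""
    # Imported lazily so that encode() below can be used without the library.
    from pytorch_pretrained_bert import BertForSequenceClassification, BertTokenizer

    # Assumption: the classifier was fine-tuned from an uncased BERT-base checkpoint.
    tokenizer = BertTokenizer.from_pretrained(vocab_name, do_lower_case=True)
    # model_dir must contain bert_config.json and pytorch_model.bin.
    model = BertForSequenceClassification.from_pretrained(model_dir, num_labels=2)
    model.eval()
    return tokenizer, model


def encode(tokenizer, text, max_len=128):
    """Turn one EDU into BERT input ids, truncated to max_len tokens."""
    tokens = tokenizer.tokenize(text)[: max_len - 2]
    return tokenizer.convert_tokens_to_ids(["[CLS]"] + tokens + ["[SEP]"])


def predict(tokenizer, model, edus):
    """Return one predicted label index (0 or 1) per EDU."""
    import torch

    labels = []
    with torch.no_grad():
        for edu in edus:
            input_ids = torch.tensor([encode(tokenizer, edu)])
            logits = model(input_ids)  # shape (1, 2) when no labels are passed
            labels.append(int(logits.argmax(dim=1).item()))
    return labels
```

With the pre-trained files in a local directory (the path here is hypothetical), usage would look like `tokenizer, model = load_classifier("path/to/model_dir")` followed by `predict(tokenizer, model, ["the fries were crispy"])`. Which index denotes a good justification depends on how the training labels were encoded.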


reference2seq

This is the proposed reference2seq model. It contains files for data processing and model training/evaluation.


acmlm

This is the proposed aspect-conditional masked language model (acmlm).


Dataset

  • 2,000 labeled examples, with a binary label for each elementary discourse unit (EDU) in the reviews. You can find them under justification_classifier.
  • Distantly labeled datasets derived from the Yelp and Amazon Clothing datasets. Each line of the JSON file contains an EDU from a review and the fine-grained aspects covered in it.
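Since each line of the distantly labeled file is a standalone JSON object, it can be read one line at a time. The following is a minimal sketch of that; the field names `edu` and `aspects` in the sample are hypothetical (check the released files for the real keys), and it assumes the aspects are stored as a list of strings.

```python
import json
from collections import Counter


def iter_edus(lines):
    """Yield one dict per non-empty line of a JSON-lines source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_aspects(records, aspect_key):
    """Count how often each aspect occurs across records."""
    counts = Counter()
    for record in records:
        # Assumes the value under aspect_key is a list of aspect strings.
        counts.update(record.get(aspect_key, []))
    return counts


# Hypothetical sample lines; the real field names may differ.
sample = [
    '{"edu": "the fries were crispy", "aspects": ["fries"]}',
    '',
    '{"edu": "friendly staff and great fries", "aspects": ["staff", "fries"]}',
]
records = list(iter_edus(sample))
counts = count_aspects(records, aspect_key="aspects")
print(counts.most_common())
```

For a file on disk, an open file handle can be passed in place of `sample`, since `iter_edus` only needs an iterable of lines.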


Requirements

  • PyTorch=0.4
  • pytorch-pretrained-bert

Please cite our paper if you find the data and code helpful, thanks!

  title={Justifying recommendations using distantly-labeled reviews and fine-grained aspects},
  author={Jianmo Ni and Jiacheng Li and Julian McAuley},