DIATOM - Disentangled Adversarial Neural Topic Model

This is a repository for the NAACL 2021 paper:
A Disentangled Adversarial Neural Topic Model for Separating Opinions from Plots in User Reviews

Description

This repository provides:

a PyTorch implementation of the DIATOM core architecture;
an extract of the annotated sentences from the MOBO dataset.

The MOvie and BOok reviews dataset is a collection made up of movie and book reviews, paired with their related plots. The reviews come from different publicly available datasets: the Stanford's IMDB movie reviews [1], the GoodReads [2] and the Amazon reviews dataset [3]. With the help of 15 annotators, we further labeled more than 18,000 reviews' sentences (~6000 per corpus), marking the sentence polarity (Positive, Negative), or whether a sentence describes its corresponding movie/book Plot, or none of the above (None). In the dataset folder, we have shared an excerpt of the annotated sentences for each dataset.

Further details on the data annotation process and inter-annotator agreement are available in the paper.

[1]: Learning word vectors for sentiment analysis, Maas et al., ACL11
[2]: Fine-grained spoiler detection from large-scale review corpora, Wan et al., ACL19
[3]: Image-based recommendations on styles and substitutes, McAuley et al., SIGIR15
[4]: MPST: A corpus of movie plot synopses with tags, Kar et al., LREC18

Dataset Statistics

Statistics	IMDB	GoodReads	Amazon
Number of Plots	1,131	150	100
Number of Reviews	25,836	83,852	32,375
% Pos. reviews	0.46	0.33	0.32
% Neg. reviews	0.54	0.50	0.46
% Neu. reviews	0.00	0.17	0.22
Training set	20,317	65,816	25,883
Development set	2,965	9,007	3,275
Test set	2,554	9,029	3,217
Number of annotated sent.	6,000	6,000	6,000

Requirements

Python 3.x
PyTorch >= 1.6.0
Gensim
SentenceBERT
Spacy
tqdm

Files

Current repository structure

diatom: Core architecture of the DIATOM model
mobo_dataset: Extract of the annotated sentences from the MOBO dataset

./diatom/

main_adv_vae.py: Main file for training and test procedures
adversarial_vae_model.py: DIATOM architecture and functions
vae_avitm_paper.py: Basic VAE components adopted in DIATOM
sentiment_classifier.py: Basic classifier used in the adversarial mechanism
topic_class.py: Auxiliary topic class

./mobo_dataset/

Amazon_annotated_sentences_excerpt.json
GoodReads_annotated_sentences_excerpt.json
IMDB_annotated_sentences_excerpt.json

gabrer/diatom

DIATOM - Disentangled Adversarial Neural Topic Model

Description

Dataset Statistics

Requirements

Files