
An implementation for the paper "A Structured Self-Attentive Sentence Embedding" (ICLR 2017).


Sentence Embedding Matrix with Self-Attention

This repository contains an implementation of the paper A Structured Self-Attentive Sentence Embedding, which was published at ICLR 2017.


Here is the environment we need:

  • Python (3.8)
  • PyTorch (1.12.1+cuda11.3)
  • torchtext (0.6.0) (very important)
  • spacy (3.4.1)
  • en-core-web-sm (3.4.0)
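
Because the torchtext version in particular matters, it can help to check the installed packages before training. The following is a minimal sketch, not part of this repository: the helper names (`installed_version`, `find_mismatches`) are hypothetical, and the distribution names used as keys (for example `torch` for PyTorch) are assumptions about how the packages were installed.

```python
from importlib import metadata

# Versions taken from the list above. The distribution names used as keys
# are assumptions (e.g. "torch" for PyTorch, "en-core-web-sm" for the spaCy model).
EXPECTED = {
    "torch": "1.12.1",
    "torchtext": "0.6.0",
    "spacy": "3.4.1",
    "en-core-web-sm": "3.4.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to a (expected, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Compare only the part before "+" so a local build tag such as
        # "1.12.1+cu113" still matches "1.12.1".
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the script prints one line per package that is missing or has a different version, and prints nothing when everything matches.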


We train the model on two open datasets: PAN15 Author Profiling for the author profiling task and Yelp for the sentiment analysis task. Only on the Yelp dataset did we get close to the performance reported in the paper.

Models                        Yelp      Age
BiLSTM + Max Pooling + MLP    61.75%    60.35%
CNN + Max Pooling + MLP       58.13%    60.69%
Our Model                     63.00%    61.90%


For the author profiling task:

python train.py --model_type SelfAttention --dataset Age --Age_train_path 'your trainset path' --Age_test_path 'your testset path'

And for the sentiment analysis task:

python train.py --model_type SelfAttention --dataset Yelp --Yelp_train_path 'your trainset path' --Yelp_test_path 'your testset path'

Here are some useful options for the train script.

Options          Optional values                 Meanings
--model_type     SelfAttention | BiLSTM | CNN    Which model to train
--dataset        Age | Yelp                      Which task (dataset) to train on
--seed                                           The random seed
--LSTM_hidden                                    The output dimension of the LSTM for the SelfAttention or BiLSTM model
--MLP_hidden                                     The number of hidden units in the classifier for the downstream application
--aspects                                        The hyperparameter $r$ in the paper
...              ...                             ...
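
To make the option table concrete, here is a minimal sketch of how such flags could be declared with `argparse`. It is an illustration, not the actual contents of train.py: the function name `build_parser`, the integer types, and the `None` defaults are assumptions, and the options elided above (including the dataset path flags) are left out.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    parser.add_argument("--model_type", choices=["SelfAttention", "BiLSTM", "CNN"],
                        help="Which model to train")
    parser.add_argument("--dataset", choices=["Age", "Yelp"],
                        help="Which task (dataset) to train on")
    # The real defaults are not documented here, so every value defaults to None.
    parser.add_argument("--seed", type=int, default=None,
                        help="The random seed")
    parser.add_argument("--LSTM_hidden", type=int, default=None,
                        help="Output dimension of the LSTM")
    parser.add_argument("--MLP_hidden", type=int, default=None,
                        help="Hidden units of the downstream classifier")
    parser.add_argument("--aspects", type=int, default=None,
                        help="The hyperparameter r in the paper")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With a parser like this, an unsupported value such as `--model_type Transformer` is rejected before any training starts, because `choices` restricts the accepted values.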

Pretrained Weights

Here are the pretrained weights (code: 8smh) for the Yelp dataset.


To visualize a single sentence, execute the following command:

python visualization.py --sentence 'your sentence' --muti_sentence False

And for a text file including multiple sentences as input, execute:

python visualization.py --muti_sentence True --path 'your text file path'

Some of the visualization results are as follows. The visualization is not as pronounced as in the paper, possibly because the visualization algorithms differ. At least the algorithm used in this repository is unlikely to produce so many dark red areas. (Do not take this seriously :-D)
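
For comparison, the paper describes an overall visualization that sums the $r$ attention vectors and normalizes the result so that the token weights sum to 1. The sketch below illustrates that idea only. It is not the code in visualization.py, the function names `token_weights` and `rank_tokens` are hypothetical, and the attention matrix is assumed to be a plain list of $r$ rows with one weight per token.

```python
def token_weights(attention):
    """Collapse an r x n attention matrix into one weight per token.

    The r rows are summed column-wise and the result is rescaled
    so that the weights sum to 1.
    """
    if not attention or not attention[0]:
        raise ValueError("attention must be a non-empty r x n matrix")
    n = len(attention[0])
    if any(len(row) != n for row in attention):
        raise ValueError("all attention rows must have the same length")
    totals = [sum(row[j] for row in attention) for j in range(n)]
    norm = sum(totals)
    if norm == 0:
        raise ValueError("attention weights sum to zero")
    return [total / norm for total in totals]


def rank_tokens(tokens, attention):
    """Pair each token with its weight, most attended first."""
    weights = token_weights(attention)
    if len(tokens) != len(weights):
        raise ValueError("need exactly one token per attention column")
    return sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy example with r = 2 attention rows over 3 tokens (made-up numbers).
    attention = [[0.5, 0.25, 0.25],
                 [0.0, 0.5, 0.5]]
    for token, weight in rank_tokens(["the", "food", "great"], attention):
        print(f"{token}: {weight:.3f}")
```

Because every token's weight is a share of the total, a sentence in which many tokens receive similar attention ends up with uniformly pale colors, which is one reason results can look less striking than a per-row heat map.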
