Implementation in Tensorflow of A Structured Self-attentive Sentence Embedding with the sentiment analysis task.
The code organization is similar to NeuroNER.
SelfSent relies on Python 3.5 and TensorFlow 1.0+.
For the package stanford_corenlp_pywrapper
, just install it from https://github.com/mpagli/stanford_corenlp_pywrapper.
You need to create a data
folder next to the src
folder.
Then for each dataset, you have to create a separate folder. In this folder, you just need to put a file all.json
where each line correspond to a json sample with its attributes.
Here a sample of Yelp dataset:
{"review_id":"IYE_M_cRsk-AhVYeYvnADg","user_id":"r-zUIQPaHzvIyL93wQaoiQ","business_id":"HE23DlZWAO_JF1VIHA60TQ",**"stars":3**,"date":"2012-10-09",**"text":"This is the Capitol Square branch."**,"useful":0,"funny":0,"cool":0,"type":"review"}
In the case of review star prediction, the needed attributes are text
and stars
.
{"review_id":"IYE_M_cRsk-AhVYeYvnADg","stars":3,"date":"2012-10-09","text":"This is the Capitol Square branch.","user_id":"r-zUIQPaHzvIyL93wQaoiQ","business_id":"HE23DlZWAO_JF1VIHA60TQ""useful":0,"funny":0,"cool":0,"type":"review"}
The parameters do_split
(force to split even if the pickle files exist), training
, valid
, test
in parameters.ini
will split the dataset accordingly.
It also needs some word embeddings, which should be downloaded from http://neuroner.com/data/word_vectors/glove.6B.100d.zip, unzipped and placed in /data/word_vectors
. This can be done on Ubuntu and Mac OS X with:
# Download some word embeddings
mkdir -p SelfSent-master/data/word_vectors
cd SelfSent-master/data/word_vectors
wget http://neuroner.com/data/word_vectors/glove.6B.100d.zip
unzip glove.6B.100d.zip
Be sure that use_pretrained_model = false
and have at least all.json
in the data folder.
You need to have a pretrained model, which is composed of:
- dataset.pickle
- model.ckpt.data-00000-of-00001
- model.ckpt.index
- model.ckpt.meta
- parameters.ini
Don't forget to put use_pretrained_model = true
and the path to the pretrained model folder.
Don't hesitate to contact for any feedback or create issues/pull requests.