A faster and up to date implementation is in my other repo
Batched implementation of Hierarchical Attention Networks for Document Classification paper
- Pytorch (>= 0.2)
- Spacy (for tokenizing)
- Gensim (for building word vectors)
- tqdm (for fancy graphics)
prepare_data.py
transforms gzip files as found on Julian McAuley Amazon product data page to lists of(user,item,review,rating)
tuples and builds word vectors if--create-emb
option is specified.main.py
trains a Hierarchical Model.Data.py
holds data managing objects.Nets.py
holds networks.beer2json.py
is an helper script if you happen to have the ratebeer/beeradvocate datasets.
The whole dataset is used to create word embeddings which can be an issue.