Implementation of "Folksonomication: Predicting Tags for Movies from Plot Synopses using Emotion Flow Encoded Neural Network"

Contributors

Dependencies

The code is written in Python 3.

PyTorch (we used version 0.4.0a0+396637c)
JSON
Joblib
Pandas
Jupyter Notebook
CUDA 8.0

Optional

TensorFlow (only needed if you want to monitor experiments with TensorBoard)
Acknowledgement: to use TensorBoard with PyTorch, we used the code from https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514
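
Before running anything, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name `check_dependencies` is hypothetical, and it only checks that each module can be located, not which version is installed.

```python
import importlib.util

# Import names for the packages listed above; "json" ships with Python itself.
REQUIRED = ("torch", "json", "joblib", "pandas")
OPTIONAL = ("tensorflow",)  # only needed for TensorBoard monitoring


def check_dependencies(names=REQUIRED + OPTIONAL):
    """Map each module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        kind = "optional" if name in OPTIONAL else "required"
        status = "found" if found else "MISSING"
        print(f"{name:<12} {kind:<9} {status}")
```

A missing optional module only matters if you plan to monitor training with TensorBoard.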

Resource Map

├── data (Create the directory)
├── LICENSE
├── processed_data
│   ├── all_sequence_dict.json
│   ├── class_weights.json
│   ├── class_weights_sk.json
│   ├── emotion_lexicons_dict.pkl
│   ├── idx2word_no_process.json
│   ├── index_to_tag.json
│   ├── tag_to_index.json
│   ├── test_sequences_dict.json
│   ├── test_sequences_list.json
│   ├── train_sequences_dict.json
│   ├── train_sequences_list.json
│   ├── vectors
│   │   ├── emotion_score_dict_20_chunks.json
│   │   ├── labels_binary_dict.json
│   │   └── padded_word_sequences_1500.json
│   ├── vocab_5k_no_process.json
│   └── word2idx_no_process.json
├── README.md
└── source
    ├── Dataset.py
    ├── misc.py
    ├── models.py
    ├── notebooks
    │   └── Prepare Data.ipynb
    ├── outputs (Create the directory)
    ├── predict_tags.py
    ├── report.py
    ├── tf_logger.py
    ├── TorchHelper.py
    └── train.py
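
The two directories marked "Create the directory" above are not shipped with the repository. A minimal sketch for creating them, assuming it is run from the repository root (the helper name `create_dirs` is ours, not part of the code base):

```python
from pathlib import Path

# Directories the resource map marks as "Create the directory".
REQUIRED_DIRS = ("data", "source/outputs")


def create_dirs(repo_root="."):
    """Create the required directories under repo_root and return their paths."""
    created = []
    for rel in REQUIRED_DIRS:
        path = Path(repo_root) / rel
        # parents=True creates intermediate directories if needed;
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_dirs():
        print("ready:", path)
```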

Usage

Download the MPST Corpus.
Unzip the data and put the MPST directory inside the data directory.
Download the fastText pre-trained embeddings and update the embeddings path in the notebook.
Run the data-processing notebook at source/notebooks/Prepare Data.ipynb to prepare the data for the model. The processed data is written to the processed_data directory.
After processing completes, the processed_data directory should look like this:

├── processed_data
│   ├── all_sequence_dict.json
│   ├── class_weights.json
│   ├── class_weights_sk.json
│   ├── emotion_lexicons_dict.pkl
│   ├── idx2word_no_process.json
│   ├── index_to_tag.json
│   ├── tag_to_index.json
│   ├── test_sequences_dict.json
│   ├── test_sequences_list.json
│   ├── train_sequences_dict.json
│   ├── train_sequences_list.json
│   ├── vectors
│   │   ├── emotion_score_dict_20_chunks.json
│   │   ├── labels_binary_dict.json
│   │   └── padded_word_sequences_1500.json
│   ├── vocab_5k_no_process.json
│   └── word2idx_no_process.json
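
To compare your processed_data directory against the listing above without checking by eye, something like the following sketch could be used. It is not part of the repository: the function name `missing_processed_files` is hypothetical, the file names are copied from the listing, and only existence is checked, not content.

```python
from pathlib import Path

# File names copied from the listing above, relative to processed_data/.
EXPECTED_FILES = [
    "all_sequence_dict.json",
    "class_weights.json",
    "class_weights_sk.json",
    "emotion_lexicons_dict.pkl",
    "idx2word_no_process.json",
    "index_to_tag.json",
    "tag_to_index.json",
    "test_sequences_dict.json",
    "test_sequences_list.json",
    "train_sequences_dict.json",
    "train_sequences_list.json",
    "vectors/emotion_score_dict_20_chunks.json",
    "vectors/labels_binary_dict.json",
    "vectors/padded_word_sequences_1500.json",
    "vocab_5k_no_process.json",
    "word2idx_no_process.json",
]


def missing_processed_files(processed_dir="processed_data"):
    """Return the expected files that are not present under processed_dir."""
    root = Path(processed_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_processed_files()
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("processed_data contains all expected files.")
```

Run it from the repository root, or pass the path to processed_data explicitly.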

To train the model, run train.py. It writes the model and logs to the output directory set in the code.

Bibtex

@InProceedings{C18-1244,
  author = 	"Kar, Sudipta
		and Maharjan, Suraj
		and Solorio, Thamar",
  title = 	"Folksonomication: Predicting Tags for Movies from Plot Synopses using Emotion Flow Encoded Neural Network",
  booktitle = 	"Proceedings of the 27th International Conference on Computational Linguistics",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"2879--2891",
  location = 	"Santa Fe, New Mexico, USA",
  url = 	"http://aclweb.org/anthology/C18-1244"
}

For any queries, please contact the first author at skar3 AT uh DOT edu
