Implementation of "Folksonomication: Predicting Tags for Movies from Plot Synopses using Emotion Flow Encoded Neural Network"
The code is written in Python 3.
- PyTorch (we used version 0.4.0a0+396637c)
- JSON
- Joblib
- Pandas
- Jupyter Notebook
- CUDA 8.0
Optional
- TensorFlow (only needed if you want to monitor the experiment with TensorBoard)
- Acknowledgement: to use TensorBoard with PyTorch, we used the code from https://gist.github.com/gyglim/1f8dfb1b5c82627ae3efcfbbadb9f514
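As a quick sanity check before running anything, the sketch below reports which of the Python packages listed above cannot be found by the current interpreter. The import names (`torch`, `joblib`, `pandas`, `tensorflow`) are the usual ones for these packages but are an assumption here, and the helper `missing_modules` is hypothetical, not part of this repository. It does not check Jupyter Notebook or the CUDA installation.

```python
import importlib.util

# Assumed import names for the packages listed above; "json" ships with Python.
REQUIRED = ["torch", "json", "joblib", "pandas"]
OPTIONAL = ["tensorflow"]  # only needed for TensorBoard monitoring


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required packages:", missing_modules(REQUIRED))
    print("Missing optional packages:", missing_modules(OPTIONAL))
```

An empty list for the required packages means they are all importable; it says nothing about whether the installed versions match the ones used for the paper.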
├── data (Create the directory)
├── LICENSE
├── processed_data
│ ├── all_sequence_dict.json
│ ├── class_weights.json
│ ├── class_weights_sk.json
│ ├── emotion_lexicons_dict.pkl
│ ├── idx2word_no_process.json
│ ├── index_to_tag.json
│ ├── tag_to_index.json
│ ├── test_sequences_dict.json
│ ├── test_sequences_list.json
│ ├── train_sequences_dict.json
│ ├── train_sequences_list.json
│ ├── vectors
│ │ ├── emotion_score_dict_20_chunks.json
│ │ ├── labels_binary_dict.json
│ │ └── padded_word_sequences_1500.json
│ ├── vocab_5k_no_process.json
│ └── word2idx_no_process.json
├── README.md
└── source
  ├── Dataset.py
  ├── misc.py
  ├── models.py
  ├── notebooks
  │ └── Prepare Data.ipynb
  ├── outputs (Create the directory)
  ├── predict_tags.py
  ├── report.py
  ├── tf_logger.py
  ├── TorchHelper.py
  └── train.py
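The two entries marked "Create the directory" in the tree above can be created by hand, or with a small helper like the one sketched here. The function name `create_project_dirs` and its `repo_root` parameter are hypothetical and not part of this repository; the sketch only assumes that `data` sits at the repository root and `outputs` sits inside `source`, as the tree shows.

```python
from pathlib import Path


def create_project_dirs(repo_root="."):
    """Create the directories the tree marks as "Create the directory".

    Existing directories are left untouched. Returns the list of paths.
    """
    root = Path(repo_root)
    created = []
    for relative in ("data", "source/outputs"):
        path = root / relative
        # parents=True creates intermediate directories if needed;
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs()` from the repository root should leave both directories in place without disturbing anything already inside them.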
- Download the MPST Corpus.
- Unzip the data and put the MPST directory inside the data directory.
- Download the pre-trained fastText embeddings and update the embeddings path in the notebook.
- Use the data processing notebook at source/notebooks/Prepare Data.ipynb to prepare the data for the model. The processed data will be written to the processed_data directory.
- After processing completes, the processed_data directory should look like this:
├── processed_data
│ ├── all_sequence_dict.json
│ ├── class_weights.json
│ ├── class_weights_sk.json
│ ├── emotion_lexicons_dict.pkl
│ ├── idx2word_no_process.json
│ ├── index_to_tag.json
│ ├── tag_to_index.json
│ ├── test_sequences_dict.json
│ ├── test_sequences_list.json
│ ├── train_sequences_dict.json
│ ├── train_sequences_list.json
│ ├── vectors
│ │ ├── emotion_score_dict_20_chunks.json
│ │ ├── labels_binary_dict.json
│ │ └── padded_word_sequences_1500.json
│ ├── vocab_5k_no_process.json
│ └── word2idx_no_process.json
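To confirm that the notebook produced everything listed above before starting a training run, a check along these lines may help. This is a minimal sketch: the file names are copied from the listing, while the helper `missing_processed_files` and its `processed_dir` parameter are hypothetical and not part of this repository. It only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the processed_data listing above.
EXPECTED_FILES = [
    "all_sequence_dict.json",
    "class_weights.json",
    "class_weights_sk.json",
    "emotion_lexicons_dict.pkl",
    "idx2word_no_process.json",
    "index_to_tag.json",
    "tag_to_index.json",
    "test_sequences_dict.json",
    "test_sequences_list.json",
    "train_sequences_dict.json",
    "train_sequences_list.json",
    "vectors/emotion_score_dict_20_chunks.json",
    "vectors/labels_binary_dict.json",
    "vectors/padded_word_sequences_1500.json",
    "vocab_5k_no_process.json",
    "word2idx_no_process.json",
]


def missing_processed_files(processed_dir="processed_data"):
    """Return the expected files that are absent from `processed_dir`."""
    base = Path(processed_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_processed_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Run it from the repository root, or pass the path to processed_data explicitly.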
- To train the model, run train.py. It writes the model and logs to the output directory set in the code.
@InProceedings{C18-1244,
author = "Kar, Sudipta
and Maharjan, Suraj
and Solorio, Thamar",
title = "Folksonomication: Predicting Tags for Movies from Plot Synopses using Emotion Flow Encoded Neural Network",
booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "2879--2891",
location = "Santa Fe, New Mexico, USA",
url = "http://aclweb.org/anthology/C18-1244"
}
- For any queries, please contact the first author at skar3 AT uh DOT edu