Dataset release for the EMNLP 2019 paper "Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts".
- We perform experiments on 5 random splits in our paper. These splits are present in the folder `splits_new`, with the first part of the name specifying whether it is a train or validation split. For example, `train_split_0.json` and `val_split_0.json` refer to the first train and val splits respectively.
- The final performance is reported by averaging the performance across these splits. For details, refer to the paper.
- Each of these split files is a JSON file containing a list. Each element of the list has the following keys (a minimal loading sketch follows this list):
  - `filename`: filename of the image/feature
  - `semiotic`: label for the semiotic category
  - `intent`: label for the intent category
  - `imgTxt`: label for the contextual category
  - `caption`: cleaned caption of the post
  - `tags`: hashtags (we did not use these in the paper)
  - `url`: original URL of the image (please see the comment below)
  - `likes`: number of likes
  - `orig_caption`: original caption (with emoticons etc.)
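The snippet below is a minimal sketch of loading one split in Python; the relative paths and the assumption that each split file is a flat JSON list of such dictionaries follow the description above.

```python
import json

# Load the first train and val splits (paths assumed relative to the repo root).
with open("splits_new/train_split_0.json") as f:
    train_posts = json.load(f)
with open("splits_new/val_split_0.json") as f:
    val_posts = json.load(f)

# Inspect one post: labels for the three taxonomies plus the cleaned caption.
post = train_posts[0]
print(post["filename"], post["intent"], post["semiotic"], post["imgTxt"])
print(post["caption"])
```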
- Due to recent restrictions on releasing image data collected from social media websites (Instagram in our case), we are unable to release the original images. We only release the deep features for each image (last layer of ResNet-18) as used in our paper for the purpose of reproducing the results and conducting additional experiments.
- We are still trying to find a way to release the images or working URLs for them (original URLs are provided, but most of them have expired).
- These features are available in the tar file `resnet18_feat.tar`. Each feature is a `.npy` file containing the image feature and is named using the `filename` key in the splits (see the loading sketch after the table below).
- Reproducing results: In the paper we trained only the embedding and classification layers and did not fine-tune the ResNet-18 network. The only difference when using the provided features is that we used random crops during training. To allow a fair comparison with these features in future work, we report the AUC performance of the model using images for the three taxonomies below:
| Model | Intent | Semiotic | Contextual |
|---------------- |-------- |---------- |------------ |
| Img | 73.8 | 58.8 | 62.5 |
| Img + Txt-Emb | 81.4 | 69.9 | 76.2 |
| Img + Txt-ELMo | 85.3 | 69.1 | 78.8 |
The results are quite close to those reported in the paper.
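As a rough starting point, the sketch below shows one way to load the provided features and train a simple classifier on top of them. It is an illustration only: the extracted feature directory name, the assumption that each feature file is named `<filename>.npy`, and the use of scikit-learn's logistic regression (as a stand-in for the paper's embedding and classification layers) are all assumptions.

```python
import json
import os

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

FEAT_DIR = "resnet18_feat"  # assumed path after extracting resnet18_feat.tar


def load_split(path, feat_dir=FEAT_DIR):
    """Return (ResNet-18 features, intent labels) for one split file."""
    with open(path) as f:
        posts = json.load(f)
    # Assumption: each feature file is named "<filename>.npy"; adjust to the
    # actual layout of the extracted tar if it differs.
    feats = np.stack([
        np.load(os.path.join(feat_dir, post["filename"] + ".npy")).ravel()
        for post in posts
    ])
    labels = np.array([post["intent"] for post in posts])
    return feats, labels


X_train, y_train = load_split("splits_new/train_split_0.json")
X_val, y_val = load_split("splits_new/val_split_0.json")

# A plain multinomial logistic regression over the image features; this is a
# stand-in for the embedding + classification layers trained in the paper.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_val)
print("Intent AUC (one-vs-rest):",
      roc_auc_score(y_val, probs, multi_class="ovr", labels=clf.classes_))
```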
- We provide the list of labels for the three taxonomies in the folder `labels`. Note that we use `imgTxt_labels` to refer to the Contextual relationship (as used in the paper).
Please cite the following paper if you are using this dataset for your research.
@article{kruk2019integrating,
title={Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts},
author={Kruk, Julia and Lubin, Jonah and Sikka, Karan and Lin, Xiao and Jurafsky, Dan and Divakaran, Ajay},
journal={arXiv preprint arXiv:1904.09073},
year={2019}
}
Check the Link for the project webpage and a video about this work.
Please email ajay.divakaran@sri.com or karan.sikka@sri.com (karan.sikka1@gmail.com) for any questions regarding the dataset.