ArtEmis Speaker Tools B

This repo contains the following components related to [2]:

  1. User Interfaces used in human studies for MTurk Experiments
  2. Evaluation Tools
  3. Neural Speakers (nearest-neighbor baseline, basic & grounded versions of the M2 transformer [3])

Data preparation

Please prepare the annotation and detection-feature files for the ArtEmis dataset before running the code:

  1. Download Detection-Features and unzip the archive to a folder of your choice. The features are computed with the code provided by [1].
  2. Download the pickle file containing [<image_name>, <image_id>] pairs, and put it in the same folder where you extracted the detection features (see the sanity-check sketch after this list).
  3. Download the ArtEmis dataset.
  4. Download the vocabulary files: 1, 2
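
If you want to sanity-check the downloads, a minimal Python sketch like the one below can help. It assumes the detection features are stored in an HDF5 file and that the pickle holds the [<image_name>, <image_id>] mapping; the file names used here are placeholders, not the actual names of the downloads.

import pickle
import h5py

FEATURES_PATH = "/path/to/features/detection_features.hdf5"  # placeholder name
NAME_ID_PICKLE = "/path/to/features/image_name_to_id.pkl"    # placeholder name

# Load the <image_name> / <image_id> mapping from the pickle.
with open(NAME_ID_PICKLE, "rb") as f:
    name_id_pairs = pickle.load(f)
print("entries in the name/id mapping:", len(name_id_pairs))

# Assuming the detection features are an HDF5 file with one entry per image.
with h5py.File(FEATURES_PATH, "r") as h5:
    print("feature entries:", len(h5.keys()))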

Some bounding box visualizations for art images:

[Figure: BBox Features]

Environment Setup

Clone the repository and create the artemis-m2 conda environment using the environment.yml file:

conda env create -f environment.yml
conda activate artemis-m2

Then download the spaCy English data by executing the following command:

python -m spacy download en
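
If the download succeeded, the model should load without errors (assuming the spaCy version pinned by environment.yml still supports the en shortcut):

python -c "import spacy; spacy.load('en')"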

Training procedure

Run python train.py using the following arguments:

Argument              Description
--exp_name            Experiment name
--batch_size          Batch size (default: 10)
--workers             Number of workers (default: 0)
--m                   Number of memory vectors (default: 40)
--head                Number of attention heads (default: 8)
--warmup              Warmup value for learning-rate scheduling (default: 10000)
--resume_last         If set, training resumes from the last checkpoint.
--resume_best         If set, training resumes from the best checkpoint.
--features_path       Path to the detection-features file
--annotation_folder   Path to the ArtEmis annotations file (artemis.csv)
--use_emotion_labels  If enabled, emotion labels are used (default: False)
--logs_folder         Path to the folder for TensorBoard logs (default: "tensorboard_logs")

To train the grounded version of the model, include the additional argument --use_emotion_labels=1.

python train.py --exp_name <exp_name> --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations/artemis.csv --workers 4 --logs_folder /path/to/logs/folder [--use_emotion_labels=1]
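
As an example, and assuming a run named <exp_name> was interrupted, the documented --resume_last flag lets you continue it from the last checkpoint (all paths are placeholders):

python train.py --exp_name <exp_name> --batch_size 50 --m 40 --head 8 --warmup 10000 --features_path /path/to/features --annotation_folder /path/to/annotations/artemis.csv --workers 4 --logs_folder /path/to/logs/folder --resume_last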

Pretrained Models

Download our pretrained models and put them under the saved_models folder.

Run python test.py using the following arguments:

Argument              Description
--batch_size          Batch size (default: 10)
--workers             Number of workers (default: 0)
--features_path       Path to the detection-features file
--annotation_folder   Path to the ArtEmis annotations file (artemis.csv)

python test.py --exp_name <exp_name> --features_path /path/to/features --annotation_folder /path/to/annotations/artemis.csv --workers 4 [--use_emotion_labels=1]
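
For example, to evaluate the grounded speaker rather than the basic one, pass the emotion-label flag explicitly (paths and <exp_name> are placeholders):

python test.py --exp_name <exp_name> --features_path /path/to/features --annotation_folder /path/to/annotations/artemis.csv --workers 4 --use_emotion_labels=1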

Some generations from the neural speakers:

[Figure: M2 outputs]

References

[1] Faster R-CNN with model pretrained on Visual Genome
[2] ArtEmis: Affective Language for Visual Art (Panos Achlioptas, Maks Ovsjanikov, Kilichbek Haydarov, Mohamed Elhoseiny, Leonidas Guibas)
[3] Meshed-Memory Transformer.