
Primary language: Python. License: MIT.

MOMENTA

This is the repo for "MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets" accepted at Findings of EMNLP '21.

Setting up dependencies

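# CUDA_version must already hold the installed CUDA release as a string, e.g. "10.1".
if CUDA_version == "10.0":
    torch_version_suffix = "+cu100"
elif CUDA_version == "10.1":
    torch_version_suffix = "+cu101"
elif CUDA_version == "10.2":
    torch_version_suffix = ""  # no suffix needed for CUDA 10.2
else:
    torch_version_suffix = "+cu110"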

For installing CLIP

! pip3 install torch==1.7.1{torch_version_suffix} torchvision==0.8.2{torch_version_suffix} -f https://download.pytorch.org/whl/torch_stable.html ftfy regex --user
! wget https://openaipublic.azureedge.net/clip/bpe_simple_vocab_16e6.txt.gz -O bpe_simple_vocab_16e6.txt.gz
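The suffix selection and install step above can also be scripted outside a notebook. The following is a minimal sketch, not the authors' setup script: the helper names `torch_version_suffix_for` and `build_pip_command` are hypothetical, and the command is only assembled, not executed.

```python
import sys
from typing import List

# Hypothetical lookup table: mirrors the CUDA-to-suffix mapping shown above.
_SUFFIX_BY_CUDA = {"10.0": "+cu100", "10.1": "+cu101", "10.2": ""}


def torch_version_suffix_for(cuda_version: str) -> str:
    """Return the wheel suffix for a CUDA release string such as "10.1"."""
    # Any release not listed falls back to the CUDA 11.0 build, as above.
    return _SUFFIX_BY_CUDA.get(cuda_version, "+cu110")


def build_pip_command(cuda_version: str) -> List[str]:
    """Assemble (but do not run) the pip command from the install step."""
    suffix = torch_version_suffix_for(cuda_version)
    return [
        sys.executable, "-m", "pip", "install",
        f"torch==1.7.1{suffix}",
        f"torchvision==0.8.2{suffix}",
        "-f", "https://download.pytorch.org/whl/torch_stable.html",
        "ftfy", "regex", "--user",
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would perform the install; building the list first makes it easy to inspect the exact versions before anything is downloaded.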

For Sentence-Transformers: follow the installation steps at https://github.com/UKPLab/sentence-transformers

Instructions

The .py file contains the complete set of steps, which must be run in sequence.

  1. It contains code for loading pre-saved ROI and entity features, if they are available.
  2. Otherwise, code for extracting the features on demand is also included.
  3. For initializing the dataset and data loader for PyTorch: load the dataset for training and testing as required by the run.
  4. Experimental settings:
    Configurations for the binary/multi-class setting (training/testing/evaluation) have to be chosen as required by the run; the corresponding code blocks are provided and suitably commented out.
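Steps 1 and 2 amount to a load-if-available, otherwise-extract pattern. A minimal sketch under stated assumptions: the function `load_or_extract`, the pickle storage format, and the `extract_fn` callback are all hypothetical and stand in for whatever storage format and feature extractor the repo's script actually uses.

```python
import os
import pickle
from typing import Any, Callable


def load_or_extract(cache_path: str, extract_fn: Callable[[], Any]) -> Any:
    """Load pre-saved features from cache_path, or extract and save them.

    Assumption: features are stored with pickle; the repo may use another format.
    """
    if os.path.exists(cache_path):
        # Pre-saved features are available: reuse them.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Otherwise extract on demand and save the result for later runs.
    features = extract_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features
```

The extractor is passed in as a callable so that the expensive step only runs when no saved file exists; a second call with the same path returns the stored features without invoking it again.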

Dataset, Features and Meta-info:

Please note: TWO versions of the Harm-P "Harmfulness" data are provided as part of this repo: HarMeme-V0 (has duplicates in Harm-P) and HarMeme-V1 (complete, deduplicated set for Harm-P). We recommend HarMeme-V1 as the updated and correct version of the "Harmfulness" data for the US Politics category. Both V0 and V1 contain the original, ready-to-use data for Harm-C, which covers the Covid-19 category. The "Target" data for both categories can be found via the HarMeme-V0 link given below.

  1. HarMeme Images
  2. HarMeme-V0: CAUTION! OBSOLETE FOR HARM-P "Harmfulness" - Contains duplicates in Harm-P. See the upgraded version (V1) below for the deduplicated version of Harm-P (Harmfulness) data. HarMeme-V0 content (including Target data) can be accessed via the following links:
  3. HarMeme-V1: Updated + Complete Version (for "Harmfulness"). For additional details about HarMeme-V1, refer to the README in the "HarMeme_V1" folder of this repo. Contents of "HarMeme_V1":
    • Annotations (Same format as V0: [id, image, labels, text]) - Duplicates Removed.
    • Meta-info (Collected using GCV API): Meme id, OCR Text, Web Entities, Best labels, Titles, Objects, ROI Info.
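The annotation format listed above ([id, image, labels, text]) can be read and checked for duplicate ids with a few lines of Python. This is a sketch only: it assumes one JSON object per line with exactly those four keys, and the helper names `parse_annotations` and `duplicate_ids` are made up for illustration. Refer to the README in "HarMeme_V1" for the actual file layout.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def parse_annotations(lines: Iterable[str]) -> List[Dict]:
    """Parse annotation records, assuming one JSON object per line."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Keep only the four documented fields; a missing key raises KeyError.
        records.append({key: record[key] for key in ("id", "image", "labels", "text")})
    return records


def duplicate_ids(records: List[Dict]) -> List:
    """Return the ids that occur more than once, in first-seen order."""
    counts = Counter(record["id"] for record in records)
    return [record_id for record_id, n in counts.items() if n > 1]
```

Because `parse_annotations` accepts any iterable of strings, an open file handle can be passed directly; an empty result from `duplicate_ids` indicates that no id is repeated in the parsed file.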

Acknowledgement: Thanks to mingshanhee and uprihtness for pointing out the discrepancies.