Our dataset of automatically generated out-of-context image-caption pairs in the news media. For inquiries and requests, please contact graceluo@berkeley.edu.
Make sure you are running Python 3.6+.
- Request the VisualNews dataset and place the files under the `visual_news` folder.
- Run `./download.sh` to download our matches and populate the `news_clippings` folder (place into `news_clippings/data/`).
- Consider doing analyses of your own using the embeddings we have provided (place into `news_clippings/embeddings/`).
All of the ids and image paths provided in our `data/` folder correspond exactly to those listed in the `data.json` file in VisualNews.
Your file structure should look like this:
```
news_clippings
├── data/
└── embeddings/

visual_news
├── origin/
│   ├── data.json
│   └── ...
└── ...
```
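After completing the steps above, a quick way to confirm the layout is in place is a small path check. This is a minimal sketch, not part of the released code; adjust the paths if you placed the folders elsewhere.

```python
from pathlib import Path

# Sanity check (sketch): confirm the expected top-level layout exists.
expected = [
    Path("visual_news/origin/data.json"),
    Path("news_clippings/data"),
    Path("news_clippings/embeddings"),
]

for path in expected:
    status = "ok" if path.exists() else "MISSING"
    print(f"{status:>7}  {path}")
```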
The data is ordered such that every even-indexed sample is pristine and the sample immediately following it is its associated falsified sample.
Each annotation contains the following fields:

- `id`: the id of the VisualNews sample associated with the caption
- `image_id`: the id of the VisualNews sample associated with the image
- `similarity_score`: the similarity measure used to generate the sample (i.e. `clip_text_image`, `clip_text_text`, `sbert_text_text`, `resnet_place`)
- `falsified`: a binary indicator of whether the caption / image pair is the original pair in VisualNews or a mismatch we generated
- `source_dataset` (Merged / Balanced only): the index of the sub-split name in `source_datasets`
Here's an example of how you can start using our matches:
```python
import json

# Load the VisualNews annotations and index them by sample id.
visual_news_data = json.load(open("visual_news/origin/data.json"))
visual_news_data_mapping = {ann["id"]: ann for ann in visual_news_data}

# Load a NewsCLIPpings split (here: the Merged / Balanced validation set).
data = json.load(open("news_clippings/data/merged_balanced/val.json"))
annotations = data["annotations"]

ann = annotations[0]
caption = visual_news_data_mapping[ann["id"]]["caption"]
image_path = visual_news_data_mapping[ann["image_id"]]["image_path"]

print("Caption: ", caption)
print("Image Path: ", image_path)
print("Is Falsified: ", ann["falsified"])
```
We include the following precomputed embeddings:

- `clip_image_embeddings`: 512-dim image embeddings from CLIP ViT-B/32. Contains embeddings for samples in all splits.
- `clip_text_embeddings`: 512-dim caption embeddings from CLIP ViT-B/32. Contains embeddings for samples in all splits.
- `sbert_embeddings`: 768-dim caption embeddings from SBERT-WK. Contains embeddings for samples in all splits.
- `places_resnet50`: 2048-dim image embeddings using ResNet50 trained on Places365. Contains embeddings only for samples in the `scene_resnet_place` split (where [PERSON] entities were not detected in the caption).
The following embedding types were not used in the construction of our dataset, but you may find them useful.
- `facenet_embeddings`: 512-dim embeddings for each face detected in the images using FaceNet. If no faces were detected, the value is `None`. Contains embeddings only for samples in the `person_sbert_text_text` split (where [PERSON] entities were detected in the caption).
All embeddings are dictionaries of {id: numpy array} stored in pickle files for train / val / test. You can access the features for each image / caption by its id like so:
```python
import pickle

# Embeddings are keyed by VisualNews sample id.
clip_image_embeddings = pickle.load(open("news_clippings/embeddings/clip_image_embeddings/test.pkl", "rb"))

sample_id = 701864
print(clip_image_embeddings[sample_id])
```
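As a starting point for analyses of your own, here is a minimal sketch (not part of the released code) that scores one image / caption pair by cosine similarity between the provided CLIP embeddings. It assumes caption embeddings are keyed by `ann["id"]` and image embeddings by `ann["image_id"]`; verify this against your copy of the pickles.

```python
import json
import pickle
import numpy as np

# Sketch: cosine similarity between the provided CLIP caption and image
# embeddings for a single annotation from the Merged / Balanced val split.
# Assumption: caption embeddings keyed by ann["id"], image embeddings by
# ann["image_id"].
annotations = json.load(open("news_clippings/data/merged_balanced/val.json"))["annotations"]
clip_text_embeddings = pickle.load(open("news_clippings/embeddings/clip_text_embeddings/val.pkl", "rb"))
clip_image_embeddings = pickle.load(open("news_clippings/embeddings/clip_image_embeddings/val.pkl", "rb"))

ann = annotations[0]
text_emb = np.asarray(clip_text_embeddings[ann["id"]], dtype=np.float32)
image_emb = np.asarray(clip_image_embeddings[ann["image_id"]], dtype=np.float32)

cosine = float(text_emb @ image_emb / (np.linalg.norm(text_emb) * np.linalg.norm(image_emb)))
print("CLIP text-image cosine similarity:", cosine)
print("Is Falsified:", ann["falsified"])
```

Since the falsified samples were themselves generated using these similarity measures, treat this only as a template for loading and combining the embeddings rather than as a detection baseline.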
We have additional metadata available upon request, such as the spaCy and REL named entities, timestamp, location of the original article content, etc.
We also have `sbert_embeddings_dissecting`, which has an embedding for each token and its weighting from running the "dissecting" setting of SBERT-WK, available upon request.
To run the benchmarking experiments we reported in our paper, see the README in `news_clippings_training/`.
If you find our dataset useful for your research, please cite the following paper:
```bibtex
@article{luo2021newsclippings,
  title={NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media},
  author={Luo, Grace and Darrell, Trevor and Rohrbach, Anna},
  journal={arXiv:2104.05893},
  year={2021}
}
```