Benchmark-Robustness-Text-Image-Compose-Retrieval: A Python repository from SunTongtongtong

Benchmarking Robustness of Text-Image Composed Retrieval

CIRR-C

Download:

Download original CIRR from link.

Apply image corruption on CIRR

Applying the following image corruption code in the dataloader produces corrupted images for any of the 15 corruptions.

import corrupt.image_corrupt as img_crpt
from PIL import Image
import numpy as np

# 15 standard corruptions
IMG_CORRUPTS = [
    "gaussian_noise_filter", "shot_noise_filter", "impulse_noise_filter",
    "motion_blur_filter", "defocus_blur_filter", "zoom_blur_filter",
    "brightness_filter", "contrast_filter", "pixelate_filter",
    "jpeg_compression", "fog", "snow", "frost", "glass_blur", "elastic_transform",
]

corrupt = IMG_CORRUPTS[0]  # take gaussian noise filter as example
level = 3  # corruption severity; 3 is only an example value
img_path = './sample.png'
image = Image.open(img_path).convert('RGB')
image = np.array(image)
# look up the corruption function by name; it takes a list of images
img_corrupt_func = getattr(img_crpt, corrupt)
image = img_corrupt_func([image], scale=level)[0]
image = Image.fromarray(image)
image.save('corrupted_sample.png')  # save() returns None, so do not reassign
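The snippet above corrupts a single file. As a minimal sketch of applying the same call inside a dataloader, the wrapper below corrupts each image when it is indexed. The class name `CorruptedDataset`, the `(image, caption)` item layout, and the `darken_filter` stand-in are all hypothetical and not part of the repository; only the call shape `func([image], scale=level)[0]` is taken from the README. The toy images are nested lists so the example runs without any dependencies.

```python
from typing import Any, Callable, List, Sequence, Tuple


class CorruptedDataset:
    """Hypothetical wrapper: corrupts the image of each (image, caption) pair on access."""

    def __init__(self, base: Sequence[Tuple[Any, str]],
                 corrupt_fn: Callable[..., List[Any]], level: int) -> None:
        self.base = base
        self.corrupt_fn = corrupt_fn
        self.level = level

    def __len__(self) -> int:
        return len(self.base)

    def __getitem__(self, idx: int) -> Tuple[Any, str]:
        image, caption = self.base[idx]
        # Same call shape as in the README: a list of images in, a list out.
        corrupted = self.corrupt_fn([image], scale=self.level)[0]
        return corrupted, caption


def darken_filter(images: List[Any], scale: int = 1) -> List[Any]:
    """Stand-in corruption (an assumption, not a repository function).

    Scales every pixel value by (10 - scale) / 10 using integer arithmetic.
    """
    return [[[p * (10 - scale) // 10 for p in row] for row in img] for img in images]


# Toy dataset: one 2x2 grayscale "image" with its modification text.
toy = [([[100, 200], [50, 0]], "make the dog bigger")]
ds = CorruptedDataset(toy, darken_filter, level=3)
image, caption = ds[0]
print(image, caption)
```

With the real repository, `corrupt_fn` would presumably be `getattr(img_crpt, name)` and the images NumPy arrays, as in the single-file example.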

Visualization of image corruptions

Brightness corruption severity

Text corruption on CIRR

import corrupt.text_corrupt as txt_crpt

TEXT_CORRUPTS = ['character_filter', 'qwerty_filter', 'RemoveChar_filter', 'remove_space_filter', 'misspelling_filter', 'repetition_filter', 'homophones_filter']
text_sample = 'There were two adult dogs on the road - there was one grown puppy in the yard.'
for corrupt in TEXT_CORRUPTS:
  # look up the corruption function by name
  text_corrupt_func = getattr(txt_crpt, corrupt)
  # each filter returns the corrupted sentence and a Levenshtein distance
  corrupted_sent, levenshtein_dist = text_corrupt_func(text_sample, 3)
  print(corrupt)
  print(corrupted_sent)



Result:
character_filter
['There were two adutl dogs on teh road - there wsa oen grown puppy in the yard.']

qwerty_filter
['There were two adult dogs on the road - there was one grow5 puppy in the yard.']

RemoveChar_filter
['Thre were two adult dogs on the road - tere ws ne grown puppy in the yard.']

remove_space_filter
['There were two adult dogs onthe road - there was one grown puppy in the yard.']

misspelling_filter
['There were were two adult dogs on the road - there was one grown puppy in the yard.']

repetition_filter
['There were two adult adult dogs on the road - there was one grown grown puppy in the yard.']

homophones_filter
["They're were two adult dogs on the rowed - their was won grown puppy inn the yard."]
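Each text filter call above returns a `levenshtein_dist` alongside the corrupted sentence. As a rough illustration of what such a value measures, here is a minimal sketch of the standard character-level Levenshtein distance. This is an assumption about the metric: the repository may compute it differently (for example at word level or through a library), and the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance counting insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


original = 'There were two adult dogs on the road - there was one grown puppy in the yard.'
no_space = 'There were two adult dogs onthe road - there was one grown puppy in the yard.'
print(levenshtein(original, no_space))  # 1: a single space was removed
```

Under this definition, the `remove_space_filter` output shown above is one edit away from the original sentence.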

FashionIQ-C

Download

Download original FashionIQ from link.

Apply image corruption on FashionIQ

Same as in CIRR-C

Apply text corruption on FashionIQ

Same as in CIRR-C

CIRR-D

Download

Download from link

Samples Visualization

Numerical

Samples from original CIRR: We sample triplets from the original CIRR dataset and categorize them as numerical when the modified text includes a number word from "zero" to "ten" or the word "number". For each triplet below, the image on the left is the reference image. Given the reference image and the modified text, the goal is to retrieve the target image on the right.
Samples from Synthetic data: We further generate images based on the current CIRR dev set. Our generation is based on Visual ChatGPT.
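The selection rule for the numerical category can be sketched as a simple keyword check on the modified text. This is a minimal illustration, not the authors' script: the function name `is_numerical`, whole-word matching, and case-insensitivity are assumptions made here.

```python
import re

# Number words "zero" through "ten", plus the word "number", as in the rule above.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]
_PATTERN = re.compile(
    r"\b(?:" + "|".join(NUMBER_WORDS + ["number"]) + r")\b",
    re.IGNORECASE,
)


def is_numerical(modified_text: str) -> bool:
    """Return True if the modified text contains a number word or 'number'."""
    return _PATTERN.search(modified_text) is not None


print(is_numerical("Show two dogs instead of three"))      # True
print(is_numerical("Make the dog look at the camera often"))  # False
```

Whole-word matching keeps words such as "often" (which contains "ten") from being counted; a plain substring test would flag them.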

Attribute

Samples from original CIRR
Samples from Synthetic data

Object removal

Samples from original CIRR
Samples from extended captions of original CIRR
Samples from Synthetic data

Background

Samples from original CIRR
Samples from extended captions of original CIRR: We select the triplets from the extended captions where the background is the major change between the image pair.

Fine-grained

Gallery of fine-grained category is composed of

Testbed Requirements

conda env create -f hugface.yml
