Benchmarking Robustness of Text-Image Composed Retrieval

CIRR-C

Download

Download the original CIRR dataset from link.

Apply image corruption on CIRR

Applying the following image corruption code in the dataloader produces corrupted images for any of the 15 corruptions.

import corrupt.image_corrupt as img_crpt
from PIL import Image
import numpy as np

# 15 standard corruptions
IMG_CORRUPTS = ["gaussian_noise_filter", "shot_noise_filter", "impulse_noise_filter", "motion_blur_filter", "defocus_blur_filter", "zoom_blur_filter", "brightness_filter", "contrast_filter", "pixelate_filter", "jpeg_compression", "fog", "snow", "frost", "glass_blur", "elastic_transform"]

corrupt = IMG_CORRUPTS[0]  # take the Gaussian noise filter as an example
level = 3  # corruption severity; 3 is only an example value, set the level you need
img_path = './sample.png'
image = Image.open(img_path).convert('RGB')
image = np.array(image)
img_corrupt_func = getattr(img_crpt, corrupt)  # look up the corruption function by name
image = img_corrupt_func([image], scale=level)[0]  # takes and returns a list of images
image = Image.fromarray(image)
image.save('corrupted_sample.png')  # save() returns None, so do not reassign image
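To apply the corruption inside a dataloader rather than to a single file, one option is to wrap the dataset so that every image is corrupted on access. The following is a minimal sketch, not the repository's own loader: the class name `CorruptedImageDataset` is hypothetical, and it assumes the base dataset returns one image array per index and that the corruption function follows the `func([image], scale=level)[0]` call pattern shown above.

```python
class CorruptedImageDataset:
    """Hypothetical wrapper: corrupts each image of a base dataset on access."""

    def __init__(self, base_dataset, corrupt_fn, level):
        self.base_dataset = base_dataset  # any indexable collection of image arrays
        self.corrupt_fn = corrupt_fn      # e.g. getattr(img_crpt, "gaussian_noise_filter")
        self.level = level                # corruption severity passed as `scale`

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        image = self.base_dataset[index]
        # The corruption functions take a list of images and return a list.
        return self.corrupt_fn([image], scale=self.level)[0]
```

Because the wrapper only defines `__len__` and `__getitem__`, it should be usable wherever a map-style dataset is expected; if the base dataset returns tuples or dicts, `__getitem__` needs to pick out the image field first.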

Visualization of image corruptions

image info

Brightness corruption severity

image info

Text corruption on CIRR

import corrupt.text_corrupt as txt_crpt

TEXT_CORRUPTS = ['character_filter', 'qwerty_filter', 'RemoveChar_filter', 'remove_space_filter', 'misspelling_filter', 'repetition_filter', 'homophones_filter']
text_sample = 'There were two adult dogs on the road - there was one grown puppy in the yard.'
for corrupt in TEXT_CORRUPTS:
  text_corrupt_func = getattr(txt_crpt, corrupt)  # look up the corruption function by name
  # each function returns the corrupted sentence and a Levenshtein distance
  corrupted_sent, levenshtein_dist = text_corrupt_func(text_sample, 3)
  print(corrupt)
  print(corrupted_sent)



Result:
character_filter
['There were two adutl dogs on teh road - there wsa oen grown puppy in the yard.']

qwerty_filter
['There were two adult dogs on the road - there was one grow5 puppy in the yard.']

RemoveChar_filter
['Thre were two adult dogs on the road - tere ws ne grown puppy in the yard.']

remove_space_filter
['There were two adult dogs onthe road - there was one grown puppy in the yard.']

misspelling_filter
['There were were two adult dogs on the road - there was one grown puppy in the yard.']

repetition_filter
['There were two adult adult dogs on the road - there was one grown grown puppy in the yard.']

homophones_filter
["They're were two adult dogs on the rowed - their was won grown puppy inn the yard."]
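Each text corruption call above also returns a value named `levenshtein_dist`. Assuming this is the standard character-level edit distance between the original and the corrupted sentence (the source does not spell this out), it can be sanity-checked independently with a sketch like the one below. The function `levenshtein` is written here for illustration and is not part of `corrupt.text_corrupt`.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


original = 'There were two adult dogs on the road - there was one grown puppy in the yard.'
corrupted = 'There were two adult dogs onthe road - there was one grown puppy in the yard.'
print(levenshtein(original, corrupted))  # prints 1: a single space was removed
```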

FashionIQ-C

Download

Download the original FashionIQ dataset from link.

Apply image corruption on FashionIQ

Same as in CIRR-C

Apply text corruption on FashionIQ

Same as in CIRR-C

CIRR-D

Download

Download from link

Samples Visualization

Numerical

  • Samples from original CIRR: We sample triplets from the original CIRR dataset and categorize them as numerical when the modified text includes a number word from "zero" to "ten" or the word "number". For each triplet below, the image on the left is the reference image. Given the reference image and the modified text, the goal is to retrieve the target image on the right. image info
  • Samples from Synthetic data: We further generate images based on the current CIRR dev set, using Visual ChatGPT for generation. image info
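The selection rule for the numerical category can be sketched as a simple keyword check on the modified text. This is an illustrative sketch, not the script used to build CIRR-D: the function name `is_numerical` is hypothetical, and case-insensitive whole-word matching is an assumption (it prevents "someone" from counting as "one").

```python
import re

# Number words "zero" to "ten", plus the word "number" itself.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten", "number"]

# Assumption: match whole words only, ignoring case.
_NUMERICAL_PATTERN = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def is_numerical(modified_text: str) -> bool:
    """Return True if the modified text mentions a number word or 'number'."""
    return _NUMERICAL_PATTERN.search(modified_text) is not None


print(is_numerical("Show two dogs instead of one"))        # True
print(is_numerical("Change the background to a beach"))    # False
```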
Attribute
  • Samples from original CIRR image info
  • Samples from Synthetic data image info
Object removal
  • Samples from original CIRR image info

  • Samples from extended captions of original CIRR image info

  • Samples from Synthetic data image info

Background
  • Samples from original CIRR image info

  • Samples from extended captions of original CIRR: We select the triplets from the extended captions where the background is the major change between the image pair. image info

Fine-grained

The gallery of the fine-grained category is composed of: image info

Testbed Requirements

conda env create -f hugface.yml