TensorFlow code for the VAL model

Chen et al. Image Search with Text Feedback by Visiolinguistic Attention Learning. CVPR 2020.

Getting Started



(1) Download the ImageNet-pretrained models (mobilenet and resnet) and put them under the directory pretrain_model.

(2) Follow the steps in scripts/prepare_data.sh to prepare the datasets. Note: fashion200k and shoes can be downloaded manually. The relevant .py files for data preparation are detailed below.

  • download_fashion_iq.py: crawl the image data from Amazon websites. Note that some URLs might be broken.
  • generate_groundtruth.py: generate the .npy files that characterize the ground-truth annotations used at test time.
  • read_glove.py: prepare the pre-trained GloVe word embeddings used to initialize the text model (i.e. LSTM).
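
As a rough illustration of the GloVe preparation step, the sketch below builds an embedding matrix for a given vocabulary from a GloVe text file (one word per line, followed by its vector components). It is not the code in read_glove.py: the function name load_glove, its parameters, and the random initialization of words missing from GloVe are all assumptions made for this example.

```python
import numpy as np

def load_glove(path, vocab, dim=300, seed=0):
    """Hypothetical helper: build an embedding matrix for `vocab` from a GloVe text file."""
    rng = np.random.default_rng(seed)
    # Assumption: words not found in GloVe keep a small random vector.
    matrix = rng.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    index = {word: i for i, word in enumerate(vocab)}
    found = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            # Copy the pre-trained vector only for words in our vocabulary.
            if word in index and len(values) == dim:
                matrix[index[word]] = np.array([float(v) for v in values], dtype=np.float32)
                found += 1
    return matrix, found
```

The returned matrix could then serve as the initial value of the word-embedding variable that feeds the LSTM, with row i corresponding to vocab[i].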

Running Experiments

Training & Testing:

Each of the following scripts trains and then tests the VAL model on one dataset.

bash scripts/run_fashion200k.sh
bash scripts/run_fashion_iq.sh
bash scripts/run_shoes.sh

The final test results are reported in results/results_fashion_iq.log.

Our implementation includes the following .py files. Note that fashion200k is formatted differently from fashion_iq and shoes: a triplet of source image, text and target image is not given in advance, but is instead sampled randomly during training. Therefore, there are two implementations that build and run the training graph.

  • train_val.py: build and run the training graph on the fashion_iq or shoes dataset.
  • train_val_fashion200k.py: build and run the training graph on the fashion200k dataset.
  • model.py: define the model and losses.
  • config.py: define image preprocessing and other configurations.
  • extract_features_val.py: extract features from the model.
  • test_val.py: compute distance, perform retrieval, and report results in the log file.
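
To make the test step concrete, here is a minimal sketch of feature-based retrieval scored with recall@K. It is not the code in test_val.py: the function name recall_at_k, the use of cosine similarity as the ranking score, and the assumption of a single ground-truth target per query are all choices made for this illustration.

```python
import numpy as np

def recall_at_k(query_feats, gallery_feats, target_idx, ks=(1, 10, 50)):
    """Hypothetical helper: recall@K with one ground-truth gallery item per query."""
    # L2-normalize so that the dot product equals cosine similarity (an assumption).
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T
    # Rank gallery items from most to least similar for each query.
    ranking = np.argsort(-sims, axis=1)
    target_idx = np.asarray(target_idx)
    # Position of the ground-truth item in each query's ranking (0 = top).
    ranks = np.argmax(ranking == target_idx[:, None], axis=1)
    # A query counts as a hit at K if its target appears in the top K.
    return {k: float(np.mean(ranks < k)) for k in ks}
```

Here query_feats stands for the composed features of (source image, text) pairs and gallery_feats for the features of the candidate images, both as 2-D arrays with one row per item.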


