MindSpore 1.9.0
CUDA 10.1
conda (the environment can be created as follows)
conda env create -f environment.yml
cd your_path_to_project/scahn-ms
python train.py --config_path flickr25k.yaml
SCAHN supports the following loss functions and image encoders:
Loss function |
---|
paco loss |
triplet loss |
Image encoder |
---|
vision transformer |
faster-RCNN + transformers |
To change the loss function or the image encoder, set the variables loss_type and use_raw_img (True for ViT, False for faster-RCNN + transformers) in the config file.
If you use faster-RCNN + transformers as the image encoder, you must provide the image features and image boxes extracted by faster-RCNN yourself, because this code does not include a faster-RCNN implementation.
When you change use_raw_img, change img_seq and img_emb_dim at the same time.
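The rule that use_raw_img, img_seq and img_emb_dim belong together can be sketched as a small helper that only accepts all three values at once. This is an illustration, not part of the repository: the function name `switch_encoder`, the plain-dict representation of the config, and the numeric values in the usage example are assumptions, and the correct img_seq and img_emb_dim depend on your image encoder and extracted features.

```python
from typing import Any, Dict


def switch_encoder(cfg: Dict[str, Any], use_raw_img: bool,
                   img_seq: int, img_emb_dim: int) -> Dict[str, Any]:
    """Return a copy of cfg with the three encoder-related keys updated together.

    Hypothetical helper: requiring all three arguments makes it impossible
    to change use_raw_img while leaving img_seq or img_emb_dim stale.
    """
    if not isinstance(use_raw_img, bool):
        raise TypeError("use_raw_img must be True (ViT) or False (faster-RCNN + transformers)")
    for name, value in (("img_seq", img_seq), ("img_emb_dim", img_emb_dim)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    updated = dict(cfg)  # shallow copy; the original config is left untouched
    updated.update(use_raw_img=use_raw_img, img_seq=img_seq, img_emb_dim=img_emb_dim)
    return updated


if __name__ == "__main__":
    # Placeholder values for illustration only, not taken from flickr25k.yaml.
    base = {"use_raw_img": True, "img_seq": 197, "img_emb_dim": 768}
    rcnn = switch_encoder(base, use_raw_img=False, img_seq=36, img_emb_dim=2048)
    print(rcnn)
```

The helper returns a new dict instead of mutating its input, so the original settings stay available for comparison.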
hash_bit: 16 / 32 / 64 / 128
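As a rough sketch of what this setting controls, assume that hash_bit is the length of the binary hash code produced for each image or text, and that codes are compared by Hamming distance. Both points are assumptions based on common practice in hashing-based retrieval, not statements from this repository. The names `validate_hash_bit` and `hamming_distance` are hypothetical.

```python
from typing import Sequence

SUPPORTED_HASH_BITS = (16, 32, 64, 128)  # the values listed in this README


def validate_hash_bit(hash_bit: int) -> int:
    """Return hash_bit unchanged if it is one of the listed code lengths."""
    if hash_bit not in SUPPORTED_HASH_BITS:
        raise ValueError(f"hash_bit must be one of {SUPPORTED_HASH_BITS}, got {hash_bit!r}")
    return hash_bit


def hamming_distance(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Count the positions where two equal-length +1/-1 codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


if __name__ == "__main__":
    bits = validate_hash_bit(16)
    image_code = [1] * bits                      # toy code, not model output
    text_code = [1] * (bits - 3) + [-1] * 3      # differs in the last 3 bits
    print(hamming_distance(image_code, text_code))  # prints 3
```

Under this assumption, shorter codes use less storage and compare faster, while longer codes can separate items more finely.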
You can download the F25K dataset from Baidu Netdisk:
Download F25K Dataset
You can download the pretrained model from Baidu Netdisk:
Download Pretrained Model