Usage
This repo is based on https://github.com/li-xirong/w2vvpp and https://github.com/danieljf24/dual_encoding.
Requirements
- Ubuntu 16.04
- CUDA 10
- Python 2.7.12
- conda
- PyTorch 1.2.0
- TensorBoard 1.14.0
- NumPy 1.16.4
- Keras
- TensorFlow
Set up environment
Set the HOME environment variable to the desired working folder, then create the folder that stores the datasets (VisualSearch):
export HOME=/path/to/desired/folder
cd $HOME
mkdir VisualSearch
git clone https://github.com/0902338471/W2VV.git
conda create -n W2VV python=2.7
conda activate W2VV
pip install -r ~/W2VV/w2vvpp/requirements.txt
Extract features with ResNet152
1. Run the following commands, replacing ${your_data_name} with your own dataset name:
mkdir ~/${your_data_name}
mkdir ~/VisualSearch/${your_data_name}/
mkdir ~/VisualSearch/${your_data_name}/FeatureData/
mkdir ~/VisualSearch/${your_data_name}/TextData/
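If you prefer to script this step, the same folders can be created from Python. This is a minimal sketch, not part of the repo: `make_dataset_dirs` is a hypothetical helper, and `root` stands in for the $HOME folder chosen during setup. It avoids `os.makedirs(..., exist_ok=True)` so that it also runs under Python 2.7.

```python
import os


def make_dataset_dirs(root, data_name):
    """Create the folders made by the mkdir commands above (hypothetical helper).

    Returns the list of folder paths, all of which exist after the call.
    """
    base = os.path.join(root, "VisualSearch", data_name)
    paths = [
        os.path.join(root, data_name),
        os.path.join(base, "FeatureData"),
        os.path.join(base, "TextData"),
    ]
    for path in paths:
        # os.makedirs also creates missing parents (like mkdir -p);
        # the isdir check keeps repeated calls harmless on Python 2.7.
        if not os.path.isdir(path):
            os.makedirs(path)
    return paths
```

For example, `make_dataset_dirs(os.path.expanduser("~"), "my_dataset")` mirrors the commands with ${your_data_name} set to my_dataset, since `os.path.expanduser` resolves `~` through the HOME variable exported earlier.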
- Download your whole dataset into the folder
~/W2VV/DATASET/{train/val/test}/${your_data_name}
(store the images and the caption data in separate subfolders).
- Copy the image caption file, which has one [image-name] [text_caption] pair per line, to
~/VisualSearch/${your_data_name}/TextData/${your_data_name}.caption.txt
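The caption file can be produced from (image name, caption) pairs with a short script. This is a minimal sketch under two assumptions: the image names written here must match the image IDs stored with the extracted features, and each sentence ID is the image name plus a per-image `#k` counter starting at 1, following the image_id_1#1 / image_id_1#2 example of the caption format in this README. `write_caption_file` is a hypothetical helper.

```python
import io
from collections import defaultdict


def write_caption_file(pairs, out_path):
    """Write (image_name, caption) pairs as `<image_name>#<k> <caption>` lines.

    Returns the number of lines written.
    """
    counts = defaultdict(int)  # captions seen so far for each image
    with io.open(out_path, "w", encoding="utf-8") as f:
        for image_name, caption in pairs:
            counts[image_name] += 1
            # Collapse internal whitespace (including newlines) so each
            # caption stays on a single line.
            text = u" ".join(caption.split())
            f.write(u"%s#%d %s\n" % (image_name, counts[image_name], text))
    return sum(counts.values())
```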
- Run the following command, replacing ${image_folder} with your image dataset folder and ${output_features_name} with the name of the txt file that will store the extracted features:
python resnext_152_extract.py --data_path ${image_folder} --feature_path ${output_features_name}.txt
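Before converting, it can help to confirm that every line of the feature txt file has the same number of values. This sketch assumes each line holds an image ID followed by space-separated feature values, which is our reading of the input format expected by txt2bin.py in the upstream repositories; `inspect_feature_txt` is a hypothetical helper.

```python
def inspect_feature_txt(path):
    """Return (number_of_rows, feature_dim) for a `<id> <v1> ... <vD>` text file.

    Raises ValueError if rows disagree on dimension or contain non-numeric values.
    """
    n_rows, dim = 0, None
    with open(path) as f:
        for line_no, line in enumerate(f, 1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            values = [float(x) for x in parts[1:]]  # parts[0] is the image ID
            if dim is None:
                dim = len(values)
            elif len(values) != dim:
                raise ValueError("line %d has %d values, expected %d"
                                 % (line_no, len(values), dim))
            n_rows += 1
    return n_rows, dim
```

The returned dimension should agree with the 1000 passed to txt2bin.py in the conversion step, assuming that argument is the feature dimension.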
Convert the feature txt file to binary format
Run the following command, replacing ${output_features_name} and ${your_data_name} with the values used above:
python txt2bin.py 1000 ~/W2VV/${output_features_name}.txt 0 ~/VisualSearch/${your_data_name}/FeatureData/mean_resnext101_resnet152
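To show what the conversion produces, here is a pure-Python sketch of the txt-to-bin step. It is not the repo's txt2bin.py: it assumes the BigFile layout used by the upstream w2vvpp and dual_encoding code, namely float32 rows stored back to back in feature.bin, "<rows> <dim>" in shape.txt, and the space-separated IDs in row order in id.txt. It omits whatever extra options the real script supports (such as the `0` argument in the command above); `txt_to_bin` is a hypothetical name.

```python
import array
import os


def txt_to_bin(txt_path, dim, result_dir):
    """Convert `<id> <v1> ... <vD>` lines into feature.bin, shape.txt and id.txt.

    Returns the number of rows written. Values are stored as 4-byte floats
    in native byte order, one row after another, with no header.
    """
    if not os.path.isdir(result_dir):
        os.makedirs(result_dir)
    ids = []
    bin_path = os.path.join(result_dir, "feature.bin")
    with open(txt_path) as fin, open(bin_path, "wb") as fbin:
        for line in fin:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            vec = array.array("f", [float(x) for x in parts[1:]])
            if len(vec) != dim:
                raise ValueError("%s: expected %d values, got %d"
                                 % (parts[0], dim, len(vec)))
            vec.tofile(fbin)
            ids.append(parts[0])
    with open(os.path.join(result_dir, "shape.txt"), "w") as f:
        f.write("%d %d\n" % (len(ids), dim))
    with open(os.path.join(result_dir, "id.txt"), "w") as f:
        f.write(" ".join(ids))
    return len(ids)
```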
After the previous steps, your dataset folder should have the following structure:
${your_data_name}
├── FeatureData
│   └── mean_resnext101_resnet152
│       ├── feature.bin
│       ├── shape.txt
│       └── id.txt
└── TextData
    └── ${your_data_name}.caption.txt
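Reading the feature folder back is a quick way to verify the conversion. This is a minimal sketch, assuming feature.bin holds rows * dim float32 values in native byte order, shape.txt holds the row count and the dimension, and id.txt holds the IDs in row order; `load_features` is a hypothetical helper.

```python
import array
import os


def load_features(feature_dir):
    """Load a FeatureData/<feature_name> folder into a dict {id: list of floats}."""
    with open(os.path.join(feature_dir, "shape.txt")) as f:
        n_rows, dim = [int(x) for x in f.read().split()]
    with open(os.path.join(feature_dir, "id.txt")) as f:
        ids = f.read().split()
    if len(ids) != n_rows:
        raise ValueError("id.txt has %d IDs but shape.txt says %d rows"
                         % (len(ids), n_rows))
    values = array.array("f")
    with open(os.path.join(feature_dir, "feature.bin"), "rb") as f:
        # Raises EOFError if feature.bin holds fewer than n_rows * dim floats.
        values.fromfile(f, n_rows * dim)
    return dict((ids[i], values[i * dim:(i + 1) * dim].tolist())
                for i in range(n_rows))
```

For large datasets, `numpy.fromfile(path, dtype=numpy.float32).reshape(n_rows, dim)` reads the same data far more efficiently than building Python lists.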
FeatureData: extracted image features.
feature.bin: extracted features in binary format.
${your_data_name}.caption.txt: caption data. Each line holds a sentence ID (the image ID followed by #k) and one sentence; the image and the sentence on the same line are relevant to each other:
image_id_1#1 sentence_1
image_id_1#2 sentence_2
...
image_id_n#1 sentence_k
...
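To check a caption file before training, the sentences can be grouped by image. This is a minimal sketch assuming the `<image_id>#<k> <sentence>` layout shown above; `read_captions` is a hypothetical helper that strips the last `#k` suffix to recover the image ID.

```python
import io
from collections import OrderedDict


def read_captions(path):
    """Group caption lines of the form `<image_id>#<k> <sentence>` by image ID.

    Returns an OrderedDict {image_id: [sentence, ...]} in file order.
    """
    captions = OrderedDict()
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the first run of whitespace (space or tab).
            sent_id, sentence = line.split(None, 1)
            image_id = sent_id.rsplit("#", 1)[0]  # drop the trailing #k
            captions.setdefault(image_id, []).append(sentence)
    return captions
```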
Training
Build the vocabulary from the caption file
Run the following commands:
cd ~/W2VV/w2vvpp
./do_build_vocab.sh ${your_data_name}
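The internals of do_build_vocab.sh are not described here. As a rough illustration of what vocabulary building usually involves, this hypothetical sketch lowercases each caption, splits it into alphanumeric tokens, and keeps the words seen at least `min_count` times. The tokenizer and the default threshold of 5 are assumptions, not the script's actual settings.

```python
import re
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Return words appearing at least `min_count` times, most frequent first."""
    counts = Counter()
    for sentence in sentences:
        # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
        counts.update(re.findall(r"[a-z0-9]+", sentence.lower()))
    kept = [(w, c) for w, c in counts.items() if c >= min_count]
    # Sort by descending count, breaking ties alphabetically for a stable order.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in kept]
```

A frequency threshold keeps rare words (often typos) out of the vocabulary; they are typically mapped to an unknown-word token during training.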