This is the codebase for GPV 2 from our paper Webly Supervised Concept Expansion for General Purpose Vision Models. Code for the web 10k dataset is in a separate repo.
Clone the repo with --recurse-submodules
git clone git@github.com:allenai/gpv2.git --recurse-submodules
Create conda environment
conda create -n gpv2 python=3.6 -y
conda activate gpv2
Install PyTorch. I have been using PyTorch 1.8.1; other versions might work but are not tested. For example, on Linux:
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=11.1 -c pytorch -c conda-forge
but you might need to change that command depending on your operating system/gpu setup.
Finally, install libraries:
conda install -c cyclus java-jdk=8.45.14 -y
pip3 install -r requirements.txt
Download data for COCO, DCE, and Web as well as the pre-computed VinVL features for these datasets (note you cannot use the VinVL features provided by the VinVL authors since we need features for the cropped images and for the all-image boxes):
python gpv2/download_data.py
The data is saved in the locations found in file_paths.py. By default, source data is saved into ~/data/gpv, while the features are stored in ./data-cache/precomputed-features/vinvl. The script's command-line args can be used to download particular subsets if you don't need everything.
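As a quick sanity check after the download, the default locations named above can be inspected from Python. This is a minimal sketch, assuming the defaults in file_paths.py have not been changed; the helper name `missing_dirs` is hypothetical and not part of the repo.

```python
from pathlib import Path

# Default locations as described above; adjust if you edited file_paths.py.
SOURCE_DIR = Path("~/data/gpv").expanduser()
VINVL_DIR = Path("data-cache/precomputed-features/vinvl")


def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]


# Report anything that is not where the defaults say it should be.
for path in missing_dirs([SOURCE_DIR, VINVL_DIR]):
    print(f"missing: {path}")
```

If nothing is printed, both directories exist; this does not verify that every file inside them was downloaded.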
We have currently released three GPV 2 models:
- With web: s3://ai2-prior-gpv/public/gpv2-models/gpv2
- Without web: s3://ai2-prior-gpv/public/gpv2-models/gpv2-noweb
- CC pre-training only (not fine-tuning): s3://ai2-prior-gpv/public/gpv2-models/cc-pretrained/
To download, use aws s3 cp with --recursive:
mkdir -p models
aws s3 cp --recursive s3://ai2-prior-gpv/public/gpv2-models/gpv2 models/gpv2
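To fetch more than one model, the same command can be generated for each S3 location listed above. The following is only a sketch: the local directory names under models/ and the helper name `download_command` are assumptions, and the script prints the commands rather than running them.

```python
# S3 locations of the released models, as listed above.
MODELS = {
    "gpv2": "s3://ai2-prior-gpv/public/gpv2-models/gpv2",
    "gpv2-noweb": "s3://ai2-prior-gpv/public/gpv2-models/gpv2-noweb",
    "cc-pretrained": "s3://ai2-prior-gpv/public/gpv2-models/cc-pretrained/",
}


def download_command(name, target_root="models"):
    """Build the `aws s3 cp --recursive` argument list for one model."""
    return ["aws", "s3", "cp", "--recursive", MODELS[name], f"{target_root}/{name}"]


# Print one command per model; paste them into a shell to run them.
for name in MODELS:
    print(" ".join(download_command(name)))
```

Each argument list can also be passed to `subprocess.run(..., check=True)` if the AWS CLI is installed and on your PATH.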
The repo is currently set up to train the basic model on COCO data. Training with web data will be added once we complete the release process.
To train on devices 0 and 1 of your machine without web data:
python gpv2/experiments/train_gpv2.py --device 0 1 --task all --output_dir /path/to/output/dir
For debugging purposes, I recommend using the --debug flag together with a single device and 0 workers, which gives much faster startup times and better error messages:
python gpv2/experiments/train_gpv2.py --device 0 --task all --output_dir /path/to/output/dir --debug small
which will run the model on a small sample of the data and without complicated distributed training.
To run from our CC pre-trained checkpoint, download the cc-pretrained model and use the --init_from flag:
python gpv2/experiments/train_gpv2.py --device 0 1 --task all --output_dir /path/to/output/dir --init_from models/cc-pretrained/r0/state-ep8.pth
Run on a single image using run_on_image_id
python gpv2/eval/run_on_image_id.py models/gpv2 dce/test/nocaps/0003d84e0165d630.jpg "What is this?"
Here "What is this?" is the prompt and dce/test/nocaps/0003d84e0165d630.jpg is an image_id, not a filepath, that can be used to look up the needed VinVL features in the HDF5 feature files. Look at GpvDataset or DceDataset to see the format of the image_ids.
To compute predictions for a dataset, use:
python gpv2/eval/compute_topn_predictions.py models/gpv2 --datasets dce-vqa --part val --eval --output_name default
The predictions for VQA will be saved to models/gpv2/r0/eval/{dataset-name}--default, and an evaluation file with the results will be saved there as eval.json.
See the command line flags on compute_topn_predictions to run on other datasets, or use multiple GPUs.
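Once a run finishes, the evaluation file can be read back with the standard library. This is a minimal sketch, assuming results land in models/gpv2/r0/eval/{dataset-name}--default/eval.json and that eval.json is ordinary JSON. Its keys are not documented here, so the sketch simply returns whatever it finds. The helper name `load_eval` is hypothetical, and "dce-vqa" is only a guess at the directory prefix.

```python
import json
from pathlib import Path


def load_eval(model_dir, dataset_name, output_name="default", run="r0"):
    """Load eval.json for one dataset/output-name pair, or None if absent."""
    eval_dir = Path(model_dir) / run / "eval" / f"{dataset_name}--{output_name}"
    eval_file = eval_dir / "eval.json"
    if not eval_file.is_file():
        return None
    with open(eval_file) as f:
        return json.load(f)


# "dce-vqa" is an assumed prefix; check the eval folder for the real name.
results = load_eval("models/gpv2", "dce-vqa")
print(results if results is not None else "no eval.json found")
```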
The script gpv2/eval/build_sumbmission_files.py will construct the submission files needed to evaluate on the VQA test, COCO test, and nocaps val/test servers, assuming the needed predictions for those models have already been saved using compute_topn_predictions.py.
GPV-2 uses VinVL pre-computed image features. If you want to run the model on a new dataset, you will need to pre-compute the image features for that dataset. We provide our script for doing this and saving the results in an HDF5 file we can use; the results are compatible with the ones produced here. There are three steps to doing this:
- Gather your images into one directory. It may include subdirectories, but it should not contain any files other than images.
- Run:
python gpv2/build_image_features/precompute_image_features.py /path/to/image_directory your_dataset_name --output features.hdf5
where /path/to/image_directory should point to your image directory and your_dataset_name should be a name for the set of images you are adding. The script has parameters to control the batch size and to run across multiple devices, which can be used to tune the process. This will produce the hdf5 file features.hdf5.
- Move the hdf5 file to file_paths.PRECOMPUTED_FEATURES_DIR under the vinvl directory, for example:
mkdir -p data-cache/precomputed-features/vinvl
mv features.hdf5 data-cache/precomputed-features/vinvl/your_dataset_name.hdf5
Now the model will support image_ids with the format your_dataset_name/path/to/image_file/in/your/directory. For example, if your directory contained the image val/dog/001.jpg and your dataset_name was "pets", the image_id "pets/val/dog/001.jpg" will now be recognized by the model, which will load the pre-computed features for that image. Image ids of that format can now be passed to run_on_image_id.py or used in GPVExample objects with VinVL models.
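Before running the feature script, it can help to list the image_ids a directory will produce and to spot any stray non-image files. The sketch below assumes the image_id is the dataset name followed by the file's path relative to the image directory, as described above. The extension list and the helper name `scan_image_directory` are assumptions and are not taken from the repo's script.

```python
import os

# Extensions treated as images here; this set is an assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def scan_image_directory(image_dir, dataset_name):
    """Return (image_ids, non_image_files) for every file under image_dir."""
    image_ids, others = [], []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            full_path = os.path.join(root, name)
            # Path relative to the image directory, with forward slashes.
            rel = os.path.relpath(full_path, image_dir).replace(os.sep, "/")
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                image_ids.append(f"{dataset_name}/{rel}")
            else:
                others.append(rel)
    return sorted(image_ids), sorted(others)
```

For a directory holding train/cat/002.jpg and the dataset name "pets", the first list would contain pets/train/cat/002.jpg. Anything in the second list should be moved out of the directory before features are computed.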
Features for the web/coco/dce datasets can be re-computed using gpv2/build_image_features/precompute_dataset_features.py, but by default download_data will download them automatically.