conda env create
source activate open_clip
# Additional packages
pip install opencv-python
cd open_clip
export PYTHONPATH="$PYTHONPATH:$PWD/src"
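Before moving on, you can check that the export above actually put the src directory on PYTHONPATH. This is a minimal sketch; the helper name pythonpath_has_dir is hypothetical and not part of open_clip.

```shell
# Hypothetical helper: succeeds if DIR is one of the ':'-separated
# entries in PYTHONPATH (an empty leading entry is tolerated).
pythonpath_has_dir() {
  case ":${PYTHONPATH:-}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run from inside the open_clip directory, after the export.
if pythonpath_has_dir "$PWD/src"; then
  echo "PYTHONPATH includes $PWD/src"
else
  echo "PYTHONPATH does not include $PWD/src; re-run the export" >&2
fi
```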
- Download the JSON files from the link provided in the ALBEF repo: https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/json_pretrain.zip
- Go to line 29 of src/scripts/json_preprocess.py and provide the paths to the JSON files downloaded in step 1.
- Run the following command. It creates a CSV file named json_data.csv and puts it in the folder named csvs.
python src/scripts/json_preprocess.py
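For the download step in the list above, the following sketch fetches json_pretrain.zip from the URL given there, unpacks it, and prints the absolute paths of the extracted .json files, which are the paths json_preprocess.py asks for. It assumes curl and unzip are installed; the function name fetch_albef_json and the destination directory are hypothetical choices, and no particular file names inside the archive are assumed.

```shell
# Hypothetical helper: download json_pretrain.zip into DEST_DIR, unzip it,
# and list the extracted .json files with absolute paths.
# Assumes curl and unzip are available on the machine.
fetch_albef_json() {
  if [ "$#" -ne 1 ]; then
    echo "usage: fetch_albef_json DEST_DIR" >&2
    return 2
  fi
  local dest="$1"
  local url="https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/json_pretrain.zip"
  mkdir -p "$dest" || return 1
  curl -L --fail -o "$dest/json_pretrain.zip" "$url" || return 1
  unzip -o "$dest/json_pretrain.zip" -d "$dest" || return 1
  # Absolute paths are what you paste into json_preprocess.py.
  find "$(cd "$dest" && pwd)" -type f -name '*.json' -print
}
```

For example: fetch_albef_json data/albef_json (the directory name is arbitrary).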
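Before editing line 29 of src/scripts/json_preprocess.py, it can help to print the surrounding lines to see which paths the script currently expects. A small sketch using awk; show_lines_around is a hypothetical helper, and the default context of 3 lines is an arbitrary choice.

```shell
# Hypothetical helper: print lines N-CTX..N+CTX of FILE, prefixed with
# their line numbers. CTX defaults to 3. Uses only awk.
show_lines_around() {
  local file="$1" n="$2" ctx="${3:-3}"
  local start=$(( n > ctx ? n - ctx : 1 ))
  local end=$(( n + ctx ))
  awk -v s="$start" -v e="$end" 'NR >= s && NR <= e { printf "%d: %s\n", NR, $0 }' "$file"
}
```

Run it from the open_clip directory: show_lines_around src/scripts/json_preprocess.py 29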
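After running json_preprocess.py, a quick check that csvs/json_data.csv was actually written and is not empty. The sketch makes no assumption about the CSV's columns or whether it has a header; check_csv is a hypothetical helper name.

```shell
# Hypothetical helper: fail if FILE is missing or empty; otherwise show its
# first 3 lines and its total line count (header included, if any).
check_csv() {
  local f="$1"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  head -n 3 "$f"
  echo "$(wc -l < "$f") lines in $f"
}
```

Usage: check_csv csvs/json_data.csv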
The following commands create the CSV files csvs/train_coco.csv and csvs/val_coco.csv, which we will use for training and validation, respectively.
python src/scripts/coco_preprocess.py --split train --data-root /path/to/coco/dataset/
python src/scripts/coco_preprocess.py --split val --data-root /path/to/coco/dataset/
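The two coco_preprocess.py invocations differ only in --split, so they can be wrapped in a loop that stops at the first failure and then confirms that both expected CSVs exist. This is a sketch: preprocess_coco is a hypothetical helper name, it must be run from the open_clip directory, and it assumes the script writes its output to csvs/ as described above.

```shell
# Hypothetical helper: run coco_preprocess.py for the train and val splits
# against one COCO root, then check that both output CSVs were written.
preprocess_coco() {
  local root="$1"
  if [ ! -d "$root" ]; then
    echo "COCO root not found: $root" >&2
    return 1
  fi
  local split
  for split in train val; do
    python src/scripts/coco_preprocess.py --split "$split" --data-root "$root" || return 1
  done
  # The loop above is expected to produce these two files.
  local f
  for f in csvs/train_coco.csv csvs/val_coco.csv; do
    [ -s "$f" ] || { echo "expected output not found: $f" >&2; return 1; }
  done
}
```

Usage: preprocess_coco /path/to/coco/dataset/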
python -u src/training/main.py \
--train-data="/path/to/json_data.csv" \
--val-data="/path/to/val_coco.csv" \
--warmup 10000 \
--batch-size=128 \
--lr=5e-4 \
--wd=0.1 \
--epochs=30 \
--workers=4 \
--model RN50
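To judge how long the 10000-step warmup lasts relative to an epoch, you can estimate the number of optimizer steps per epoch. This sketch assumes --warmup counts optimizer steps and --batch-size is per GPU, as in upstream open_clip's argument help (check this repo's parameter definitions if they differ), and that the last incomplete batch is dropped, hence the floor division. warmup_epochs is a hypothetical helper, and you supply the number of training samples yourself.

```shell
# Hypothetical helper: warmup_epochs NUM_SAMPLES BATCH_PER_GPU NUM_GPUS WARMUP_STEPS
# Prints estimated steps per epoch and how many epochs the warmup spans.
warmup_epochs() {
  local num_samples="$1" batch="$2" gpus="$3" warmup="$4"
  # Integer division: the last incomplete global batch is assumed dropped.
  local steps=$(( num_samples / (batch * gpus) ))
  if [ "$steps" -eq 0 ]; then
    echo "dataset is smaller than one global batch" >&2
    return 1
  fi
  awk -v s="$steps" -v w="$warmup" 'BEGIN { printf "%d steps/epoch, warmup = %.2f epochs\n", s, w / s }'
}
```

For example, with 1,000,000 training rows on one GPU, warmup_epochs 1000000 128 1 10000 prints "7812 steps/epoch, warmup = 1.28 epochs". If json_data.csv has a header line, subtract it from the row count you pass in.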
Logging is done with Weights & Biases (wandb).
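Since runs are reported to wandb, it is worth confirming that credentials are available before starting a long job. This sketch treats either a WANDB_API_KEY environment variable or an api.wandb.ai entry in ~/.netrc as sufficient; ~/.netrc is where wandb login has typically stored the key, but that location is an assumption about your wandb version. wandb_credentials_present is a hypothetical helper.

```shell
# Hypothetical helper: succeed if wandb credentials appear to be available,
# via WANDB_API_KEY or an api.wandb.ai entry in ~/.netrc (assumed location).
wandb_credentials_present() {
  if [ -n "${WANDB_API_KEY:-}" ]; then
    return 0
  fi
  [ -f "$HOME/.netrc" ] && grep -q 'api.wandb.ai' "$HOME/.netrc"
}
```

If it fails, run wandb login once, or set WANDB_MODE=offline to keep logs on disk and upload them later with wandb sync.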