This repo is the official implementation of the paper "APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension".

Project structure

The directory structure of the project looks like this:

├── README.md            <- The top-level README for developers using this project.
├── config               <- Configuration files
├── data
│   ├── anns             <- Note: cat_name.json is used for the prompt templates
├── datasets             <- Dataloader files
├── models               <- Source code for use in this project.
│   │
│   ├── language_encoder.py   <- Encoder for the images' text descriptions
│   ├── network_blocks.py     <- Essential model building blocks
│   ├── tag_encoder.py        <- Encoder for extracting prompt embeddings
│   ├── visual_encoder.py     <- Visual backbone; also includes the prompt template encoder
│   │
│   ├── APL                   <- Most important files for the APL model implementation
│   │   ├── __init__.py
│   │   ├── head.py           <- Anchor-prompt contrastive loss
│   │   ├── net.py            <- Main code for the APL model
│   │   ├── sup_head.py       <- Visual alignment loss
│   │
├── utils                <- Helper functions
├── requirements.txt     <- The requirements file for reproducing the analysis environment
├── train.py             <- Script for training the model
├── test.py              <- Script for testing a trained model
└── LICENSE              <- Open-source license if one is chosen


Instructions on how to clone and set up the repository:

  • Clone the repository and navigate to the project directory:
git clone https://github.com/Yaxin9Luo/APL.git
cd APL

Create a conda virtual environment and activate it:

conda create -n apl python=3.7 -y
conda activate apl

Install the required dependencies:

(We ran all our experiments with PyTorch 1.11.0 and CUDA 11.3.)
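
To compare an existing environment against the versions mentioned here, a small check script can help. The following is a minimal sketch and not part of the repository; the helper names `matches` and `check_environment` are hypothetical.

```python
import importlib.util

# Versions reported in this README; the check itself is an illustrative assumption.
EXPECTED_TORCH = "1.11.0"
EXPECTED_CUDA = "11.3"


def matches(installed, expected):
    """Return True if `installed` equals `expected` or is a patch release of it.

    Local build suffixes such as "+cu113" are ignored.
    """
    if installed is None:
        return False
    base = installed.split("+", 1)[0]
    return base == expected or base.startswith(expected + ".")


def check_environment():
    """Print the installed PyTorch/CUDA versions next to the expected ones."""
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed")
        return False
    import torch

    torch_ok = matches(torch.__version__, EXPECTED_TORCH)
    # torch.version.cuda is None for CPU-only builds.
    cuda_ok = matches(torch.version.cuda, EXPECTED_CUDA)
    print(f"torch {torch.__version__} (expected {EXPECTED_TORCH}): {torch_ok}")
    print(f"CUDA  {torch.version.cuda} (expected {EXPECTED_CUDA}): {cuda_ok}")
    print(f"GPU available: {torch.cuda.is_available()}")
    return torch_ok and cuda_ok


if __name__ == "__main__":
    check_environment()
```

Other version combinations may also work; the script only reports whether the environment matches the one used for the experiments.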

Install apex by following its official repository, or use the following commands, which we copied from that repository:

git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key... 
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Compile the DCN layer:

cd utils/DCN

Install the remaining dependencies:

pip install -r requirements.txt
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz

Data Preparation

  • Download the images and generate the annotations according to SimREC.

(We also provide the annotations in the data/anns folder to save you time.)

  • Download the pretrained weights of YoloV3 from Google Drive.

(We recommend placing them in the root directory of APL; otherwise, please modify the path in the config files.)

  • The data directory should look like this:
├── data
│   ├── anns            
│       ├── refcoco.json            
│       ├── refcoco+.json              
│       ├── refcocog.json                 
│       ├── refclef.json
│       ├── cat_name.json       
│   ├── images 
│       ├── train2014
│           ├── COCO_train2014_000000515716.jpg              
│           ├── ...
│       ├── refclef
│           ├── 99.jpg              
│           ├── ...

... the remaining directories    
  • NOTE: our YoloV3 is trained on COCO’s training images, excluding those in the validation and test splits of RefCOCO, RefCOCO+, and RefCOCOg.
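
Before training, it can be useful to confirm that the data directory matches the layout listed above. The following is a minimal sketch and not part of the repository; `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the list only covers the entries shown in the listing.

```python
from pathlib import Path

# Paths taken from the directory listing above, relative to the data root.
EXPECTED_PATHS = [
    "anns/refcoco.json",
    "anns/refcoco+.json",
    "anns/refcocog.json",
    "anns/refclef.json",
    "anns/cat_name.json",
    "images/train2014",
    "images/refclef",
]


def missing_paths(data_root):
    """Return the expected files/directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("data")
    if missing:
        print("Missing entries under data/:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("data/ matches the expected layout")
```

Run it from the root directory of APL so that the relative path `data` resolves correctly.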


Training

python train.py --config ./config/[DATASET_NAME].yaml


Evaluation

python test.py --config ./config/[DATASET_NAME].yaml --eval-weights [PATH_TO_CHECKPOINT_FILE]
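
To run several datasets in sequence, the two commands above can be generated from a small driver script. The following is a minimal sketch under stated assumptions: the dataset names are guessed from the annotation file names in data/anns, the config directory defaults to `./config` as in the project structure listing, and the helper names are hypothetical. Check the actual config file names before using it.

```python
import shlex

# Hypothetical dataset names, assumed to match the annotation files in data/anns;
# check the config directory for the actual file names.
DATASETS = ["refcoco", "refcoco+", "refcocog", "refclef"]


def train_command(dataset, config_dir="./config"):
    """Build the argument list for the training command."""
    return ["python", "train.py", "--config", f"{config_dir}/{dataset}.yaml"]


def eval_command(dataset, checkpoint, config_dir="./config"):
    """Build the argument list for the evaluation command."""
    return [
        "python", "test.py",
        "--config", f"{config_dir}/{dataset}.yaml",
        "--eval-weights", checkpoint,
    ]


if __name__ == "__main__":
    for name in DATASETS:
        cmd = train_command(name)
        # Only print the command; pass the list to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Building the commands as argument lists avoids shell quoting problems with names such as `refcoco+`.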

Model Zoo

Weakly REC

Method        RefCOCO                 RefCOCO+                RefCOCOg
              val     testA   testB   val     testA   testB   val-g
APL           64.51   61.91   63.57   42.70   42.84   39.80   50.22
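
The table does not state the metric. REC benchmarks such as RefCOCO commonly count a prediction as correct when its intersection over union (IoU) with the ground-truth box is at least 0.5. Assuming that convention, and assuming boxes in (x1, y1, x2, y2) format, the accuracy could be computed roughly as in the sketch below. The function names are hypothetical and this is not the repository's evaluation code.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Percentage of predictions whose IoU with the ground truth reaches the threshold."""
    if not predictions:
        return 0.0
    hits = sum(
        box_iou(p, g) >= threshold for p, g in zip(predictions, ground_truths)
    )
    return 100.0 * hits / len(predictions)
```

For example, with one exact match and one non-overlapping prediction, `accuracy_at_iou` returns 50.0.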

Weakly RES

Method        RefCOCO                 RefCOCO+                RefCOCOg
              val     testA   testB   val     testA   testB   val-g
APL           55.92   54.84   55.64   34.92   34.87   35.61   40.13

Pseudo Labels for Training Other Models (Weakly Supervised Training Schema)

Method        RefCOCO                 RefCOCO+                RefCOCOg
              val     testA   testB   val     testA   testB   val-g
APL_SimREC    63.94   64.72   61.21   42.11   44.85   38.31   48.35
APL_TransVG   64.86   64.89   63.87   39.28   41.08   36.45   46.11

Visualization of Prediction Results (the blue box is the ground truth)

Image Description : "No cut piece but 7am of cut piece"


Image Description : "Green apple on the left"


Image Description : "Purple book"


Image Description : "Yellow round fruit with blemish"


Image Description : "From bottom right second up"
