[AAAI 2024] VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting

Primary language: Python. License: MIT.

Official implementation of the AAAI 2024 paper "VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting".

Update

🔥🔥🔥 [Dec 9] Our paper has been accepted to AAAI 2024.

🔥🔥🔥 [Dec 28] Code and pretrained model have been released.

Preparation

1. Download datasets

Download the datasets used in this project and arrange them as shown in the directory tree below.

CARPK and PUCPR+ are loaded by importing the hub package.

/
├─VLCounter/
│
├─FSC147/    
│  ├─gt/
│  ├─image/
│  ├─ImageClasses_FSC147.txt
│  ├─Train_Test_Val_FSC_147.json
│  ├─annotation_FSC147_384.json
│  
├─IOCfish5k/
│  ├─annotations/
│  ├─images/
│  ├─test_id.txt
│  ├─train_id.txt
│  ├─val_id.txt
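
Before training, it can help to confirm that the datasets are laid out as in the tree above. The following is a minimal sketch, not part of the repository: the function name `find_missing_paths` is hypothetical, and it assumes the `.txt` and `.json` entries are regular files located at the listed relative paths.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_PATHS = [
    "FSC147/gt",
    "FSC147/image",
    "FSC147/ImageClasses_FSC147.txt",
    "FSC147/Train_Test_Val_FSC_147.json",
    "FSC147/annotation_FSC147_384.json",
    "IOCfish5k/annotations",
    "IOCfish5k/images",
    "IOCfish5k/test_id.txt",
    "IOCfish5k/train_id.txt",
    "IOCfish5k/val_id.txt",
]


def find_missing_paths(root):
    """Return the expected dataset paths that do not exist under `root`.

    `root` is the directory that contains VLCounter/, FSC147/ and IOCfish5k/.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

For example, when run from inside `VLCounter/`, `find_missing_paths("..")` returns an empty list if every expected entry is present, and otherwise lists the entries that still need to be downloaded or moved.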

2. Install required Python packages

The following package versions are suitable for the NVIDIA RTX A6000.

pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install hub

If you want to use the Docker environment, pull the Docker image with the command below.

docker pull sgkang0305/vlcounter

3. Download the CLIP weight and the byte pair encoding (BPE) file

Download the CLIP pretrained weight and place the file in the "pretrain" folder.

Download the BPE file and place the file in the "tools/dataset" folder.

Run the Code

Train

You can train the model using the following command. Make sure to check the options in the train.sh file.

bash scripts/train.sh FSC {gpu_id} {exp_number}

Evaluation

You can evaluate a trained checkpoint with the following command. Make sure to check the options in the test.sh file, especially '--ckpt_used', which specifies the weight file to load.

bash scripts/test.sh FSC {gpu_id} {exp_number}
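
The train and test commands above take the same three positional arguments: the dataset name, the GPU id, and the experiment number. As a rough sketch, assuming you want to launch them from Python (for instance to loop over several experiment numbers), the helper below assembles and runs the same command line. The names `build_command` and `run_script` are hypothetical and not part of the repository, and the default values are placeholders.

```python
import subprocess


def build_command(mode, dataset, gpu_id, exp_number):
    """Build the argument list for scripts/train.sh or scripts/test.sh."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    return ["bash", f"scripts/{mode}.sh", dataset, str(gpu_id), str(exp_number)]


def run_script(mode, dataset="FSC", gpu_id=0, exp_number=1):
    """Run the script from the repository root; raise if it exits with an error."""
    cmd = build_command(mode, dataset, gpu_id, exp_number)
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `run_script("train")` followed by `run_script("test")` from the repository root would then be equivalent to typing the two commands by hand with `{gpu_id}` set to 0 and `{exp_number}` set to 1. The options inside train.sh and test.sh still need to be checked separately.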

We provide a pretrained checkpoint of our full model, which achieves quantitative results similar to those reported in the paper.

Dataset     MAE      RMSE
FSC val     18.06    65.13
FSC test    17.05    106.16
CARPK       6.46     8.68
PUCPR+      48.94    69.08
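
For readers unfamiliar with the two metrics, the sketch below shows the standard definitions of MAE and RMSE over per-image counts. It is an illustration only and is not taken from the repository's evaluation code; the function names and the example counts are made up.

```python
import math


def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    errors = [abs(p - g) for p, g in zip(pred_counts, gt_counts)]
    return sum(errors) / len(errors)


def rmse(pred_counts, gt_counts):
    """Root mean squared error between predicted and ground-truth counts."""
    squared = [(p - g) ** 2 for p, g in zip(pred_counts, gt_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Two hypothetical images: one off by 1 object, one off by 7 objects.
pred = [11, 13]
gt = [10, 20]
print(mae(pred, gt))   # 4.0
print(rmse(pred, gt))  # 5.0
```

Because RMSE squares each error before averaging, a few images with large count errors raise it much more than they raise MAE.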

Citation

Consider citing us if you find our paper useful in your research :).

@inproceedings{kang2024vlcounter,
  title={VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting},
  author={Kang, Seunggu and Moon, WonJun and Kim, Euiyeon and Heo, Jae-Pil},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={3},
  pages={2714--2722},
  year={2024}
}

Acknowledgements

This project is based on the implementation of CounTR.