Official implementation of the AAAI 2024 paper "VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting"
Update
🔥🔥🔥 [Dec 9] Our paper has been accepted by AAAI 2024.
🔥🔥🔥 [Dec 28] The code and pretrained model have been released.
Our project uses the following datasets. Please visit the links below to download them:
We use CARPK and PUCPR+ through the hub package. Please click here for more information.
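Below is a minimal sketch of loading CARPK through the hub package; the dataset path and tensor names are assumptions, so please check the linked documentation for the exact identifiers.

```python
# Minimal sketch of loading CARPK via the hub package.
# The dataset path and tensor names are assumptions; see the linked docs.
import hub

ds = hub.load("hub://activeloop/carpk-train")  # assumed dataset path
print(ds.tensors.keys())                       # inspect the available tensors

sample = ds["images"][0].numpy()               # assumed tensor name "images"
print(sample.shape)
```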
/
├─VLCounter/
│
├─FSC147/
│ ├─gt/
│ ├─image/
│ ├─ImageClasses_FSC147.txt
│ ├─Train_Test_Val_FSC_147.json
│ ├─annotation_FSC147_384.json
│
├─IOCfish5k/
│ ├─annotations/
│ ├─images/
│ ├─test_id.txt
│ ├─train_id.txt
│ ├─val_id.txt
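For reference, here is a minimal sketch of reading the FSC147 split and annotation files in the layout above. The JSON key names ("val", "points") are assumptions based on the public FSC147 release.

```python
# Sketch of loading the FSC147 split and annotation JSON files shown above.
# Key names inside the JSON files are assumptions from the public FSC147 release.
import json
from pathlib import Path

root = Path("FSC147")
splits = json.loads((root / "Train_Test_Val_FSC_147.json").read_text())
annos = json.loads((root / "annotation_FSC147_384.json").read_text())

val_images = splits["val"]                      # list of image file names
first = val_images[0]
print(first, len(annos[first]["points"]))       # number of annotated object points
```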
The following packages are suitable for the NVIDIA RTX A6000 GPU.
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install hub
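As a quick sanity check after installation, the small sketch below verifies the installed versions and CUDA availability (adjust as needed for your setup).

```python
# Quick environment sanity check (a sketch; expected versions follow the pip commands above).
import torch
import torchvision

print(torch.__version__)             # expected 1.10.0+cu111
print(torchvision.__version__)       # expected 0.11.0+cu111
print(torch.cuda.is_available())     # should be True on a CUDA-enabled machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```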
If you want to use the Docker environment, please download the Docker image with the command below:
docker pull sgkang0305/vlcounter
Please download the CLIP pretrained weight and place the file under the "pretrain" folder.
Please download the BPE file and place it under the "tools/dataset" folder.
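To confirm the downloaded CLIP weight is readable, here is a hedged sketch; the file name under "pretrain" is an assumption, so substitute whichever weight file you downloaded. OpenAI CLIP checkpoints are TorchScript archives, so torch.jit.load is tried first with torch.load as a fallback.

```python
# Sketch for sanity-checking the downloaded CLIP weight.
# "pretrain/ViT-B-16.pt" is an assumed file name; use the file you downloaded.
import torch

ckpt_path = "pretrain/ViT-B-16.pt"  # assumed file name
try:
    jit_model = torch.jit.load(ckpt_path, map_location="cpu")
    state_dict = jit_model.state_dict()
except RuntimeError:
    state_dict = torch.load(ckpt_path, map_location="cpu")

print(len(state_dict), "tensors loaded")
```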
You can train the model with the following command. Make sure to check the options in the train.sh file.
bash scripts/train.sh FSC {gpu_id} {exp_number}
You can test the performance of a trained checkpoint with the following command. Make sure to check the options in the test.sh file, especially '--ckpt_used', which specifies the weight file to evaluate.
bash scripts/test.sh FSC {gpu_id} {exp_number}
We provide a pretrained checkpoint of our full model, which achieves quantitative results similar to those reported in the paper.
| FSC val MAE | FSC val RMSE | FSC test MAE | FSC test RMSE |
|---|---|---|---|
| 18.06 | 65.13 | 17.05 | 106.16 |

| CARPK MAE | CARPK RMSE | PUCPR+ MAE | PUCPR+ RMSE |
|---|---|---|---|
| 6.46 | 8.68 | 48.94 | 69.08 |
Please consider citing our paper if you find it useful in your research :)
@inproceedings{kang2024vlcounter,
title={VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting},
author={Kang, Seunggu and Moon, WonJun and Kim, Euiyeon and Heo, Jae-Pil},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={3},
pages={2714--2722},
year={2024}
}
This project is based on the implementation of CounTR.