Paper: https://arxiv.org/abs/2304.00450
Before you begin, ensure you have the following dependencies installed on your system:
cuda == 10.2
torch == 1.8.0
torchvision == 0.9.0
python == 3.8.11
numpy == 1.20.3
All experiments can be conducted on a single RTX 3090 (though other GPUs work as well).
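You can quickly verify that your environment matches these versions with the following one-liner (a convenience check, not part of the official setup):

```bash
# Print installed versions to compare against the list above.
python --version
python -c "import torch, torchvision, numpy;
print('torch:', torch.__version__);
print('torchvision:', torchvision.__version__);
print('numpy:', numpy.__version__);
print('CUDA runtime:', torch.version.cuda, '| GPU available:', torch.cuda.is_available())"
```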
To get started, follow these steps:
- Clone the SVOL repository from GitHub:
git clone https://github.com/sangminwoo/SVOL.git
cd SVOL
- Install the required Python packages listed in requirements.txt:
pip install -r requirements.txt
SVOL uses multiple datasets for training and evaluation. Ensure you have the following datasets ready:
- QuickDraw
- Sketchy
- TU-Berlin
- ImageNet-VID (3862/555/1861 videos in the train/val/test splits)
For the ImageNet-VID dataset, organize the data as follows (a shell sketch of these moves appears after the list):
- In the `ILSVRC/Annotations/VID/train/` directory, move all files from `ILSVRC2015_VID_train_0000`, `ILSVRC2015_VID_train_0001`, `ILSVRC2015_VID_train_0002`, and `ILSVRC2015_VID_train_0003` to the parent directory.
- In the `ILSVRC/Data/VID/train/` directory, move all files from `ILSVRC2015_VID_train_0000`, `ILSVRC2015_VID_train_0001`, `ILSVRC2015_VID_train_0002`, and `ILSVRC2015_VID_train_0003` to the parent directory.
- Follow the Preprocessing steps.
You can start training the model on your chosen dataset (quickdraw, sketchy, or tu-berlin) by running the corresponding script:
bash train_{dataset}.sh
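For example, following the pattern above, to train on the QuickDraw dataset:

```bash
bash train_quickdraw.sh
```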
To evaluate the model, run the following command:
bash test.sh
For additional configuration options, refer to the `lib/configs.py` file.
If you use SVOL in your research, please cite the following paper:
@article{woo2023sketch,
  title={Sketch-based Video Object Localization},
  author={Woo, Sangmin and Jeon, So-Yeong and Park, Jinyoung and Son, Minji and Lee, Sumin and Kim, Changick},
  journal={arXiv preprint arXiv:2304.00450},
  year={2023}
}
We appreciate the nicely organized code of DETR; our codebase is largely built on top of it.