
YouTube-GDD: A challenging gun detection dataset with rich contextual information


[arXiv] [Project Page]

Overview

To promote the development of security applications, this work presents a new, challenging dataset called the YouTube Gun Detection Dataset (YouTube-GDD). The dataset is collected from 343 high-definition YouTube videos and contains 5,000 well-chosen images, in which 16,064 gun instances and 9,046 person instances are annotated. Compared to other datasets, YouTube-GDD is "dynamic": it contains rich contextual information and records the shape changes of guns during shooting. To build a baseline for gun detection, we evaluate YOLOv5 on YouTube-GDD and analyze the influence of additional related annotations on gun detection.


Dates

  • Release Training and validation sets. [2022-04]
  • Release test images. [2022-04]
  • Open the evaluation server to the public. [to be confirmed]
  • Expand the dataset volume to the level of ten thousand images. [to be confirmed]

Description

  1. All images are captured from YouTube videos.

  2. All annotations are labeled in YOLO format with labelImg.

  3. YouTube-GDD contains two categories, namely "person" and "gun", corresponding to category ids 0 and 1, respectively.

  4. The name of each image file and its corresponding label file follows the format "YouTube id_original frame rate_split frame rate_ID".
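Since annotations are stored in YOLO format (one `class_id x_center y_center width height` line per instance, with coordinates normalized to [0, 1]), a label line can be decoded with a few lines of Python. The annotation line below is a made-up example, not taken from the dataset:

```python
# Map YouTube-GDD category ids to names (see the Description section).
CLASS_NAMES = {0: "person", 1: "gun"}

def parse_yolo_line(line):
    """Parse one line of a YOLO-format label file into (name, box)."""
    class_id, x_c, y_c, w, h = line.split()
    return CLASS_NAMES[int(class_id)], (float(x_c), float(y_c), float(w), float(h))

# Example annotation line: a gun roughly centered in the image.
name, box = parse_yolo_line("1 0.52 0.48 0.20 0.10")
print(name, box)  # gun (0.52, 0.48, 0.2, 0.1)
```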

Statistics

First, we split the entire dataset by filename into 10 non-overlapping folds of 500 images each. Second, we take the ratio of the different instance scales in the entire dataset as the reference probability distribution and compute the scale distribution of each fold. The two folds whose distributions have the lowest Jensen-Shannon (JS) divergence from the reference are chosen as the test and validation sets: fold7 is chosen as the test set, fold6 as the validation set, and the remaining eight folds form the training set.
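The fold-selection criterion can be sketched in a few lines of Python. This is an illustrative computation using the small/medium/large counts from the statistics table; the exact logarithm base and implementation used by the authors may differ:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base-2 log)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

# Scale counts (small, medium, large) taken from the statistics table.
overall = normalize([1106, 3070, 20934])  # entire dataset
fold6 = normalize([13, 122, 998])         # chosen as validation set
fold7 = normalize([67, 193, 1013])        # chosen as test set

print(js_divergence(fold6, overall))
print(js_divergence(fold7, overall))
```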

| Split  | Images | Videos | person | gun   | small | medium | large |
|--------|--------|--------|--------|-------|-------|--------|-------|
| fold1  | 500    | 35     | 467    | 1265  | 373   | 235    | 1124  |
| fold2  | 500    | 34     | 430    | 620   | 4     | 84     | 962   |
| fold3  | 500    | 31     | 466    | 905   | 39    | 259    | 1073  |
| fold4  | 500    | 31     | 427    | 751   | 5     | 124    | 1049  |
| fold5  | 500    | 36     | 471    | 716   | 11    | 120    | 1056  |
| fold6  | 500    | 43     | 415    | 718   | 13    | 122    | 998   |
| fold7  | 500    | 42     | 394    | 879   | 67    | 193    | 1013  |
| fold8  | 500    | 34     | 475    | 636   | 1     | 60     | 1050  |
| fold9  | 500    | 33     | 460    | 589   | 3     | 57     | 989   |
| fold10 | 500    | 32     | 518    | 953   | 37    | 281    | 1151  |
| all    | 5000   | 343    | 9046   | 16064 | 1106  | 3070   | 20934 |

Table Note: Frames captured from the same video may fall into two adjacent folds, so a video may be counted more than once.

Construct YouTube-GDD from Source Videos

[Update 18th April] We thank a2515919, who is also working on this dataset, for kindly sharing the pre-processed images: Google Drive Link.

Here, three scripts are provided for constructing YouTube-GDD from source videos step by step.

  • Download videos.

```shell
cd /path/to/YouTube-GDD/
python ./tools/download.py --videolist ./configs/videolist.txt --videopath /path/to/videos
```

  • Extract frames.

```shell
cd /path/to/YouTube-GDD/
python ./tools/extract.py --videopath /path/to/videos --framepath /path/to/frames
```

  • Select images.

```shell
cd /path/to/YouTube-GDD/
python ./tools/select.py --imagelist ./configs/imagelist.npy --framepath /path/to/frames --imagepath /path/to/images
```

After collecting the images, unzip labels.zip into the parent directory of imagepath. The expected dataset structure, which also meets the dataset structure requirement of YOLOv5, is organized as follows:

YouTube-GDD/
  images/
      train/
      val/
      test/
  labels/
      train/
      val/

Baseline

| Method  | w/ TL | w/ AoP | FLOPs  | Params | Gun AP50 | Gun AP | Person AP50 | Person AP |
|---------|-------|--------|--------|--------|----------|--------|-------------|-----------|
| YOLOv5s |       |        | 15.80G | 7.01M  | 67.7     | 41.0   | -           | -         |
| YOLOv5s |       | yes    | 15.81G | 7.02M  | 67.9     | 41.3   | 90.3        | 75.0      |
| YOLOv5s | yes   |        | 15.80G | 7.01M  | 75.0     | 52.0   | -           | -         |
| YOLOv5s | yes   | yes    | 15.81G | 7.02M  | 77.3     | 52.1   | 92.4        | 81.2      |

Table Note: TL means Transfer Learning and AoP means Annotations of Person.
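The AP50 numbers above count a detection as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal IoU computation, using an `(x1, y1, x2, y2)` corner convention chosen here for illustration rather than taken from the repository's code, looks like:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection whose IoU with a ground-truth box is >= 0.5 counts toward AP50.
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # intersection 2, union 6 -> 0.333...
```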

Contact

If you have any general questions, feel free to email us at guyongxiang19@mails.ucas.ac.cn. For dataset-related or implementation-related questions, please email us or open an issue in this codebase (we recommend opening an issue, since your questions may help others).

Citation

If you find our work inspiring or use our dataset in your research, please cite our work.

@article{gu2022youtube-gdd,
  title={YouTube-GDD: A challenging gun detection dataset with rich contextual information},
  author={Gu, Yongxiang and Liao, Xingbin and Qin, Xiaolin},
  journal={arXiv preprint arXiv:2203.04129},
  year={2022}
}

Thanks

We thank our lab students, namely Mingfei Li, Jingyang Shan, Qianlei Wang, Siqi Zhang, Xu Liao, Yuncong Peng, Gang Luo, Xin Lan, Boyi Fu and Yangge Qian, for their suggestions on improving the YouTube-GDD dataset.