- We use YOLOv5 as the object detector instead of Faster R-CNN; it is faster and more convenient.
- We use a tracker (DeepSORT) to assign action labels to objects that share the same ID across frames (see the sketch below).
- Processing speed reaches 24.2 FPS with an inference batch size of 30 (on a single RTX 2080Ti GPU).
Relevant information: FAIR/PytorchVideo; Ultralytics/Yolov5
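A minimal sketch of the detect -> track -> label flow described above. The import path for the DeepSort wrapper, its constructor/`update` signature, and the returned column layout are assumptions for illustration, not the repository's actual API; only the torch.hub call for YOLOv5 is the real upstream interface.

```python
# Sketch only: detect people with YOLOv5, track them with DeepSORT, and key
# results by track ID so an action label from a clip can be attached to the
# same object in every frame of that clip.
import torch
from deep_sort.deep_sort import DeepSort  # assumed import path

# Load YOLOv5 via torch.hub (code and weights are downloaded on first use).
detector = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# DeepSORT re-identification checkpoint (ckpt.t7), path as in the install step.
tracker = DeepSort("./deep_sort/deep_sort/deep/checkpoint/ckpt.t7")

def track_frame(frame):
    """Detect people in one frame and return {track_id: bbox}."""
    results = detector(frame)                    # YOLOv5 inference
    det = results.xywh[0].cpu().numpy()          # [x_c, y_c, w, h, conf, cls]
    persons = det[det[:, 5] == 0]                # keep class 0 (person)
    # DeepSORT keeps IDs persistent across frames, which is what lets one
    # action prediction per clip be broadcast to all frames of that track.
    tracks = tracker.update(persons[:, :4], persons[:, 4], frame)
    return {int(t[4]): t[:4] for t in tracks}    # assumed [x1, y1, x2, y2, id] rows
```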
- 2022.01.24 optimized the pre-processing method (no need to extract the video to images before processing); faster and cleaner.
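A minimal sketch (using OpenCV, which is an assumption about the environment) of what "no need to extract video to images" means in practice: frames are decoded and batched straight from the video file instead of being dumped to disk first.

```python
import cv2

def iter_frame_batches(video_path, batch_size=30):
    """Yield batches of frames decoded directly from the video file."""
    cap = cv2.VideoCapture(video_path)
    batch = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:          # flush the last partial batch
        yield batch
    cap.release()
```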
- clone this repo:
git clone https://github.com/wangkang12/yolo_slowfast-master
cd yolo_slowfast
- create a new python environment (optional):
conda create -n {your_env_name} python=3.7.11
conda activate {your_env_name}
- install requirements:
pip install -r requirements.txt
- download the weights file (ckpt.t7) from [deepsort] to this folder:
./deep_sort/deep_sort/deep/checkpoint/
- test on your video:
python yolo_slowfast.py --input {path to your video}
The first time you run this command, it may take a while to download the YOLOv5 code and its weights file from torch.hub (as sketched below), so keep your network connection active.
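For reference, the first-run download corresponds to loading YOLOv5 through torch.hub; the exact weights variant the script loads may differ, so `yolov5s` below is just an illustrative choice.

```python
import torch

# Downloads the ultralytics/yolov5 code and the requested weights into the
# torch.hub cache the first time it is called; later calls reuse the cache.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
```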
- test on your webcam:
run demo_inference_camerweb.py
The demo supports display in a Flask web page, but the real-time performance is currently poor; it will be optimized in a future update. A minimal streaming sketch is shown below.
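The following is a minimal, self-contained sketch of how webcam frames can be streamed to a Flask page as MJPEG; it is not the repository's demo_inference_camerweb.py, and the route name and port are illustrative assumptions.

```python
import cv2
from flask import Flask, Response

app = Flask(__name__)
cap = cv2.VideoCapture(0)  # default webcam

def mjpeg_stream():
    """Encode each webcam frame as JPEG and yield it as a multipart chunk."""
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpg = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpg.tobytes() + b"\r\n")

@app.route("/video_feed")
def video_feed():
    return Response(mjpeg_stream(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```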
Thanks to these great works:
[2] ZQPei/deepsort
[4] AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. paper
[5] SlowFast Networks for Video Recognition. paper
If you find our work useful, please cite as follows:
@misc{yolo_slowfast,
  author = {Wu Fan},
  title = {A realtime action detection framework based on PytorchVideo},
  year = {2021},
  url = {\url{https://github.com/wufan-tb/yolo_slowfast}}
}