- we choose YOLOv5 as the object detector instead of Faster R-CNN because it is faster and more convenient
- we use a tracker (DeepSORT) so that each object keeps the same ID across frames, which lets us assign action labels to every object in every frame
- our processing speed reaches 24.2 FPS with an inference batch size of 30 (on a single RTX 2080 Ti GPU)
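The second point can be illustrated with a small sketch. It assumes, hypothetically, that the tracker yields `(track_id, box)` pairs for each frame and that the action model produces one label per track ID for a whole clip. The function name `propagate_actions` and these data shapes are illustrative only, not the repository's actual interfaces.

```python
from typing import Dict, List, Tuple

# Hypothetical data shapes: a box is (x1, y1, x2, y2); each frame is a list of (track_id, box)
Box = Tuple[float, float, float, float]
FrameTracks = List[Tuple[int, Box]]


def propagate_actions(
    clip_tracks: List[FrameTracks],
    actions_by_id: Dict[int, str],
    default: str = "unknown",
) -> List[List[Tuple[int, Box, str]]]:
    """Attach the clip-level action of each track ID to every box with that ID.

    clip_tracks: per-frame tracker output for one clip.
    actions_by_id: one predicted action per track ID for the whole clip.
    Boxes whose ID has no prediction get `default`.
    """
    labelled = []
    for frame in clip_tracks:
        labelled.append(
            [(tid, box, actions_by_id.get(tid, default)) for tid, box in frame]
        )
    return labelled


clip = [
    [(1, (10, 10, 50, 90)), (2, (60, 15, 100, 95))],
    [(1, (12, 11, 52, 91)), (3, (0, 0, 20, 40))],  # track 3 has no prediction yet
]
labelled = propagate_actions(clip, {1: "walk", 2: "stand"})
print([[label for _, _, label in frame] for frame in labelled])
# → [['walk', 'stand'], ['walk', 'unknown']]
```

Because the label is keyed by track ID rather than by box position, the same object keeps its action even as its box moves between frames.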
Relevant information: FAIR/PytorchVideo; Ultralytics/Yolov5
- 2022.01.24: optimized the pre-processing (videos no longer need to be extracted to images before processing), making it faster and cleaner.
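A minimal sketch of processing a video without writing frames to disk: decoded frames are consumed as a stream and grouped into fixed-length clips in memory. The helper `iter_clips` is hypothetical, and the OpenCV reader in the comments is only an assumed frame source; `clip_len=30` simply mirrors the inference batch size quoted above.

```python
from typing import Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def iter_clips(frames: Iterable[Frame], clip_len: int = 30) -> Iterator[List[Frame]]:
    """Group a stream of decoded frames into clips of `clip_len` frames.

    Frames are consumed one at a time, so nothing is written to disk.
    The final clip may be shorter than `clip_len`.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    clip: List[Frame] = []
    for frame in frames:
        clip.append(frame)
        if len(clip) == clip_len:
            yield clip
            clip = []
    if clip:  # leftover frames at the end of the video
        yield clip


# Assumed frame source with OpenCV (not needed to run this sketch):
# import cv2
# def read_frames(path):
#     cap = cv2.VideoCapture(path)
#     while True:
#         ok, frame = cap.read()
#         if not ok:
#             break
#         yield frame
#     cap.release()

print([len(c) for c in iter_clips(range(65), clip_len=30)])  # → [30, 30, 5]
```

At the reported 24.2 FPS, each 30-frame clip takes roughly 30 / 24.2 ≈ 1.24 s of processing time.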
clone this repo:
git clone https://github.com/wangkang12/yolo_slowfast-master
cd yolo_slowfast-master
create a new python environment (optional):
conda create -n {your_env_name} python=3.7.11
conda activate {your_env_name}
install requirements:
pip install -r requirements.txt
download the weights file (ckpt.t7) from [deepsort] to this folder:
test on your video:
python yolo_slowfast.py --input {path to your video}
The first run of this command may take some time because it downloads the YOLOv5 code and its weights file from torch.hub, so keep your network connection available.
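To test several videos, the command above can simply be repeated once per file. The sketch below builds those argument lists; only the `--input` flag comes from this README, while the helper name `build_commands`, the `videos` folder, and the extension list are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List

VIDEO_SUFFIXES = (".mp4", ".avi", ".mov", ".mkv")  # assumed set of video extensions


def build_commands(paths: Iterable[str]) -> List[List[str]]:
    """Return one `python yolo_slowfast.py --input <video>` argv list per video path."""
    videos = sorted(p for p in paths if Path(p).suffix.lower() in VIDEO_SUFFIXES)
    return [["python", "yolo_slowfast.py", "--input", p] for p in videos]


cmds = build_commands(["clips/b.AVI", "clips/notes.txt", "clips/a.mp4"])
print(cmds)
# → [['python', 'yolo_slowfast.py', '--input', 'clips/a.mp4'], ['python', 'yolo_slowfast.py', '--input', 'clips/b.AVI']]

# To actually run them (assumes a local "videos" folder):
# import subprocess
# for cmd in build_commands(str(p) for p in Path("videos").iterdir()):
#     subprocess.run(cmd, check=True)
```

Passing argv lists instead of one shell string avoids quoting problems when video paths contain spaces.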
test on your webcam: run demo_inference_camerweb.py. This demo can also display the results on a Flask web page, but its real-time performance is still poor; I will optimize it next.
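The README does not describe how demo_inference_camerweb.py serves frames. A common way to show live video in Flask is an MJPEG stream, where each JPEG-encoded frame is sent as one part of a `multipart/x-mixed-replace` response. The sketch below shows only that framing pattern as an assumption about a typical setup, not the demo's code; `mjpeg_stream`, `encoded_frames`, and `annotated_frames` are hypothetical names.

```python
from typing import Iterable, Iterator

# The boundary string is an arbitrary choice; it must match the one in MIMETYPE.
BOUNDARY = b"frame"
MIMETYPE = "multipart/x-mixed-replace; boundary=frame"


def mjpeg_stream(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each JPEG-encoded frame as one part of a multipart/x-mixed-replace body."""
    for jpeg in jpeg_frames:
        yield (
            b"--" + BOUNDARY + b"\r\n"
            + b"Content-Type: image/jpeg\r\n\r\n"
            + jpeg
            + b"\r\n"
        )


# Possible Flask wiring (assumes Flask and OpenCV are installed):
# from flask import Flask, Response
# import cv2
#
# app = Flask(__name__)
#
# def encoded_frames():  # hypothetical: JPEG bytes of annotated frames
#     for frame in annotated_frames():  # hypothetical detector output
#         ok, buf = cv2.imencode(".jpg", frame)
#         if ok:
#             yield buf.tobytes()
#
# @app.route("/video")
# def video():
#     return Response(mjpeg_stream(encoded_frames()), mimetype=MIMETYPE)
```

The browser replaces the displayed image each time a new part arrives, so the page shows the stream without any client-side code.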
Thanks for these great works:
[2] ZQPei/deepsort
[4] AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions (paper)
[5] SlowFast Networks for Video Recognition (paper)
If you find our work useful, please cite it as follows:
@misc{yolo_slowfast,
author = {Wu Fan},
title = { A realtime action detection frame work based on PytorchVideo},
year = {2021},
url = {\url{https://github.com/wufan-tb/yolo_slowfast}}
}