- We use YOLOv5 as the object detector instead of Faster R-CNN; it is faster and more convenient.
- We use a tracker (DeepSORT) to assign action labels to objects that share the same ID across frames (see the sketch below).
- Processing speed reaches 24.2 FPS with an inference batch size of 30 (on a single RTX 2080Ti GPU).
Relevant information: FAIR/PytorchVideo; Ultralytics/Yolov5
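A minimal sketch of the detect -> track -> label flow described above. The import path for the DeepSort wrapper, its constructor/`update` signature, and the returned column layout are assumptions for illustration, not the repository's actual API; only the torch.hub call for YOLOv5 is the real upstream interface.

```python
# Sketch only: detect people with YOLOv5, track them with DeepSORT, and key
# results by track ID so an action label from a clip can be attached to the
# same object in every frame of that clip.
import torch
from deep_sort.deep_sort import DeepSort  # assumed import path

# Load YOLOv5 via torch.hub (code and weights are downloaded on first use).
detector = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# DeepSORT re-identification checkpoint (ckpt.t7), path as in the install step.
tracker = DeepSort("./deep_sort/deep_sort/deep/checkpoint/ckpt.t7")

def track_frame(frame):
    """Detect people in one frame and return {track_id: bbox}."""
    results = detector(frame)                    # YOLOv5 inference
    det = results.xywh[0].cpu().numpy()          # [x_c, y_c, w, h, conf, cls]
    persons = det[det[:, 5] == 0]                # keep class 0 (person)
    # DeepSORT keeps IDs persistent across frames, which is what lets one
    # action prediction per clip be broadcast to all frames of that track.
    tracks = tracker.update(persons[:, :4], persons[:, 4], frame)
    return {int(t[4]): t[:4] for t in tracks}    # assumed [x1, y1, x2, y2, id] rows
```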
- 2022.01.24 optimized the pre-processing method (no need to extract the video to images before processing); faster and cleaner.
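A minimal sketch (using OpenCV, which is an assumption about the environment) of what "no need to extract video to images" means in practice: frames are decoded and batched straight from the video file instead of being dumped to disk first.

```python
import cv2

def iter_frame_batches(video_path, batch_size=30):
    """Yield batches of frames decoded directly from the video file."""
    cap = cv2.VideoCapture(video_path)
    batch = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:          # flush the last partial batch
        yield batch
    cap.release()
```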
- clone this repo:
git clone https://github.com/wangkang12/yolo_slowfast-master
cd yolo_slowfast
- create a new python environment (optional):
conda create -n {your_env_name} python=3.7.11
conda activate {your_env_name}
- install requirements:
pip install -r requirements.txt
- download the weights file (ckpt.t7) from [deepsort] to this folder:
./deep_sort/deep_sort/deep/checkpoint/
- test on your video:
python yolo_slowfast.py --input {path to your video}
The first time you run this command, it may take a while to download the YOLOv5 code and its weights file from torch.hub (as sketched below), so keep your network connection active.
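For reference, the first-run download corresponds to loading YOLOv5 through torch.hub; the exact weights variant the script loads may differ, so `yolov5s` below is just an illustrative choice.

```python
import torch

# Downloads the ultralytics/yolov5 code and the requested weights into the
# torch.hub cache the first time it is called; later calls reuse the cache.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
```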
- test on your webcam:
run demo_inference_camerweb.py
The demo supports display in a Flask web page, but the real-time performance is currently poor; it will be optimized in a future update. A minimal streaming sketch is shown below.
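The following is a minimal, self-contained sketch of how webcam frames can be streamed to a Flask page as MJPEG; it is not the repository's demo_inference_camerweb.py, and the route name and port are illustrative assumptions.

```python
import cv2
from flask import Flask, Response

app = Flask(__name__)
cap = cv2.VideoCapture(0)  # default webcam

def mjpeg_stream():
    """Encode each webcam frame as JPEG and yield it as a multipart chunk."""
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpg = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpg.tobytes() + b"\r\n")

@app.route("/video_feed")
def video_feed():
    return Response(mjpeg_stream(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```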
Thanks to these great works:
[2] ZQPei/deepsort
[4] AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. paper
[5] SlowFast Networks for Video Recognition. paper
If you find our work useful, please cite as follows:
@misc{yolo_slowfast,
  author = {Wu Fan},
  title = {A realtime action detection framework based on PytorchVideo},
  year = {2021},
  url = {\url{https://github.com/wufan-tb/yolo_slowfast}}
}