FastMOT
News
- (2021.2.13) Support Scaled-YOLOv4 models
- (2021.1.3) Add DIoU-NMS for YOLO (+1% MOTA)
- (2020.11.28) Docker container provided for Ubuntu
Description
FastMOT is a custom multiple object tracker that implements:
- YOLO detector
- SSD detector
- Deep SORT + OSNet ReID
- KLT tracker
- Camera motion compensation
Deep SORT requires running detection and feature extraction sequentially, which often becomes a bottleneck for real-time applications. FastMOT significantly speeds up the entire system so that it runs in real time even on Jetson. Camera motion compensation improves tracking with a non-stationary camera, where Deep SORT and FairMOT usually fail.
To achieve faster processing, FastMOT only runs the detector and feature extractor every N frames, while KLT fills in the gaps efficiently. FastMOT also re-identifies objects that moved out of frame, so they keep the same IDs.
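The frame-skipping idea above can be sketched as a simple schedule. This is only an illustration, not FastMOT's actual loop: the function name `plan_frames` and the step labels are hypothetical, and it assumes the detector runs on frame 0 and on every N-th frame after that.

```python
def plan_frames(num_frames, detector_frame_skip):
    """Label each frame index with the step that would handle it.

    Assumption: detection + feature extraction run when the frame index
    is a multiple of detector_frame_skip; KLT handles all other frames.
    """
    plan = []
    for frame_id in range(num_frames):
        if frame_id % detector_frame_skip == 0:
            plan.append("detect+reid")
        else:
            plan.append("klt")
    return plan


# With N = 3, only every third frame pays for detection and ReID
print(plan_frames(7, 3))
```

With N = 3, two out of every three frames skip the detector and feature extractor, which is where the speedup comes from.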
YOLOv4 was trained on CrowdHuman (82% mAP@0.5), while the SSD models are COCO-pretrained models from TensorFlow. Both detection and feature extraction use the TensorRT backend and perform asynchronous inference. In addition, most algorithms, including KLT, Kalman filter, and data association, are optimized using Numba.
Performance
Results on MOT20 train set
Detector Skip | MOTA | IDF1 | HOTA | MOTP | MT | ML |
---|---|---|---|---|---|---|
N = 1 | 66.8% | 56.4% | 45.0% | 79.3% | 912 | 274 |
N = 5 | 65.1% | 57.1% | 44.3% | 77.9% | 860 | 317 |
FPS on MOT17 sequences
Sequence | Density | FPS |
---|---|---|
MOT17-13 | 5 - 30 | 38 |
MOT17-04 | 30 - 50 | 22 |
MOT17-03 | 50 - 80 | 15 |
Performance is evaluated with YOLOv4 using TrackEval. Note that neither YOLOv4 nor OSNet was trained or finetuned on the MOT20 dataset, so train set results should generalize well. FPS results are obtained on Jetson Xavier NX.
FastMOT has MOTA scores close to state-of-the-art trackers from the MOT Challenge. Increasing N has only a small impact on MOTA (66.8% at N = 1 versus 65.1% at N = 5). Tracking speed can reach up to 38 FPS depending on the number of objects. Lighter models (e.g. YOLOv4-tiny) are recommended for a more constrained device like Jetson Nano. FPS is expected to be in the range of 50 - 150 on desktop CPU/GPU.
Requirements
- CUDA >= 10
- cuDNN >= 7
- TensorRT >= 7
- OpenCV >= 3.3
- PyCuda
- Numpy >= 1.15
- Scipy >= 1.5
- TensorFlow < 2.0 (for SSD support)
- Numba == 0.48
- cython-bbox
Install for x86 Ubuntu
Make sure to have nvidia-docker installed. The image requires an NVIDIA Driver version >= 450 for Ubuntu 18.04 and >= 465.19.01 for Ubuntu 20.04. Build and run the docker image:
# For Ubuntu 20.04, add --build-arg TRT_IMAGE_VERSION=21.05
docker build -t fastmot:latest .
# Run xhost + first if you have issues with display
docker run --gpus all --rm -it -v $(pwd):/usr/src/app/FastMOT -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e TZ=$(cat /etc/timezone) fastmot:latest
Install for Jetson Nano/TX2/Xavier NX/Xavier
Make sure to have JetPack 4.4+ installed and run the script:
./scripts/install_jetson.sh
Download models
This includes the pretrained OSNet and SSD models and my custom YOLOv4 ONNX model
./scripts/download_models.sh
Build YOLOv4 TensorRT plugin
cd fastmot/plugins
make
Download VOC dataset for INT8 calibration
Only required for SSD (not supported on Ubuntu 20.04)
./scripts/download_data.sh
Usage
- USB webcam:
python3 app.py --input_uri /dev/video0 --mot
- MIPI CSI camera:
python3 app.py --input_uri csi://0 --mot
- RTSP stream:
python3 app.py --input_uri rtsp://<user>:<password>@<ip>:<port>/<path> --mot
- HTTP stream:
python3 app.py --input_uri http://<user>:<password>@<ip>:<port>/<path> --mot
- Image sequence:
python3 app.py --input_uri img_%06d.jpg --mot
- Video file:
python3 app.py --input_uri video.mp4 --mot
- Use --gui to visualize and --output_uri to save output
- To disable the GStreamer backend, set WITH_GSTREAMER = False here
- Note that the first run will be slow due to Numba compilation
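For the image sequence input, the `%06d` in `img_%06d.jpg` is a printf-style placeholder for a zero-padded, six-digit frame number. The sketch below shows which file names such a pattern expands to. The helper `expand_sequence` is hypothetical, and the starting index of 1 is an assumption about how your files are numbered.

```python
def expand_sequence(pattern, first, count):
    """Expand a printf-style pattern into consecutive file names."""
    return [pattern % index for index in range(first, first + count)]


# Assumed numbering: files start at 1
print(expand_sequence("img_%06d.jpg", 1, 3))
```

If your files use a different padding width, adjust the pattern accordingly (e.g. `%04d` for four digits).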
More options can be configured in cfg/mot.json
- Set resolution and frame_rate that correspond to the source data or camera configuration (optional). They are required for image sequence, camera sources, and MOT Challenge evaluation. List all configurations for your USB/CSI camera:
v4l2-ctl -d /dev/video0 --list-formats-ext
- To change detector, modify detector_type. This can be either YOLO or SSD
- To change classes, set class_ids under the correct detector. Default class is 1, which corresponds to person
- To swap model, modify model under a detector. For SSD, you can choose from SSDInceptionV2, SSDMobileNetV1, or SSDMobileNetV2
- Note that with SSD, the detector splits a frame into tiles and processes them in batches for the best accuracy. Change tiling_grid to [2, 2], [2, 1], or [1, 1] if a smaller batch size is preferred
- If more accuracy is desired and processing power is not an issue, reduce detector_frame_skip. Similarly, increase detector_frame_skip to speed up tracking at the cost of accuracy. You may also want to change max_age such that max_age × detector_frame_skip ≈ 30
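The rule of thumb max_age × detector_frame_skip ≈ 30 can be turned into a small helper. This is a minimal sketch of the arithmetic only: the function name `suggest_max_age` and the `target_frames` parameter are hypothetical and not part of FastMOT.

```python
def suggest_max_age(detector_frame_skip, target_frames=30):
    """Pick max_age so that max_age * detector_frame_skip is close to target_frames."""
    if detector_frame_skip < 1:
        raise ValueError("detector_frame_skip must be at least 1")
    # Never go below 1 so a track survives at least one detector cycle
    return max(1, round(target_frames / detector_frame_skip))


for skip in (1, 3, 5, 10):
    print(skip, suggest_max_age(skip))
```

For example, detector_frame_skip = 5 suggests max_age = 6, and detector_frame_skip = 10 suggests max_age = 3.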
Track custom classes
FastMOT supports multi-class tracking and can be easily extended to custom classes (e.g. vehicle). You need to train both YOLO and a ReID model on your object classes. Check Darknet for training YOLO and fast-reid for training ReID. After training, convert the model to ONNX format and place it in fastmot/models. To convert YOLO to ONNX, use tensorrt_demos to be compatible with the TensorRT YOLO plugins.
Add custom YOLOv3/v4
- Subclass YOLO like here: https://github.com/GeekAlexis/FastMOT/blob/4e946b85381ad807d5456f2ad57d1274d0e72f3d/fastmot/models/yolo.py#L94
  ENGINE_PATH: path to TensorRT engine (converted at runtime)
  MODEL_PATH: path to ONNX model
  NUM_CLASSES: total number of classes
  LETTERBOX: keep aspect ratio when resizing
    For YOLOv4-csp/YOLOv4x-mish, set to True
  NEW_COORDS: new_coords parameter for each yolo layer
    For YOLOv4-csp/YOLOv4x-mish, set to True
  INPUT_SHAPE: input size in the format "(channel, height, width)"
  LAYER_FACTORS: scale factors with respect to the input size for each yolo layer
    For YOLOv4/YOLOv4-csp/YOLOv4x-mish, set to [8, 16, 32]
    For YOLOv3, set to [32, 16, 8]
    For YOLOv4-tiny/YOLOv3-tiny, set to [32, 16]
  SCALES: scale_x_y parameter for each yolo layer
    For YOLOv4-csp/YOLOv4x-mish, set to [2.0, 2.0, 2.0]
    For YOLOv4, set to [1.2, 1.1, 1.05]
    For YOLOv4-tiny, set to [1.05, 1.05]
    For YOLOv3, set to [1., 1., 1.]
    For YOLOv3-tiny, set to [1., 1.]
  ANCHORS: anchors grouped by each yolo layer
  Note that anchors may not follow the same order in the Darknet cfg file. You need to mask out the anchors for each yolo layer using the indices in mask in Darknet cfg. Unlike YOLOv4, the anchors are usually in reverse for YOLOv3 and tiny
- Change class labels here to your object classes
- Modify cfg/mot.json: set model in yolo_detector to the added Python class and set class_ids you want to detect. You may want to play with conf_thresh based on the accuracy of your model
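Grouping anchors by the mask indices of each yolo layer can be sketched as follows. The helper `group_anchors` is hypothetical, and the flat per-layer output layout is an assumption, so compare it with the existing YOLO subclasses before copying the result into ANCHORS. The anchor and mask values are the ones I understand the stock Darknet yolov3.cfg to use; read them from your own cfg file.

```python
def group_anchors(anchors, masks):
    """Group a flat Darknet anchor list by the mask indices of each yolo layer.

    anchors: flat list [w0, h0, w1, h1, ...] from the `anchors` line in the cfg
    masks: one list of anchor indices per yolo layer, in cfg order
    """
    pairs = [tuple(anchors[i:i + 2]) for i in range(0, len(anchors), 2)]
    grouped = []
    for mask in masks:
        layer = []
        for index in mask:
            layer.extend(pairs[index])
        grouped.append(layer)
    return grouped


# Assumed YOLOv3 values: the first yolo layer uses the largest anchors
YOLOV3_ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119,
                  116, 90, 156, 198, 373, 326]
YOLOV3_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

for layer in group_anchors(YOLOV3_ANCHORS, YOLOV3_MASKS):
    print(layer)
```

Because the YOLOv3 masks run from the last anchors to the first, the grouped result comes out in reverse relative to the order of the `anchors` line, which matches the note above about YOLOv3 and tiny models.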
Add custom ReID
- Subclass ReID like here: https://github.com/GeekAlexis/FastMOT/blob/aa707888e39d59540bb70799c7b97c58851662ee/fastmot/models/reid.py#L51
  ENGINE_PATH: path to TensorRT engine (converted at runtime)
  MODEL_PATH: path to ONNX model
  INPUT_SHAPE: input size in the format "(channel, height, width)"
  OUTPUT_LAYOUT: feature dimension output by the model (e.g. 512)
  METRIC: distance metric used to match features ('euclidean' or 'cosine')
- Modify cfg/mot.json: set model in feature_extractor to the added Python class. You may want to play with max_feat_cost and max_reid_cost (float values from 0 to 2) based on the accuracy of your model
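One way to build intuition for the 0 to 2 range of these thresholds is the cosine distance, which is 0 for identical directions, 1 for orthogonal features, and 2 for opposite ones. The sketch below assumes the cost is 1 minus the cosine similarity. Whether FastMOT computes its matching cost exactly this way is an assumption, and `cosine_distance` is a hypothetical helper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under this reading, a lower threshold only accepts very similar features, while a higher one tolerates weaker matches.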
Citation
If you find this repo useful in your project or research, please star and consider citing it:
@software{yukai_yang_2020_4294717,
author = {Yukai Yang},
title = {{FastMOT: High-Performance Multiple Object Tracking Based on Deep SORT and KLT}},
month = nov,
year = 2020,
publisher = {Zenodo},
version = {v1.0.0},
doi = {10.5281/zenodo.4294717},
url = {https://doi.org/10.5281/zenodo.4294717}
}