[Paper] [Short Video] [Presentation Video] [Project]
Authors: Ran Xu, Chen-lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra, Somali Chaterji, Yin Li, Saurabh Bagchi
Advanced video analytic systems, including scene classification and object detection, have seen widespread success in various domains such as smart cities and autonomous transportation. With an ever-growing number of powerful client devices, there is incentive to move these heavy video analytics workloads from the cloud to mobile devices to achieve low latency and real-time processing and to preserve user privacy. However, most video analytic systems are heavyweight and are trained offline with some pre-defined latency or accuracy requirements. This makes them unable to adapt at runtime in the face of three types of dynamism -- the input video characteristics change, the amount of compute resources available on the node changes due to co-located applications, and the user's latency-accuracy requirements change. In this paper we introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements in the face of changing content and resource contention scenarios. To achieve this, we introduce a multi-branch object detection kernel (layered on Faster R-CNN), which incorporates a data-driven modeling approach on the performance metrics, and a latency SLA-driven scheduler to pick the best execution branch at runtime. We couple this kernel with approximable video object tracking algorithms to create an end-to-end video object detection system. We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3. We find that ApproxDet is able to adapt to a wide variety of contention and content characteristics and outshines all baselines, e.g., it achieves 52% lower latency and 11.1% higher accuracy over YOLOv3.
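For a taste of the scheduling idea, here is a minimal Python sketch of a latency-SLA-driven branch picker. This is not the actual implementation: the branch list and its predicted numbers are purely illustrative, and the real scheduler uses the data-driven latency and accuracy models described in the paper.

# Minimal sketch of a latency-SLA-driven scheduler: among the branches
# whose predicted latency meets the SLA, pick the most accurate one;
# if none qualifies, fall back to the fastest branch.
def pick_branch(branches, latency_sla_ms):
    feasible = [b for b in branches if b["latency_ms"] <= latency_sla_ms]
    if feasible:
        return max(feasible, key=lambda b: b["accuracy"])
    return min(branches, key=lambda b: b["latency_ms"])

# Illustrative branches only; real predictions come from the profiled models.
branches = [
    {"name": "shape576_nprop100_si8",  "latency_ms": 95, "accuracy": 0.51},
    {"name": "shape384_nprop50_si20", "latency_ms": 45, "accuracy": 0.43},
]
print(pick_branch(branches, latency_sla_ms=80)["name"])  # shape384_nprop50_si20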
If you have any questions or suggestions, feel free to email Ran Xu (xu943@purdue.edu).
If you find our work useful and relevant to your paper, please consider citing:
@inproceedings{xu2020approxdet,
title={ApproxDet: Content and contention-aware approximate object detection for mobiles},
author={Xu, Ran and Zhang, Chen-lin and Wang, Pengcheng and Lee, Jayoung and Mitra, Subrata and Chaterji, Somali and Li, Yin and Bagchi, Saurabh},
booktitle={Proceedings of the 18th ACM Conference on Embedded Networked Sensor Systems (SenSys)},
pages={1--14},
year={2020}
}
- An NVIDIA Jetson TX2 board, for the teaser experiments.
- A high-end server, for the profiling experiments.
- Note: an equivalent embedded device (such as an NVIDIA Jetson Nano, Xavier, or Raspberry Pi) is acceptable, but you will not be able to replicate our results quickly, as you will need to run the time-consuming profiling and modeling experiments on your device.
We provide the installation guide for the NVIDIA Jetson TX2. Installation on other devices may vary.
- Install conda4aarch64: please follow these instructions.
- Install numba (Python package): please follow these instructions and install numba version 0.46.0.
- Install TensorFlow (Python package): please follow these instructions and install tensorflow-gpu version 1.14.0+nv19.7.
- Install opencv version 4.1.1 (Python package) (reference).
- Install other Python packages:
pip install gdown opencv-python opencv-contrib-python
conda install pillow tqdm
- numpy version: make sure the installed numpy version is <= 1.16.4.
- Compile the C code:
cd utils_approxdet
make
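As a quick sanity check of the environment, you can print the versions this guide pins (a minimal sketch; it only checks that the packages import and reports their versions):

python3 - <<'EOF'
# Print the versions this guide expects (numpy <= 1.16.4,
# opencv 4.1.1, tensorflow-gpu 1.14.0+nv19.7).
import numpy, cv2, tensorflow as tf
print("numpy:", numpy.__version__)
print("opencv:", cv2.__version__)
print("tensorflow:", tf.__version__)
EOF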
Due to GitHub's 100 MB file size limit, we host the model weights and the video dataset (2.7 GB) on Google Drive. Use the following commands to download and extract them:
gdown https://drive.google.com/uc?id=13z3qMObub1-GNv6HR1A56USdPBLcx6sq
gdown https://drive.google.com/uc?id=1wRv8PEYCfBC2bwZoVXLKk-kngiGKjZ6c
tar -xvf Data.tar
tar -xvf models.tar
The model weights and video dataset should be placed in the proper paths so that they can be found. In general, you specify the video dataset location via the "--dataset_prefix" argument. We highly recommend the following file paths so that you can use the default commands:
/home/nvidia/ILSVRC2015/Data % the video dataset
/home/nvidia/ApproxDet_Replicate % src path
/home/nvidia/ApproxDet_Replicate/models % model path
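A minimal sketch to verify this layout before running anything (paths taken from the lines above):

python3 - <<'EOF'
# Verify the recommended layout from this README.
import os
for path in ["/home/nvidia/ILSVRC2015/Data",
             "/home/nvidia/ApproxDet_Replicate/models"]:
    print(path, "OK" if os.path.isdir(path) else "MISSING")
EOF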
In this experiment, we inject increasing GPU contention in the background, from 0% to 60% in increments of 20% every 200 frames. ApproxDet is able to adapt to this increasing resource contention. Here is how to replicate it:
Run the contention generator program in the background:
screen
python3 ApproxDet_CG.py
Run ApproxDet:
python3 ApproxDet.py --input=test/VID_testimg_cs1.txt --preheat=1 \
--dataset_prefix=/home/nvidia/ILSVRC2015/ \
--weight=models/ApproxDet.pb \
--output=test/VID_testset_ApproxDet_cs1.txt
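For reference, the contention level injected at each frame in this experiment follows the schedule sketched below (this illustrates the expected pattern; the actual generator lives in ApproxDet_CG.py):

# 0% for frames 0-199, 20% for 200-399, 40% for 400-599, 60% after that.
def contention_percent(frame_idx):
    return min(60, (frame_idx // 200) * 20)

assert contention_percent(150) == 0 and contention_percent(250) == 20
assert contention_percent(450) == 40 and contention_percent(800) == 60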
In this experiment, we increase the latency requirement from 80 ms to 300 ms per frame, with an increment every 200 frames. ApproxDet is able to adapt to this changing latency requirement. Here is how to replicate it:
Run the contention generator program in the background:
screen
python3 ApproxDet_CG.py
Run ApproxDet:
python3 ApproxDet.py --input=test/VID_testimg_cs2.txt --preheat=1 \
--dataset_prefix=/home/nvidia/ILSVRC2015/ \
--weight=models/ApproxDet.pb \
--output=test/VID_testset_ApproxDet_cs2.txt
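The per-frame latency SLA in this experiment follows a step schedule like the one sketched below. The exact per-step increment comes from the test configuration; the step_size here is only an illustrative choice:

# SLA starts at 80 ms and steps up every 200 frames, capped at 300 ms.
# step_size=55 is illustrative only (80 -> 135 -> 190 -> 245 -> 300).
def latency_sla_ms(frame_idx, start=80, end=300, step_every=200, step_size=55):
    return min(end, start + (frame_idx // step_every) * step_size)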
The following experiments take much longer and may be harder to understand. Please read our paper thoroughly before running them.
The contention generator in the background is not needed in the following experiments, so remember to kill it:
sudo pkill -f ApproxDet_CG
sudo pkill -f contention_module
Run a single detection branch (here, input shape 576 with 100 region proposals) on a test video:
python3 ApproxDetection.py --imagefiles=test/VID_testimg_00106000.txt \
--repeat=1 --preheat=1 \
--dataset_prefix=/home/nvidia/ILSVRC2015/ \
--shape=576 --nprop=100 \
--weight=models/ApproxDet.pb \
--output=test/VID_tmp.txt
Then compute the mAP of the detection results:
python3 utils_approxdet/compute_mAP.py --gt=test/VID_testgt_full.txt \
--detection=test/VID_tmp_nprop100_shape576_det.txt \
--video=Data/VID/val/ILSVRC2015_val_00106000
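compute_mAP.py scores the detections against the ground truth using mean average precision. The matching step underneath mAP is the intersection-over-union (IoU) between predicted and ground-truth boxes; a minimal sketch, assuming [x1, y1, x2, y2] boxes:

# IoU between two boxes [x1, y1, x2, y2] -- the matching criterion
# that decides whether a detection counts as a true positive.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ~= 0.143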
Note: you may need to configure SWAP memory to run this experiment (reference).
Run a single tracking branch (here, the MedianFlow tracker with 4x downsampling and a sampling interval of 8 frames) on top of existing detection results:
python3 ApproxTracking.py --imagefiles=test/VID_testimg_00106000.txt \
--detection_file=test/VID_testset_nprop100_shape576_det.txt \
--dataset_prefix=/home/nvidia/ILSVRC2015/ \
--repeat=1 --si=8 --tracker_ds=medianflow_ds4 \
--output=test/VID_valset_nprop100_shape576.txt
Then compute the mAP of the tracking results:
python3 utils_approxdet/compute_mAP.py --gt=test/VID_testgt_full.txt \
--detection=test/VID_valset_nprop100_shape576_medianflow_ds4_det.txt \
--video=Data/VID/val/ILSVRC2015_val_00106000
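The medianflow_ds4 setting pairs OpenCV's MedianFlow tracker with spatially downsampled input; we read "ds4" as a 4x downsampling factor, which keeps per-frame tracking cheap. A minimal sketch with opencv-contrib-python (boxes are (x, y, w, h) tuples; this is an illustration, not the script's internals):

import cv2

DS = 4  # assumed downsampling factor behind "ds4"

def init_tracker(frame, box):
    # Track on a 4x-downsampled frame to cut per-frame tracking cost.
    small = cv2.resize(frame, (frame.shape[1] // DS, frame.shape[0] // DS))
    tracker = cv2.TrackerMedianFlow_create()  # from opencv-contrib-python
    tracker.init(small, tuple(v // DS for v in box))
    return tracker

def track(tracker, frame):
    small = cv2.resize(frame, (frame.shape[1] // DS, frame.shape[0] // DS))
    ok, (x, y, w, h) = tracker.update(small)
    return ok, (x * DS, y * DS, w * DS, h * DS)  # scale box back up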
Run a single approximation branch of the end-to-end kernel, which couples detection and tracking:
python3 ApproxKernel.py --imagefiles=test/VID_testimg_00106000.txt \
--repeat=1 --preheat=1 \
--dataset_prefix=/home/nvidia/ILSVRC2015/ \
--shape=576 --nprop=100 --si=8 --tracker_ds=medianflow_ds4 \
--weight=models/ApproxDet.pb \
--output=test/VID_tmp.txt
Then compute the mAP of the approximation branch:
python3 utils_approxdet/compute_mAP.py --gt=test/VID_testgt_full.txt \
--detection=test/VID_tmp_nprop100_shape576_medianflow_ds4_det.txt \
--video=Data/VID/val/ILSVRC2015_val_00106000
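Here --si=8 means the heavy detector runs on every 8th frame and the tracker propagates the boxes in between. A minimal sketch of that interleaving (run_detector, init_tracker, and track are placeholders for the detection DNN and a tracker like the one sketched above):

# Interleave heavy detection (every si-th frame) with cheap tracking.
def approx_kernel(frames, run_detector, init_tracker, track, si=8):
    tracker, boxes = None, None
    for i, frame in enumerate(frames):
        if i % si == 0:
            boxes = run_detector(frame)           # refresh with the DNN
            tracker = init_tracker(frame, boxes)
        else:
            boxes = track(tracker, frame)         # propagate between detections
        yield boxes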
Profile the latency of the detection branches on the device:
python3 ApproxDetctionProfiler.py --dataset_prefix=/home/nvidia/ILSVRC2015/ \
--weight=models/ApproxDet.pb
Profile the latency of the tracking branches on the device:
python3 ApproxTrackingProfiler.py --dataset_prefix=/home/nvidia/ILSVRC2015/
On the high-end server, run a detection branch over the full validation set to generate detection results for accuracy profiling (note the server-side dataset prefix):
python3 ApproxDetection.py --imagefiles=test/VID_valimg_full.txt \
--repeat=1 \
--dataset_prefix=/data1/group/mlgroup/train_data/ILSVRC2015/ \
--shape=576 --nprop=100 \
--weight=models/ApproxDet.pb \
--output=test/VID_valset.txt
Run the accuracy profiler in master-worker mode. Start the master:
python3 ApproxAccProfilerMaster.py --port=5050 --logs=test/VID_tmp.log
Then start one or more workers:
python3 ApproxAccProfilerWorker.py --ip=localhost --port=5050 \
--dataset_prefix=/data1/group/mlgroup/train_data/ILSVRC2015/ --cpu=0
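We do not document the wire protocol here, so the following is only a hypothetical sketch of the master-worker split: the master hands out one profiling job per connection, and each worker connects, receives a job, and profiles that configuration. The real ApproxAccProfilerMaster/Worker protocol may differ.

import socket

# --- master side (hypothetical): serve one job string per connection ---
def master(port=5050, jobs=("nprop100_shape576", "nprop50_shape384")):
    srv = socket.socket()
    srv.bind(("", port))
    srv.listen()
    pending = list(jobs)
    while pending:
        conn, _ = srv.accept()
        conn.sendall(pending.pop(0).encode())  # hand the next job out
        conn.close()

# --- worker side (hypothetical): fetch a job, then run the profiling ---
def worker(ip="localhost", port=5050):
    conn = socket.socket()
    conn.connect((ip, port))
    job = conn.recv(1024).decode()
    conn.close()
    return job  # the worker would now profile this configuration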
Profile the switching overhead between branches:
python3 ApproxDetctionProfilerSw.py --dataset_prefix=/home/nvidia/ILSVRC2015/ \
--weight=models/ApproxDet.pb --output=test/VID_switchingoverhead_run0.txt