This is the original implementation of the article:
@article{rodriguez2020adaptive,
title={Adaptive Inattentional Framework for Video Object Detection With Reward-Conditional Training},
author={Rodriguez-Ramos, Alejandro and Rodriguez-Vazquez, Javier and Sampedro, Carlos and Campoy, Pascual},
journal={IEEE Access},
volume={8},
pages={124451--124466},
year={2020},
publisher={IEEE}
}
Recent object detection studies have been focused on video sequences, mostly due to the increasing demand of industrial applications. Although single-image architectures achieve remarkable results in terms of accuracy, they do not take advantage of particular properties of the video sequences and usually require high parallel computational resources, such as desktop GPUs. In this work, an inattentional framework is proposed, where the object context in video frames is dynamically reused in order to reduce the computation overhead. The context features corresponding to keyframes are fused into a synthetic feature map, which is further refined using temporal aggregation with ConvLSTMs. Furthermore, an inattentional policy has been learned to adaptively balance the accuracy and the amount of context reused. The inattentional policy has been learned under the reinforcement learning paradigm, and using our novel reward-conditional training scheme, which allows for policy training over a whole distribution of reward functions and enables the selection of a unique reward function at inference time. Our framework shows outstanding results on platforms with reduced parallelization capabilities, such as CPUs, achieving an average latency reduction up to 2.09x, and obtaining FPS rates similar to their equivalent GPU platform, at the cost of a 0.09 mAP reduction.
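To make the accuracy/latency trade-off described above concrete, here is a toy simulation. It is not the paper's method: the function name `simulate_policy`, the per-frame costs, and the fixed reuse probability are all made-up placeholders, whereas the actual framework learns the policy with reinforcement learning. The sketch only shows how reusing cached context on non-keyframes translates into a latency ratio.

```python
import random


def simulate_policy(num_frames, reuse_prob, full_cost=1.0, reuse_cost=0.3, seed=0):
    """Toy model: each frame either recomputes features (keyframe) or reuses context.

    full_cost and reuse_cost are arbitrary relative costs, not measured values.
    Returns (number_of_keyframes, speedup_over_always_recomputing).
    """
    rng = random.Random(seed)
    total_cost = 0.0
    keyframes = 0
    for t in range(num_frames):
        # The first frame has no cached context, so it is always a keyframe.
        if t == 0 or rng.random() >= reuse_prob:
            total_cost += full_cost
            keyframes += 1
        else:
            total_cost += reuse_cost
    baseline_cost = num_frames * full_cost
    return keyframes, baseline_cost / total_cost
```

With `reuse_prob=0.0` every frame is a keyframe and the speedup is 1; raising `reuse_prob` lowers the cost, which in the real system would be traded against detection accuracy.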
- Python 3.6+
- PyTorch 1.4.0
- Torchvision
- OpenCV
- TensorFlow 1.14.0
- Stable Baselines 2.10.0
- PyTorch Lightning 0.7.3
- Install Anaconda.
- Create and activate an environment.
- Install the system dependencies (follow the instructions for PyTorch Lightning and Stable Baselines).
- Clone this repo:
git clone https://github.com/alejodosr/adaptive-inattention
(A specific version of a package can be installed with pip install package=="0.0.0".)
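After installation, the pinned versions can be checked from Python. The following is a minimal sketch, assuming the PyPI distribution names `torch`, `tensorflow`, `stable-baselines` and `pytorch-lightning`, and assuming Python 3.8+ for `importlib.metadata` (the requirements list only asks for 3.6+). The helper name `check_versions` is hypothetical. Torchvision and OpenCV are left out because no version is pinned for them.

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Versions pinned in the requirements list, keyed by PyPI distribution name.
PINNED = {
    "torch": "1.4.0",
    "tensorflow": "1.14.0",
    "stable-baselines": "2.10.0",
    "pytorch-lightning": "0.7.3",
}


def check_versions(pinned=PINNED, get_version=metadata.version):
    """Return {name: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # A local build suffix such as "1.4.0+cu100" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems
```

Calling `check_versions()` in the activated environment returns an empty dictionary when everything matches; any entry it returns names the package to reinstall with the `pip install package=="0.0.0"` form shown above.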
This framework has been validated with ImageNet VID 2015 and a custom Multirotor Aerial Vehicles (MAV-VID) dataset:
- ImageNet VID 2015 dataset can be downloaded here.
- MAV-VID dataset can be downloaded here. The MAV-VID dataset is annotated in YOLO style.
Important: after downloading a dataset, run these scripts in this order: scripts/get_list.py and then scripts/get_seq_list.py (modify the path inside each script).
(To use a custom dataset, annotate it in YOLO style and run scripts/get_YOLO_shapes.py to generate the extra annotations.)
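For reference, a YOLO-style annotation line stores a class index followed by a box center, width and height, all normalized to the image size. Recovering pixel coordinates therefore needs the image dimensions, which is presumably why an extra shapes file is generated. The sketch below shows the standard conversion; the function name `yolo_to_box` is hypothetical and is not taken from the repository scripts.

```python
def yolo_to_box(line, img_width, img_height):
    """Convert one 'class cx cy w h' line (normalized) to pixel corner coordinates.

    Returns (class_id, xmin, ymin, xmax, ymax).
    """
    fields = line.split()
    class_id = int(fields[0])
    cx, cy, w, h = (float(value) for value in fields[1:5])
    xmin = (cx - w / 2) * img_width
    ymin = (cy - h / 2) * img_height
    xmax = (cx + w / 2) * img_width
    ymax = (cy + h / 2) * img_height
    return class_id, xmin, ymin, xmax, ymax
```

For example, the line `0 0.5 0.5 0.5 0.25` on a 640x480 image describes a box centered in the frame that spans half the width and a quarter of the height.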
Every script has several options and parameters. These are some important ones:
- For selecting MobileNetV2 backbone use --backbone mobilenetv2 (if not included, by default MobileNetV1 is selected).
- For MAV-VID dataset (or YOLO-style dataset) use --dataset_type yolo (if not included, by default ImageNet VID 2015 dataset is selected).
- For CPU evaluation use --use_cuda False (by default GPU is used if present).
- Extended data augmentation is included for training (see article). To disable it use --crop_prob 0.0 or, conversely, you can increase the probability 0.x of including a synthetic feature map in the batch with --crop_prob 0.x.
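A value such as `--use_cuda False` arrives as the string "False", and in Python `bool("False")` is `True`, so boolean flags of this form need an explicit parser. How the repository scripts handle this is not shown here; the following is a sketch of one common pattern. The names `str2bool` and `build_parser` are hypothetical, and the default values `mobilenetv1`, `imagenet_vid` and `0.05` are placeholders, not the scripts' real defaults.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into real booleans."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the four options listed above."""
    parser = argparse.ArgumentParser()
    # Placeholder defaults: the real default names are an assumption.
    parser.add_argument("--backbone", default="mobilenetv1")
    parser.add_argument("--dataset_type", default="imagenet_vid")
    parser.add_argument("--use_cuda", type=str2bool, default=True)
    parser.add_argument("--crop_prob", type=float, default=0.05)
    return parser
```

With this pattern, `build_parser().parse_args(["--use_cuda", "False"]).use_cuda` is the boolean `False` rather than a truthy string.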
Train the Feature Extractor:
CUDA_VISIBLE_DEVICES=0 python train_inattentional_model.py --net backbone --datasets /path/to/dataset/ --batch_size 16 --num_epochs 200 --width_mult 1 --cache_path /path/to/cache/folder --validation_epochs 3 --lr 0.0001 --scheduler plateau
or with MobileNetV2 backbone:
CUDA_VISIBLE_DEVICES=0 python train_inattentional_model.py --net backbone --datasets /path/to/dataset/ --batch_size 16 --num_epochs 200 --width_mult 1 --cache_path /path/to/cache/folder --validation_epochs 3 --lr 0.0001 --scheduler plateau --backbone mobilenetv2
Train the Temporal Aggregator (ConvLSTM):
CUDA_VISIBLE_DEVICES=0 python train_inattentional_model.py --net lstm --datasets /path/to/dataset/ --batch_size 50 --num_epochs 200 --width_mult 1 --cache_path /path/to/cache/folder --pretrained /path/to/checkpoint/checkpoint.ckpt --freeze_net --crop_prob 0.05
or with MobileNetV2 backbone:
CUDA_VISIBLE_DEVICES=0 python train_inattentional_model.py --net lstm --datasets /path/to/dataset/ --batch_size 50 --num_epochs 200 --width_mult 1 --cache_path /path/to/cache/folder --pretrained /path/to/checkpoint/checkpoint.ckpt --freeze_net --crop_prob 0.05 --backbone mobilenetv2
Train the Inattentional Policy:
CUDA_VISIBLE_DEVICES=0 python train_rl_inattentional_policy.py --net lstm --dataset /path/to/dataset/ --trained_model /path/to/checkpoint/checkpoint.ckpt --label_file /path/to/labels/vid-model-labels.txt --tags_csv /path/to/tags/meta_tags.csv
Evaluate Trained Model (with Random Policy):
CUDA_VISIBLE_DEVICES=0 python evaluate_inattentional_policy.py --net lstm --policy baseline --dataset /path/to/dataset/ --trained_model /path/to/checkpoint/checkpoint.ckpt --label_file /path/to/labels/vid-model-labels.txt --tags_csv /path/to/tags/meta_tags.csv --iter 5 --prob 0.5 --use_cuda True
Evaluate Trained Model (with Inattentional Policy):
CUDA_VISIBLE_DEVICES=0 python evaluate_inattentional_policy.py --net lstm --policy rl_ppo2 --dataset /path/to/dataset/ --trained_model /path/to/checkpoint/checkpoint.ckpt --label_file /path/to/labels/vid-model-labels.txt --tags_csv /path/to/tags/meta_tags.csv --rl_path /path/to/rl/checkpoint --iter 5 --lambda_0 1.2 --use_cuda True
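The abstract describes reward-conditional training as enabling the selection of a reward function at inference time; `--lambda_0` presumably plays that role in the command above, though this README does not spell it out. Under that assumption, the sketch below builds the same evaluation command for several values so they can be compared. The helper names and the swept values are hypothetical; only the flags come from the command above.

```python
import shlex


def build_eval_command(lambda_0, dataset, trained_model, label_file, tags_csv,
                       rl_path, iterations=5, use_cuda=True):
    """Return the argument list for one evaluate_inattentional_policy.py run."""
    return [
        "python", "evaluate_inattentional_policy.py",
        "--net", "lstm",
        "--policy", "rl_ppo2",
        "--dataset", dataset,
        "--trained_model", trained_model,
        "--label_file", label_file,
        "--tags_csv", tags_csv,
        "--rl_path", rl_path,
        "--iter", str(iterations),
        "--lambda_0", str(lambda_0),
        "--use_cuda", str(use_cuda),
    ]


def sweep_commands(lambdas, **paths):
    """Build one command per lambda_0 value; paths are passed through unchanged."""
    return [build_eval_command(value, **paths) for value in lambdas]


def as_shell(argv):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Each returned list can be passed to `subprocess.run`, or printed with `as_shell` and prefixed with `CUDA_VISIBLE_DEVICES=0` as in the commands above.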
Evaluate only MobileNetV1/V2-SSDLite (feature extractor only):
CUDA_VISIBLE_DEVICES=0 python evaluate.py --net backbone --dataset /path/to/dataset/ --trained_model /path/to/checkpoint/checkpoint.ckpt --label_file /path/to/labels/vid-model-labels.txt --tags_csv /path/to/tags/meta_tags.csv
This work was supported in part by the Spanish Ministry of Economy and Competitivity through the project (Complex Coordinated Inspection and Security Missions by UAVs in cooperation with UGV) under Grant RTI2018-100847-B-C21, in part by the MIT International Science and Technology Initiatives (MISTI)-Spain through the project (Drone Autonomy), and in part by the Mohamed Bin Zayed International Robotics Challenge (MBZIRC) in the year 2020 (MBZIRC 2020 competition).