Yu Yao*, Mingze Xu*, Yuchen Wang, David Crandall and Ella Atkins
💥 May 19th 2020: Our new Detection of Traffic Anomaly (DoTA) dataset is available here! DoTA can be considered an extension of A3D that provides more videos (4677 raw videos) and annotations (anomaly types, anomaly objects, and tracking IDs).
DoTA also provides more benchmarks in driving videos, such as anomaly detection, action recognition, and online action detection. The corresponding paper can be found here.
This repo contains the A3D dataset and the code for our IROS2019 paper on unsupervised traffic accident detection.
This repo also contains an improved PyTorch implementation of our ICRA paper Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems, which is an important building block for traffic accident detection. The original project repo is https://github.com/MoonBlvd/fvl-ICRA2019.
To run the code on the feature-ready HEV-I dataset or a dataset prepared in HEV-I style, you need:
- CUDA 9.0 or newer
- PyTorch 1.0
- torchsummaryX
- tensorboardX
Note that we use future object localization (FOL) and ego-motion prediction models for unsupervised anomaly detection, so the models are trained only for FOL and ego-motion prediction on normal driving data. In the paper, we used HEV-I as our training data.
The training scripts and a config file template are provided. We first train the ego-motion predictor and then train the FOL and ego-motion predictors jointly:
python train_ego_pred.py --load_config config/fol_ego_train.yaml
python train_fol.py --load_config config/fol_ego_train.yaml
For evaluation, we first run our fol_ego model on the test dataset, e.g. A3D, to generate all predictions:
python run_fol_for_AD.py --load_config config/test_A3D.yaml
This will save one .pkl file for each video clip. We can then use the saved predictions to calculate anomaly detection metrics. The following command prints results similar to those in the paper:
python run_AD.py --load_config config/test_A3D.yaml
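As an illustration of what a frame-level anomaly detection metric can look like, here is a minimal sketch of ROC AUC computed from per-frame anomaly scores and binary labels. It is not the code in run_AD.py; the function name `roc_auc` and the toy scores are assumptions made for this example, and the choice of AUC as the metric is itself an assumption about what the script reports.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    scores: per-frame anomaly scores, higher means more anomalous (assumed).
    labels: per-frame ground truth, 1 for anomalous and 0 for normal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous frame")
    # Count (anomalous, normal) pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores for a five-frame clip.
scores = [0.1, 0.2, 0.8, 0.7, 0.3]
labels = [0, 0, 1, 1, 0]
print(roc_auc(scores, labels))
```

The pairwise form is quadratic in the number of frames, which is fine for a sanity check on a few clips but slow on a full test set.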
The online anomaly detection script is not provided, but users are free to write their own script to do FOL and anomaly detection online.
The A3D dataset contains videos from YouTube and a .pkl file including human-annotated video start/end times and anomaly start/end times. We provide scripts and URL files to download the videos. Running the pre-processing script will produce the same frames we used in the paper.
Download the videos from YouTube:
python datasets/A3D_download.py --download_dir VIDEO_DIR --url_file datasets/A3D_urls.txt
Then convert the videos to frames at 10 Hz:
python scripts/video2frames.py -v VIDEO_DIR -f 10 -o IMAGE_DIR -e jpg
Note that each downloaded video is a combination of several short clips. To split them into the clips we used, run:
python datasets/A3D_split.py --root_dir DATA_ROOT --label_dir DIR_TO_PKL_LABEL
The annotations can be found in datasets/A3D_labels.pkl
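Since the annotations give anomaly start/end times per clip, a common next step is to expand them into one label per frame. The sketch below shows one way to do that. The function name, the use of frame indices rather than seconds, and the half-open interval convention are all assumptions for illustration; check the actual contents of the .pkl file before relying on them.

```python
def frame_labels(num_frames, anomaly_start, anomaly_end):
    """Return one 0/1 label per frame.

    Frames with index in [anomaly_start, anomaly_end) are labeled 1.
    Whether the annotated end frame is inclusive is an assumption here.
    """
    if not 0 <= anomaly_start <= anomaly_end <= num_frames:
        raise ValueError("anomaly interval must lie inside the clip")
    return [1 if anomaly_start <= i < anomaly_end else 0
            for i in range(num_frames)]


# A six-frame clip whose anomaly covers frames 2 and 3.
print(frame_labels(6, 2, 4))
```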
The Honda Egocentric View-Intersection (HEV-I) dataset is owned by HRI; users can follow the link to request the dataset.
However, we provide the newly generated features here in case you are interested in using the input features to test your models:
Training ego-motion extracted from ORBSLAM2
Validation ego-motion extracted from ORBSLAM2
Each feature file is named "VideoName_ObjectID.pkl". Each .pkl file includes 4 attributes:
- frame_id: the temporal location of the object in the video;
- bbox: the bounding box of the object from its appearance to its disappearance;
- flow: the corresponding optical flow features of the object obtained from ROIPool;
- ego_motion: the corresponding [yaw, x, z] values of the ego car's odometry obtained from ORBSLAM2.
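The following is a minimal sketch of reading one feature file and checking for the four attributes listed above. It assumes each .pkl file unpickles to a dict-like object keyed by those attribute names, and that the object ID follows the last underscore in the file name; both are assumptions, as are the helper names `load_feature_file` and `split_name`. Only unpickle files from a source you trust.

```python
import pickle
import tempfile
from pathlib import Path

EXPECTED_KEYS = ("frame_id", "bbox", "flow", "ego_motion")


def load_feature_file(path):
    """Load one per-object feature file and verify the four attributes exist."""
    with open(path, "rb") as f:
        features = pickle.load(f)
    missing = [k for k in EXPECTED_KEYS if k not in features]
    if missing:
        raise KeyError(f"{path} is missing attributes: {missing}")
    return features


def split_name(path):
    """Split 'VideoName_ObjectID.pkl' at the last underscore (assumed layout)."""
    video_name, _, object_id = Path(path).stem.rpartition("_")
    return video_name, object_id


# Demo on a stand-in file with made-up values, written to a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_video_7.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump({"frame_id": [0, 1],
                     "bbox": [[0, 0, 1, 1], [0, 0, 1, 1]],
                     "flow": [None, None],
                     "ego_motion": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]}, f)
    demo = load_feature_file(demo_path)
    print(split_name(demo_path), sorted(demo))
```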
To prepare the features used in this work, we used:
- Detection: MaskRCNN
- Tracking: DeepSort
- Dense optical flow: FlowNet2.0
- Ego motion: ORBSLAM2
To train the model, run:
python train_fol.py --load_config YOUR_CONFIG_FILE
To test the model, run:
python test_fol.py --load_config YOUR_CONFIG_FILE
An example of the config file can be found in config/fol_ego_train.yaml
We do not split the dataset into easy and challenging cases as we did in the original repo. Instead, we evaluate all cases together. We are still updating the following results table by changing the prediction horizon and the ablation models.
Model | train seg length | pred horizon | FDE | ADE | FIOU |
---|---|---|---|---|---|
FOL + Ego pred | 1.6 sec | 0.5 sec | 11.0 | 6.7 | 0.85 |
FOL + Ego pred | 1.6 sec | 1.0 sec | 24.7 | 12.6 | 0.73 |
FOL + Ego pred | 1.6 sec | 1.5 sec | 44.1 | 20.4 | 0.61 |
FOL + Ego pred | 3.2 sec | 2.0 sec | N/A | N/A | N/A |
FOL + Ego pred | 3.2 sec | 2.5 sec | N/A | N/A | N/A |
Note: Because the model structure has changed, the above evaluation results can differ from those in the original paper. Users are encouraged to compare against the results listed in this repo, since the new model structure is more efficient than the model proposed in the original paper.
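The table's metric names are not defined in this README. The sketch below assumes the usual reading: ADE is the average displacement error of the box center over the prediction horizon, FDE is the displacement error at the final predicted frame, and FIOU is the intersection over union at the final frame. The (x1, y1, x2, y2) box format and all function names are assumptions for illustration, not the repo's evaluation code.

```python
import math


def center(box):
    """Center of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def trajectory_metrics(pred, gt):
    """ADE, FDE and FIOU for one predicted box trajectory."""
    dists = []
    for p, g in zip(pred, gt):
        (px, py), (gx, gy) = center(p), center(g)
        dists.append(math.hypot(px - gx, py - gy))
    return {"ADE": sum(dists) / len(dists),
            "FDE": dists[-1],
            "FIOU": iou(pred[-1], gt[-1])}


# Two-step toy trajectory: the first box matches, the second is shifted by 5.
gt = [(0, 0, 10, 10), (10, 0, 20, 10)]
pred = [(0, 0, 10, 10), (15, 0, 25, 10)]
metrics = trajectory_metrics(pred, gt)
print(metrics)
```

In practice these values would be averaged over all tracked objects in the evaluation set.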
If you find this repo useful, please feel free to cite our papers:
@inproceedings{yao2018egocentric,
title={Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems},
author={Yao, Yu and Xu, Mingze and Choi, Chiho and Crandall, David J and Atkins, Ella M and Dariush, Behzad},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2019}
}
@inproceedings{yao2019unsupervised,
title={Unsupervised Traffic Accident Detection in First-Person Videos},
author={Yao, Yu and Xu, Mingze and Wang, Yuchen and Crandall, David J and Atkins, Ella M},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2019}
}