Go-CaRD is an automatic recognition and detection system for automotive parts, e.g. to study interactions between humans and vehicles. Below are some examples of random "in-the-wild" human-car videos. Note: the model is not trained on consecutive frames (i.e. videos).
If you use these models, please cite the following paper:
@misc{stappen2021domain,
title={Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD)},
author={Lukas Stappen and Xinchen Du and Vincent Karas and Stefan Müller and Björn W. Schuller},
year={2021},
eprint={2006.08521},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Results of CLOSE-CaR using the FULL head, reported as F1 in % on the devel(opment) and test sets [devel / test]. Best model per class setting:
- inside and outside classes: ResNet50 90.23 / 93.76
- outside classes: DenseNet201 97.13 / 93.08
- inside classes: MobileNetV2 87.00 / 93.60
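As a reminder of the metric, F1 is the harmonic mean of precision and recall. A minimal sketch, using hypothetical per-class counts that are not taken from the paper:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one car-part class (illustration only):
print(round(f1_score(tp=90, fp=10, fn=10), 4))  # 0.9
```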
Results of training on one dataset (T1) with optional data injection, given as a percentage of a second training set (T2); mAP in % on the test set, reported at three IoU thresholds.
Darknet-backbone
| T1  | Inj. % | T2   | Eval. set | > 0.2 | > 0.4 | > 0.5 |
|-----|--------|------|-----------|-------|-------|-------|
| mix | --     | --   | mix       | 58.20 | 56.66 | 54.60 |
| mix | 100    | part | mix       | 63.01 | 61.22 | 59.10 |
| mix | --     | --   | part      | 17.39 | 15.89 | 14.46 |
| mix | 100    | part | part      | 41.07 | 38.60 | 35.56 |
TinyDarknet-backbone
| T1  | Inj. % | T2   | Eval. set | > 0.2 | > 0.4 | > 0.5 |
|-----|--------|------|-----------|-------|-------|-------|
| mix | --     | --   | mix       | 40.89 | 26.43 | 24.41 |
| mix | 100    | part | part      | 28.24 | 14.44 | 12.51 |
SqueezeNet-backbone
| T1  | Inj. % | T2   | Eval. set | > 0.2 | > 0.4 | > 0.5 |
|-----|--------|------|-----------|-------|-------|-------|
| mix | --     | --   | mix       | 46.03 | 44.14 | 42.29 |
| mix | 100    | part | part      | 22.99 | 21.00 | 19.07 |
A more detailed version of these results can be found in the paper.
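The IoU thresholds above measure how well a predicted bounding box overlaps its ground-truth box. A minimal sketch of the standard intersection-over-union computation; the (x1, y1, x2, y2) box format is an assumption, not the repo's actual convention:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two half-overlapping boxes: this detection would count as correct
# at the > 0.2 threshold but not at > 0.4 or > 0.5.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # ≈ 0.333
```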
Pretrained models (download here)
The purpose of these models is to support research in the field of automatic recognition and detection of automotive parts in a natural context. They provide predictions for 29 interior and exterior vehicle regions during human-vehicle interaction. They also enable benchmarking and cross-corpus transfer learning, as demonstrated in Go-CaRD (A Generic, Optical Car Part Recognition and Detection).
MuSe-CaR-Part dataset (download here)
This dataset is a subset of 74 videos from the multimodal in-the-wild dataset MuSe-CAR. It contains 1 124 video frames showing human-vehicle interactions across all MuSe topics and 6 146 labels (bounding boxes). The pre-defined training, development and test partitions are also provided.
The purpose of this dataset is to support research in the field of automatic recognition and detection of automotive parts in a natural context. It provides labels for 29 interior and exterior vehicle regions during human-vehicle interaction. It also enables benchmarking and cross-corpus transfer learning, as demonstrated in GoCarD (A Generic, Optical Car Part Recognition and Detection). The footage captures many "in-the-wild" characteristics, including a range of shot sizes, camera motion, moving objects, a wide variety of backgrounds and different interactions.
The code in this repo is only exemplary and shows a) how images from MuSe-CaR-Part can be predicted using the model weights, and b) how to transform the predictions into a fixed-size feature set as used in [1], [2]. It is not intended to work out-of-the-box, e.g. on videos; however, the weights can be used for such applications.
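As a sketch of step b), a variable-length detection list for one frame can be collapsed into a fixed-size vector, e.g. the maximum confidence per class over the 29 vehicle regions. The exact feature layout used in [1], [2] may differ; the detection tuple format and class indices below are hypothetical:

```python
NUM_CLASSES = 29  # interior and exterior vehicle regions

def detections_to_features(detections):
    """Collapse a variable-length detection list into a fixed-size vector:
    one entry per class, holding the highest confidence seen for that class.
    Each detection is assumed to be (class_id, confidence, bounding_box)."""
    features = [0.0] * NUM_CLASSES
    for class_id, confidence, _box in detections:
        features[class_id] = max(features[class_id], confidence)
    return features

# Hypothetical detections for one frame (two boxes of class 3, one of class 17):
dets = [(3, 0.91, (10, 10, 50, 50)),
        (3, 0.40, (12, 8, 55, 48)),
        (17, 0.66, (60, 20, 90, 70))]
vec = detections_to_features(dets)
print(len(vec), vec[3], vec[17])  # 29 0.91 0.66
```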
Any data, models, and weights derived from data contained in MuSe-CaR may only be used for scientific, non-commercial applications. The Academic Parties have the right to use these components for academic and research purposes, and academic publication is permitted. Commercial applications include, but are not limited to: proving the efficiency of commercial systems, testing commercial systems, using screenshots of subjects from the database in advertisements, and selling data from the database. Please download and fill out the EULA (End User License Agreement) before requesting the data or models. We will review your application and get in touch as soon as possible. Thank you.
[1] L. Stappen, G. Rizos, B. Schuller. (2020). X-AWARE: ConteXt-AWARE human-environment attention fusion for driver gaze prediction in the wild. In Proceedings of the 22nd International Conference on Multimodal Interaction (ICMI). ACM.
[2] L. Stappen, A. Baird, G. Rizos, P. Tzirakis, X. Du, F. Hafner, L. Schumann, A. Mallol-Ragolta, B. W. Schuller, I. Lefter, E. Cambria. (2020). MuSe 2020 challenge and workshop: Multimodal sentiment analysis, emotion-target engagement and trustworthiness detection in real-life media: Emotional car reviews in-the-wild. In Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop 2020 (MuSe). ACM.