NFL 1st and Future - Impact Detection

This is the 29th place solution for the Impact Detection competition hosted on Kaggle.

The goal of the competition is to develop a computer vision model that automatically detects helmet impacts occurring on the field. The provided data consists of RGB videos with labeled bounding boxes.

Here you can see a sample video with predictions from the best model:

Approach

Several frames around the central one are taken to capture temporal context. These frames then go into a two-branch network with a 3D motion module and a 2D spatial module.
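For illustration, a minimal sketch of building such a temporal window around a central frame; the window size and stride here are assumptions, not the exact values used in the solution:

```python
def sample_window(frames, center_idx, num_frames=8, stride=1):
    """Take num_frames frames centered on center_idx, clamping at the clip borders."""
    half = num_frames // 2
    indices = [
        min(max(center_idx + (i - half) * stride, 0), len(frames) - 1)
        for i in range(num_frames)
    ]
    return [frames[i] for i in indices]
```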

The last 3 feature maps from both branches are fused using the CFAM module from the YOWO work.
Then these feature maps are fused across scales by the BiFPN module from EfficientDet.
The final representation of spatial location and motion goes to the detection heads from EfficientDet D5.
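A rough sketch of the forward pass under these assumptions; the module names (`Motion3DBackbone`, `Spatial2DBackbone`, CFAM blocks, BiFPN, detection heads) are placeholders passed in from outside, not the actual implementations:

```python
import torch.nn as nn

class TwoBranchDetector(nn.Module):
    """Sketch: 3D motion branch + 2D spatial branch, CFAM fusion per level,
    BiFPN multi-scale fusion, EfficientDet-style detection heads."""

    def __init__(self, motion3d, spatial2d, cfam_blocks, bifpn, heads):
        super().__init__()
        self.motion3d = motion3d        # 3D CNN over the frame window
        self.spatial2d = spatial2d      # 2D CNN over the central frame
        self.cfam_blocks = nn.ModuleList(cfam_blocks)  # one CFAM per feature level
        self.bifpn = bifpn
        self.heads = heads              # class + box heads (EfficientDet D5 style)

    def forward(self, clip, center_frame):
        # Last 3 feature maps from each branch
        motion_feats = self.motion3d(clip)            # list of 3 maps, time pooled
        spatial_feats = self.spatial2d(center_frame)  # list of 3 maps

        # Fuse motion and spatial features level by level with CFAM
        fused = [cfam(m, s) for cfam, m, s in
                 zip(self.cfam_blocks, motion_feats, spatial_feats)]

        # Fuse across scales with BiFPN, then run detection heads
        fused = self.bifpn(fused)
        return self.heads(fused)
```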

Augmentations

Two types of augmentations were used in this competition. For the video-level ones I implemented custom augmentations that preserve spatio-temporal consistency across the frames of a clip.

Video-level

  • Horizontal Flip
  • Shift and Scale
  • MixUp
  • A hybrid of CutMix and Mosaic (using only 2 videos for CPU efficiency)
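As a minimal sketch of keeping per-clip consistency, a hypothetical helper (not the actual code) that samples the flip decision once per clip and applies it to every frame and its boxes:

```python
import random
import numpy as np

def flip_clip_horizontally(frames, boxes, p=0.5):
    """Apply the same horizontal flip to every frame of a clip.

    frames: list of HxWxC numpy arrays
    boxes:  list (per frame) of [x_min, y_min, x_max, y_max] in pixels
    """
    if random.random() >= p:
        return frames, boxes
    width = frames[0].shape[1]
    flipped_frames = [np.ascontiguousarray(f[:, ::-1]) for f in frames]
    flipped_boxes = [
        [[width - x_max, y_min, width - x_min, y_max]
         for x_min, y_min, x_max, y_max in frame_boxes]
        for frame_boxes in boxes
    ]
    return flipped_frames, flipped_boxes
```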

Frame-level

  • Cutout
  • ChannelShuffle
  • ChannelDropout
  • ToGray
  • RGBShift
  • HueSaturationValue
  • RandomBrightnessContrast
  • GaussNoise
  • Blur
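These correspond to standard albumentations transforms; a possible composition could look like the sketch below. The probabilities and parameters are illustrative only, and Cutout is deprecated in recent albumentations releases in favor of CoarseDropout.

```python
import albumentations as A

frame_transforms = A.Compose([
    A.Cutout(num_holes=8, max_h_size=16, max_w_size=16, p=0.3),
    A.ChannelShuffle(p=0.1),
    A.ChannelDropout(p=0.1),
    A.ToGray(p=0.1),
    A.RGBShift(p=0.3),
    A.HueSaturationValue(p=0.3),
    A.RandomBrightnessContrast(p=0.3),
    A.GaussNoise(p=0.2),
    A.Blur(blur_limit=3, p=0.2),
])

# Applied per frame; boxes are unchanged by these pixel-level transforms:
# augmented_frame = frame_transforms(image=frame)["image"]
```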