Using YOLOv4 and DeepSORT to track baboons in the wild.
- YOLOv4 produces multi-object detections for each frame
- DeepSORT stitches these detections across frames to create trajectories
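For reference, a minimal sketch of the detect-then-track loop. `detect` and `tracker.update` are illustrative stand-ins, not the actual YOLOv4 / DeepSORT APIs; the real tracker associates detections with existing tracks using appearance and motion cues.

```python
import cv2

def track_video(video_path, detect, tracker):
    """Run per-frame detection and feed the detections to the tracker.

    detect(frame) is assumed to return [(x, y, w, h, confidence), ...];
    tracker.update(frame, detections) is assumed to return the active
    tracks for this frame (both names are hypothetical placeholders).
    """
    cap = cv2.VideoCapture(video_path)
    trajectories = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        detections = detect(frame)
        tracks = tracker.update(frame, detections)
        trajectories.append(tracks)
    cap.release()
    return trajectories
```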
Since the number of human annotations is small (~500), an augmentation module based on crop-and-stitch augmentation was created.
An image patch containing an annotated object is cropped and pasted back into a similar-looking neighborhood patch. To find such a patch, the border strip around the annotation is vectorized to a constant length and compared against all locations in the image to find the best spot to stitch the patch over.
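A sketch of the border-strip descriptor and the search over the image, assuming a fixed strip thickness and descriptor length (`STRIP` and `VEC_LEN` are assumed values, not from the original):

```python
import numpy as np

STRIP = 4        # border strip thickness in pixels (assumed value)
VEC_LEN = 256    # constant descriptor length (assumed value)

def border_descriptor(image, x, y, w, h):
    """Flatten the border strip around a box and resample it to a
    fixed-length vector so boxes of any size are comparable."""
    top    = image[max(y - STRIP, 0):y, x:x + w]
    bottom = image[y + h:y + h + STRIP, x:x + w]
    left   = image[y:y + h, max(x - STRIP, 0):x]
    right  = image[y:y + h, x + w:x + w + STRIP]
    strip = np.concatenate([s.reshape(-1) for s in (top, bottom, left, right)])
    strip = strip.astype(np.float32)
    idx = np.linspace(0, len(strip) - 1, VEC_LEN)
    return np.interp(idx, np.arange(len(strip)), strip)

def distance_map(image, w, h, descriptor):
    """Compare the descriptor against every candidate location.
    Brute force for clarity; a real implementation would stride or
    vectorize this search."""
    H, W = image.shape[:2]
    dist = np.full((H, W), np.inf, dtype=np.float32)
    for y in range(STRIP, H - h - STRIP):
        for x in range(STRIP, W - w - STRIP):
            cand = border_descriptor(image, x, y, w, h)
            dist[y, x] = np.linalg.norm(cand - descriptor)
    return dist
```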
For each labeled object in the image, the five best candidate annotations (new locations and sizes to paste over) are computed by running non-maximum suppression on the distance map produced by comparing the descriptor vector against all possible locations in the image.
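A sketch of that selection step: greedily take the minimum of the distance map, then suppress a neighborhood around it so the candidates do not overlap (the suppression window size is an assumption):

```python
import numpy as np

def top_candidates(dist, w, h, k=5, suppress=None):
    """Pick the k best paste locations via non-maximum suppression
    on the distance map (lower distance = better match)."""
    if suppress is None:
        suppress = (h, w)  # suppression window ~ one object size (assumed)
    dist = dist.copy()
    picks = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmin(dist), dist.shape)
        if not np.isfinite(dist[y, x]):
            break  # no valid locations left
        picks.append((x, y, w, h))
        y0, y1 = max(y - suppress[0], 0), y + suppress[0]
        x0, x1 = max(x - suppress[1], 0), x + suppress[1]
        dist[y0:y1, x0:x1] = np.inf  # suppress the neighborhood
    return picks
```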
The augmentation module restricts certain augmentations with the help of scoring functions: a score is computed for each candidate augmentation, and only the top-scoring ones are allowed (see the sketch after this list).
- a score favoring augmentation of bigger objects (it is less meaningful to shuffle small, distant baboons in the given environment)
- a score penalizing pastes in the very top portion of the image (in all the images, the horizon sits near the middle, so the upper half is mostly sky)
- a filter rejecting pastes that would land on other objects
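A sketch of those three criteria; how the scores are combined (here, a simple product with a hard overlap veto) and the 0.5 horizon line are assumptions, not the original's exact formulas:

```python
import numpy as np

def size_score(w, h, image_area):
    """Favor larger objects: relocating tiny, distant baboons adds little."""
    return (w * h) / image_area

def horizon_score(y, h, image_height, horizon=0.5):
    """Penalize pastes whose center lies above the horizon line
    (the upper half of these images is mostly sky)."""
    center = y + h / 2
    return 0.0 if center < horizon * image_height else 1.0

def overlap_free(x, y, w, h, boxes):
    """Reject candidates that would paste over an existing annotation."""
    for bx, by, bw, bh in boxes:
        if x < bx + bw and bx < x + w and y < by + bh and by < y + h:
            return False
    return True

def augmentation_score(x, y, w, h, image_shape, boxes):
    """Combined score; only the top-scoring augmentations are kept."""
    H, W = image_shape[:2]
    if not overlap_free(x, y, w, h, boxes):
        return -np.inf
    return size_score(w, h, H * W) * horizon_score(y, h, H)
```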
Once the new paste locations are finalized, the patches are pasted onto them and the boundaries are smoothed using alpha blending.
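A sketch of the alpha-blended paste, assuming the patch fits inside the image; the feathering width is an assumed parameter:

```python
import numpy as np
import cv2

def paste_with_alpha(image, patch, x, y, feather=7):
    """Paste a patch at (x, y), alpha-blending a feathered border so
    the seam fades smoothly into the background."""
    h, w = patch.shape[:2]
    # Mask is 1 in the patch interior and ramps to 0 at the edges.
    mask = np.ones((h, w), np.float32)
    mask[:feather, :] = 0; mask[-feather:, :] = 0
    mask[:, :feather] = 0; mask[:, -feather:] = 0
    mask = cv2.GaussianBlur(mask, (2 * feather + 1, 2 * feather + 1), 0)
    alpha = mask[..., None]
    roi = image[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * patch.astype(np.float32) + (1 - alpha) * roi
    image[y:y + h, x:x + w] = blended.astype(image.dtype)
    return image
```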
By chaining multiple augmentations, the 500 source images were expanded into a dataset of ~5k annotated images.
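One way such chaining could look; `augment_once` stands in for a single crop-and-stitch pass, and the per-image count of 10 (matching 500 → ~5k) and chain length of 1-3 are illustrative assumptions:

```python
import random

def augment_dataset(images_with_boxes, augment_once, per_image=10):
    """Generate per_image augmented variants of each source image by
    chaining a few crop-and-stitch passes (augment_once is a
    hypothetical helper returning an updated image and box list)."""
    augmented = []
    for image, boxes in images_with_boxes:
        for _ in range(per_image):
            new_image, new_boxes = image.copy(), list(boxes)
            for _ in range(random.randint(1, 3)):
                new_image, new_boxes = augment_once(new_image, new_boxes)
            augmented.append((new_image, new_boxes))
    return augmented
```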