google-research/deeplab2

Training with mapillary vistas dataset

nithinme3 opened this issue · 2 comments

Hi,
we are trying to train deeplab2 on the Mapillary Vistas dataset with semantic segmentation only. We have successfully created the TFRecords and started training, but training gets killed partway through because RAM is completely used up.
Also, the TFRecords we created are far too large. For example, we took 5000 images for training, 500 for validation, and 1000 for testing. Altogether the TFRecords are around 200GB, whereas the source files (train + val + test) total only about 7GB. Is this expected? (Mapillary images are in the range of 3000+ x 2000+ (width x height), and we used only the provided code to generate the panoptic images and TFRecords.) I noticed that build_step_data.py reads the panoptic map image as an np.int32 array, and its size balloons to about 32MB after decoding, while the actual panoptic image on disk is less than 1MB.
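A quick back-of-the-envelope sketch of why the decoded array is so much larger than the PNG on disk (the 3200 x 2500 size below is a hypothetical image in the Mapillary range, just for illustration):

```python
import numpy as np

# Hypothetical Mapillary-sized panoptic map, roughly 3000+ x 2000+ pixels.
height, width = 2500, 3200

# A compressed PNG panoptic map can be under 1 MB on disk, but once
# decoded into an int32 NumPy array every pixel costs 4 bytes.
panoptic = np.zeros((height, width), dtype=np.int32)
int32_mb = panoptic.nbytes / 1e6   # 3200 * 2500 * 4 bytes = 32 MB

# Casting to uint16 halves the footprint; uint8 quarters it.
uint16_mb = panoptic.astype(np.uint16).nbytes / 1e6  # 16 MB

print(int32_mb, uint16_mb)
```

So a ~32MB in-memory array from a sub-1MB PNG is expected behavior for int32 decoding, not a bug in the conversion script.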
We have 32 GB of RAM and 32 GB of NVIDIA V100 GPU.
We have also set:
crop size to 513 x 513,
batch size to 1,
augmentation max scale factor reduced from 2 to 1.

Only after the above changes were we able to start training with the Mapillary dataset (before that, training would not even start). But after around 4000 steps it gets killed.

Is there anything we can do to complete the training?
Also, since we are setting the crop size to 513, will it affect the performance of the trained model? The images and annotations for Mapillary are in the 4000 x 3000 (width x height) range.
Thank you,
Nithin

Hi @nithinme3,

Thanks for reporting the issue.

Indeed, TFRecord is not memory-friendly since it does not compress arrays, and we default to np.int32 to support a large panoptic_label_divisor for some panoptic segmentation datasets.

Maybe we could try the following things:

  • Hack the code to store panoptic labels as uint8 or int16, since you only need semantic labels.
  • Resize the Mapillary Vistas images before converting them to TFRecord. As mentioned in the Panoptic-DeepLab paper, we resize Mapillary Vistas images on the fly during training so that their longest side is no more than 2177 (you could try an even smaller value, e.g., 2048, the same as Cityscapes, or smaller). Here, doing the resizing offline would save some RAM.
  • Use a larger _NUM_SHARDS (say, 1000), which lets the code load smaller TFRecord files during training.

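For the offline-resize suggestion, a minimal sketch using Pillow (the `target_size` and `resize_pair` helpers are hypothetical, not part of deeplab2; adapt the paths and max side to your setup):

```python
from PIL import Image

MAX_SIDE = 2048  # e.g., match Cityscapes; the paper uses 2177 on the fly


def target_size(width, height, max_side=MAX_SIDE):
    """Return (new_width, new_height) so the longest side is at most
    max_side, preserving aspect ratio. Small images are left unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return int(round(width * scale)), int(round(height * scale))


def resize_pair(image_path, label_path, out_image, out_label):
    """Resize an image and its label map together, before TFRecord creation."""
    image = Image.open(image_path)
    new_size = target_size(*image.size)
    # Bilinear for the RGB image; nearest-neighbor for the label map so
    # class ids are never interpolated into invalid in-between values.
    image.resize(new_size, Image.BILINEAR).save(out_image)
    Image.open(label_path).resize(new_size, Image.NEAREST).save(out_label)
```

Running `resize_pair` over the dataset once before generating TFRecords should shrink both the records on disk and the decoded arrays in RAM.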
Regarding the training settings, it is better to use a larger crop size and a larger max scale factor (for data augmentation) to get better performance.

Cheers,

Hi @nithinme3,

It has been a while, and we hope you have figured out the issue.
Closing it now due to lack of activity, but please feel free to reopen if you encounter other issues.

Cheers,