isl-org/PhotorealismEnhancement

Custom Robust Label Map

mvshaag opened this issue · 1 comment

Hi!

Which dimensions does the perceptual discriminator expect for the robust label map?
A three-channel colored segmentation or a single-channel ID map?

While preparing to train on my custom dataset, I stumbled upon the requirements for the robust label map. To my understanding, the labels generated by MSeg can be replaced by labels from any other method of labeling the images. Since I'm training with Cityscapes as the target dataset, I thought about replacing the MSeg labels with the Cityscapes labels, since they are also available for my synthetic dataset.
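For context, here is a rough sketch of how a colored Cityscapes segmentation could be collapsed into a single-channel ID map. It assumes the official palette from cityscapesscripts; the paths and function names are only illustrative and not from this repo:

```python
# Rough sketch (not from this repo): collapse a colored Cityscapes
# segmentation into a single-channel class-ID map. Assumes the official
# palette from cityscapesscripts; paths and names are illustrative.
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import labels  # official color/id table

# RGB color -> integer class id (skip ids < 0, e.g. 'license plate').
# Note: void labels sharing the color (0, 0, 0) collapse to a single id.
color_to_id = {l.color: l.id for l in labels if l.id >= 0}

def colored_to_id_map(path):
    rgb = np.array(Image.open(path).convert("RGB"))    # (H, W, 3) uint8
    id_map = np.zeros(rgb.shape[:2], dtype=np.int64)   # (H, W) int64
    for color, class_id in color_to_id.items():
        id_map[np.all(rgb == np.array(color), axis=-1)] = class_id
    return id_map

# e.g. np.savez_compressed("robust_label_map.npz", colored_to_id_map("seg_color.png"))
```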

I was able to train the model on the three-channel segmentation map, but the network generates odd artifacts from time to time (see attached images). I assume they are caused by some error in the robust label map. Could this be an error on my side, or did you experience similar artifacts during training?

Thank you in advance for any help & insights!

Crops from validation results at step 470000 (attached images: crop_470000_6, crop_470000_1).

Hi,

The robust label map is used here: ProjectionDiscriminator. It is expected to be either a single-channel int64 tensor with at most 194 classes or a one-hot-encoded float tensor.
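For illustration, a minimal sketch of those two formats in PyTorch; the shapes, variable names, and example resolution are my own assumptions (only the 194-class limit comes from the comment above), so the exact expected shape should be checked against ProjectionDiscriminator:

```python
# Minimal sketch of the two accepted label formats; shapes and the example
# resolution are assumptions. Check ProjectionDiscriminator for the exact
# expected shape of the single-channel map.
import torch
import torch.nn.functional as F

num_classes = 194
h, w = 256, 512

# Option 1: single-channel int64 map of class ids.
ids = torch.randint(0, num_classes, (1, 1, h, w), dtype=torch.int64)  # (N, 1, H, W)

# Option 2: one-hot-encoded float tensor with one channel per class.
one_hot = F.one_hot(ids.squeeze(1), num_classes)   # (N, H, W, C)
one_hot = one_hot.permute(0, 3, 1, 2).float()      # (N, C, H, W)

print(ids.shape, one_hot.shape)
# torch.Size([1, 1, 256, 512]) torch.Size([1, 194, 256, 512])
```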

Even though you have compatible labels, I'd still recommend MSeg, as it allows you to train on all Cityscapes images, not just the subset that comes with fine-grained annotations.

The artifacts are not necessarily something to worry about. Training with an adversarial loss can be unstable, and we have seen artifacts like these as well. They should disappear again within a few tens of thousands of iterations. If they keep appearing, I'd try (each independently): increasing the dataset size, increasing the weight of the VGG loss, or training longer.