davyneven/SpatialEmbeddings

Multi class settings

Opened this issue · 22 comments

Hi Neven, could you update this repo to the multi-class setting as described in your paper? I'm trying to reproduce the results on Cityscapes. Thanks.

@davyneven I've been trying to reproduce your multi-class results recently, but my current results are not good. I think there may be some differences between my implementation and yours. May I ask a few questions:

  • Are your seed maps' channels 8 or 9 (8 + background)?
  • How did you calculate the loss in the multi-class case? I saw that your multi-class lovasz_softmax loss takes C input channels, but shouldn't the input embedding always have 2 channels, whether it is single-class or multi-class? In my implementation, I still use the lovasz_hinge loss rather than the lovasz_softmax loss.
  • How did you cluster in the multi-class case? I did sequential clustering on each seed map, one by one...
    Thank you very much!!!

Hi,

  • Are your seed maps' channels 8 or 9 (8+bkg)?
    I use a seed map per class, so for Cityscapes this would mean 8 seed maps. Each seed map is trained as a one-vs-all map, so no softmax.
  • How did you calculate loss when it is multi class?
    I use the same loss as in the single-class setting, but add an extra for loop over the 8 classes (a rough sketch follows after this list). This time, the foreground weight for the seed map is different for each class, and is calculated as described in the ERFNet paper.
  • How did you cluster when it is multi class?
    I indeed do sequential clustering on each seed map.
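
A rough sketch of that per-class loop (my reading of the answer above, not the author's released code; single_class_loss stands in for the single-class spatial-embedding loss, and class_weights holds the 8 ERFNet-style foreground weights, both hypothetical names):

    # Hypothetical sketch: loop the single-class spatial-embedding loss
    # over the 8 Cityscapes "thing" classes, one one-vs-all seed map each.
    def multi_class_loss(embeddings, seed_maps, instance_labels,
                         class_weights, single_class_loss):
        # embeddings:      (B, 2, H, W) offset/embedding output
        # seed_maps:       (B, 8, H, W) one seed map per class, no softmax
        # instance_labels: list of 8 instance maps, one per class
        # class_weights:   per-class foreground weights (ERFNet-style)
        loss = 0.0
        for c in range(seed_maps.shape[1]):
            # same single-class loss, with a class-specific foreground weight
            loss = loss + single_class_loss(embeddings, seed_maps[:, c],
                                            instance_labels[c],
                                            foreground_weight=class_weights[c])
        return loss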

@davyneven Thank you so much for your quick reply!! My seed map also has 8 channels, and I also use an extra for loop over the 8 classes, but I use the same foreground weight for every class, which may be the problem. I also have a question about clustering:

  • How did you choose the order of the seed maps for clustering? E.g. first car, then person, train, motorcycle, bicycle...
    I think the order matters, because there may be cases where an object is a bicycle, but the motorcycle seed map happens to have some predicted values on it whose maximum is above the 0.9 threshold (though perhaps smaller than the maximum on the bicycle seed map). Because of the order, it would then first be clustered as a motorcycle instance rather than a bicycle.

The foreground weights are very important for reaching good results, so you should definitely see some improvements. The order in which the seed maps of the different classes are clustered doesn't matter: I cluster each seed map separately and save each clustered object. So indeed, this way the same object can be detected in two seed maps, e.g. a bicycle can be detected both as a bicycle and as a motorcycle.
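
A minimal sketch of that per-class clustering (my interpretation, not the author's code; cluster_seed_map is a hypothetical stand-in for the single-class clustering from the paper, which repeatedly takes the highest remaining seed above the threshold and groups the pixels whose embeddings fall within the predicted margin):

    # Hypothetical sketch: cluster every class's seed map independently and
    # keep all resulting instances; the same pixels may therefore end up in
    # instances of two different classes.
    CLASSES = ["person", "rider", "car", "truck",
               "bus", "train", "motorcycle", "bicycle"]

    def cluster_all_classes(embeddings, seed_maps, cluster_seed_map,
                            threshold=0.9):
        instances = []
        for c, name in enumerate(CLASSES):
            # single-class clustering on this class's seed map only
            for mask in cluster_seed_map(embeddings, seed_maps[c],
                                         threshold=threshold):
                instances.append((name, mask))
        return instances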

@davyneven Thanks for your reply! I used your clustering method and the performance improved! However, I double-checked the ERFNet paper and found nothing about class weights, and their code is no longer available. Could you share your code, or paste the relevant part in this issue? :) Thanks!

Hi, I have a question about the for loop over the seed maps when calculating the seed loss. Since an instance can only belong to a single class, why do we need this loop at all? In my implementation I simply do it this way: seed_loss = seed_map[c][mask] - dist[mask], where c is the instance's class. But with a for loop, the mask region in the other channels of the seed map is useless and should be regressed to zero, right? If so, is it seed_loss = seed_map[c][mask] - 0 for those other channels? Any idea? @davyneven @charlotte12l
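
For reference, a minimal sketch of what such a per-class seed loss could look like (my own interpretation, not confirmed by the author: each class's seed map regresses to the embedding score on pixels of its own class and to 0 everywhere else, including on pixels of other classes, which is exactly what the extra loop buys you):

    import torch

    # Hypothetical sketch of a one-vs-all seed loss: foreground pixels of
    # class c regress to the (detached) score, everything else regresses to 0.
    def seed_loss(seed_maps, dist, class_masks, fg_weights):
        # seed_maps:   (C, H, W) one seed map per class
        # dist:        (H, W) per-pixel gaussian score from the embedding branch
        # class_masks: (C, H, W) boolean, pixels belonging to each class
        # fg_weights:  per-class foreground weights
        loss = 0.0
        for c in range(seed_maps.shape[0]):
            fg = class_masks[c]
            bg = ~fg
            # pull this class's seed values towards the (detached) score ...
            loss = loss + fg_weights[c] * torch.sum(
                (seed_maps[c][fg] - dist[fg].detach()) ** 2)
            # ... and push everything else (background and other classes) to 0
            loss = loss + torch.sum(seed_maps[c][bg] ** 2)
        return loss / seed_maps[0].numel()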

I would like to do some research on instance segmentation based on your work, so I'm reproducing your multi-class results, but I just cannot reach the same performance as yours. My configuration:

  1. 8 seed maps
  2. sequential clustering for each class
  3. same weights as yours

Training:

  1. pretrain for 200 epochs, Adam, 371 iter/epoch, batch_size = 8, crop_size = 512, lr = 0.0005, object-center crop transform
  2. freeze BN for another 50 epochs, lr = 0.00005, crop_size = 936, batch_size = 2; random crop and object-center crop reach the same performance.

My questions:

  1. I can only reach 26 AP on val. In particular, my model reaches really low AP on train, motorcycle and bicycle (0.12, 0.12, 0.2). Could you give me some advice? Also, I do not pre-generate the crops; I just use the CropRandomObject transform. Is this wrong?
  2. What is your total number of iterations? Maybe mine is not enough; it is about 80k + 100k.
  3. And I wonder what the performance on validation is? 33 AP?
  4. Do you use sync_batchnorm?
  5. Should I set a foreground weight for each class? How should I set the loss weights properly?

Your reply will be very important and helpful to me. Thanks a lot.
@davyneven

Hi, can you share your code? @GitHberChen

Sorry, I cannot, maybe later.

How is your performance in the first training stage?

About 21 AP on val. What about yours?

11 :( There must be something wrong with my code. Can you tell me how you calculate the seed loss? I think I may have a problem there. And by the way, do you test at 2048x1024?

Yes, of course.

Can you show me your seed loss? Your 26 AP is very close to the 27.6 claimed in the paper, so I think your implementation is right.

@GitHberChen Hi, the author said the foreground weight should be set as described in the ERFNet paper, but the paper doesn't explain how the weights are calculated. I think their ERFNet code does, but the code is no longer available... If you know how to calculate the weights, please contact me, thanks! :)

The foreground weight is really important and can impact the result significantly, but I cannot work out what the author meant; the ERFNet paper does not mention it at all. Can you explain how to calculate or set the foreground weight for each class? I think classes like train and truck should get a much higher weight than car. @davyneven

There are actually 2 versions of the ERFNet paper; the weight calculation is mentioned in this one: http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf (similar to the ENet weight calculation, but with a different factor inside the log).

Aside from the different foreground weights, another important issue is data balancing during training. Especially when training on the crop variant, the rare classes will be drastically outnumbered. My strategy is therefore to first construct 8 datasets, one per class, each with its size set to 3000/8 (in the dataset class, samples are drawn randomly, so you can mimic a larger or smaller dataset). Next, concatenate them (there is a ConcatDataset class for this in torch.utils.data) and use this combined, and now balanced, dataset to construct the dataloader iterator; see the sketch below. This should improve the results on the rare classes.

After training this way (for me, 200 epochs were sufficient here as well, but you can train longer), you still have to finetune on the train set, again with the ERFNet-style weights, but this time without any balancing: just use the standard train set. This makes sure there are fewer false positives on the rare classes (which is also a delicate balance and often results in low performance).

Hope this will lead to better results :)
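
A minimal sketch of that balancing step (assuming a hypothetical CityscapesInstanceDataset that can be restricted to crops containing a given class and takes a size argument for random resampling, roughly like the dataset class in this repo):

    from torch.utils.data import ConcatDataset, DataLoader

    # CityscapesInstanceDataset is a hypothetical stand-in (not defined here):
    # each per-class dataset yields only crops that contain at least one
    # instance of its class, resampled to 3000/8 items.
    CLASSES = ["person", "rider", "car", "truck",
               "bus", "train", "motorcycle", "bicycle"]

    datasets = [CityscapesInstanceDataset(class_name=c, size=3000 // 8)
                for c in CLASSES]

    # Concatenate into one balanced dataset and build the loader from it.
    balanced = ConcatDataset(datasets)
    loader = DataLoader(balanced, batch_size=8, shuffle=True, num_workers=4)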

Thank you so much! Your advice enlightened me a lot, thanks again!

@davyneven So following http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf, the weight would be

weight_C = 1 / ln(1.1 + p_C), where p_C is the pixel probability (frequency) of class C.

Am I right?
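
A small sketch of that weighting (assuming the formula above; the per-class frequencies below are made-up numbers, purely for illustration):

    import math

    # Made-up per-class pixel frequencies p_C over the train set,
    # for illustration only.
    class_freq = {"person": 0.012, "rider": 0.002, "car": 0.070,
                  "truck": 0.003, "bus": 0.002, "train": 0.002,
                  "motorcycle": 0.001, "bicycle": 0.004}

    # w_C = 1 / ln(1.1 + p_C): rarer classes get larger weights.
    weights = {c: 1.0 / math.log(1.1 + p) for c, p in class_freq.items()}

    for c, w in weights.items():
        print(f"{c:12s} w = {w:.3f}")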

@davyneven Hi, I think this Q&A is rather inefficient; could you please release the multi-class code? Thank you so much! :)

I cannot understand the meaning of
“my strategy is to first construct 8 datasets, all with size set to 3000/8”.
If it means that
“each dataset has 3000/8 = 375 images of one class (person, ..., bicycle), and they are then concatenated into 1 dataset”,
how do I create it? Is there a script for it, or do I write it myself?

I apologize for my poor English.

Hey @GitHberChen, is it possible for you to share your multi-class code? It would be a great help.