JizhiziLi/GFM

about the image normalize

yhl1010 opened this issue · 5 comments

Hello, thanks for your excellent work. But I noticed that you do not normalize the input image in your inference code. Does that mean you also skipped normalization in the training code, and that the input is the original RGB image?

Hi @yhl1010 , thanks for your interest. Yes, we did not adopt image normalization and the input is the original RGB image.

OK, I see, thank you very much!

99991 commented

@bluesky314 Are you sure that image normalization would make a difference here? Maybe it is just cargo cult from the times before batch norm.

I'd imagine that choosing the weights slightly differently would have the same effect as subtracting the mean and dividing by the standard deviation. Also, it is hard to justify over which dataset those means and standard deviations should be computed, especially when considering the trimap.
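For example, the mean/std normalization can be folded exactly into the weights and bias of the first convolution. A rough sketch (not the GFM code; it assumes a toy conv layer with no padding, so the equivalence holds everywhere rather than only away from the border):

```python
import torch
import torch.nn as nn

# ImageNet statistics from the torchvision docs
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

conv = nn.Conv2d(3, 8, kernel_size=3, bias=True)    # original first layer
folded = nn.Conv2d(3, 8, kernel_size=3, bias=True)  # same layer with normalization folded in

with torch.no_grad():
    # Dividing the input by std is the same as dividing the weights by std.
    folded.weight.copy_(conv.weight / std.view(1, 3, 1, 1))
    # Subtracting the mean shifts the output by a constant, which is absorbed into the bias.
    shift = (conv.weight * (mean / std).view(1, 3, 1, 1)).sum(dim=(1, 2, 3))
    folded.bias.copy_(conv.bias - shift)

x = torch.rand(2, 3, 64, 64)
x_norm = (x - mean.view(1, 3, 1, 1)) / std.view(1, 3, 1, 1)
print(torch.allclose(conv(x_norm), folded(x), atol=1e-5))  # -> True
```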

Either way, image normalization is another source of error and complexity, so it should not be used if it does nothing or only has a very marginal effect.

EDIT: It seems that a pretrained resnet34 is used here: https://github.com/JizhiziLi/animal-matting/blob/d967287439a4d69c85bf718ead52f10d4a35ef3d/core/network/e2e_resnet34_2b_gfm_tt.py#L72

Not sure if using a pretrained model helps, since the new trimap layer will mess up the batch statistics anyway, and alpha matting is not the same problem as image classification in the first place. Still, it would be interesting to see if the normalization from the docs makes a difference.

https://pytorch.org/docs/stable/torchvision/models.html

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
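For reference, the preprocessing described in that quote corresponds to the standard torchvision transform below (just the recipe from the docs, not something present in this repo); running inference with and without it would be the experiment suggested above:

```python
from torchvision import transforms

# Standard ImageNet preprocessing from the torchvision docs quoted above
preprocess = transforms.Compose([
    transforms.ToTensor(),  # loads the image into [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```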

Hi @bluesky314 , as @99991 pointed out, the network contains batch norm layers that perform normalization, and choosing the weights slightly differently can achieve the same effect as explicit input normalization.
We adopt the pre-trained model to save training time, and it is well established that fine-tuning a pre-trained model on a dataset of limited size effectively helps prevent overfitting.
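As a minimal illustration of the batch-norm point (a toy sketch, not taken from the GFM code): a BatchNorm2d layer standardizes each channel over the batch, so un-normalized inputs are rescaled by the network itself:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3, affine=False)
bn.train()

x = torch.rand(8, 3, 32, 32) * 255.0  # un-normalized RGB-like input
y = bn(x)
print(y.mean(dim=(0, 2, 3)))  # roughly 0 per channel
print(y.std(dim=(0, 2, 3)))   # roughly 1 per channel
```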