Code issue

Hi, Cheng!

Thanks for your great work!

Our work is based on the XMem framework, and we wanted to try your new vos_dataset.py, but we ran into a problem:
In the original XMem, transforms.Normalize is applied directly inside vos_dataset.py, whereas in Cutie it is applied inside the model.
Is there any difference between the two? If they are equivalent, would it be better to move normalization back into vos_dataset.py, so that people using the XMem framework can migrate to your new vos_dataset.py more easily?

Best wishes.

https://github.com/hkchengrex/XMem/blob/9ea04795564dcff06b6570132aed7eedba94d9b8/dataset/vos_dataset.py#L92-L95

Cutie/cutie/dataset/vos_dataset.py, line 128 in e812cf3:

transforms.ToTensor(),

Cutie/cutie/model/cutie.py, line 58 in e812cf3:

image = (image - self.pixel_mean) / self.pixel_std
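As a rough illustration of why the two placements give the same result, the sketch below (plain PyTorch, no torchvision) normalizes a few images once per sample, as a dataset-side transforms.Normalize would, and once per batch, as the model-side line in cutie.py does. It assumes the standard ImageNet mean/std (check the actual values in each repo), and the helper names are hypothetical. As long as both sides use the same statistics, the outputs match; the only difference is where the step runs.

```python
import torch

# Assumed ImageNet statistics (the values commonly passed to transforms.Normalize).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_in_dataset(image: torch.Tensor) -> torch.Tensor:
    """Hypothetical XMem-style step: normalize one (3, H, W) image in the
    data pipeline, computing (x - mean) / std per channel like Normalize."""
    mean = torch.tensor(MEAN).view(3, 1, 1)
    std = torch.tensor(STD).view(3, 1, 1)
    return (image - mean) / std


def normalize_in_model(batch: torch.Tensor) -> torch.Tensor:
    """Hypothetical Cutie-style step: normalize a whole (B, 3, H, W) batch,
    mirroring `image = (image - self.pixel_mean) / self.pixel_std`."""
    pixel_mean = torch.tensor(MEAN).view(-1, 1, 1)  # broadcasts over B, H, W
    pixel_std = torch.tensor(STD).view(-1, 1, 1)
    return (batch - pixel_mean) / pixel_std


torch.manual_seed(0)
images = [torch.rand(3, 8, 8) for _ in range(4)]  # ToTensor-style values in [0, 1)

dataset_side = torch.stack([normalize_in_dataset(im) for im in images])
model_side = normalize_in_model(torch.stack(images))
print(torch.allclose(dataset_side, model_side))  # True
```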

Hello!
This is intentional. We reimplemented a large portion of the code when moving from XMem to Cutie, with the goal of making it easier to use and more modular, and this is one of those changes. We found that users (me included) often forget to normalize the input when writing a new custom script with different data-loading logic. Moving data normalization into the model makes the user's job easier: there is no need to look up the exact mean/std or to import torchvision.transforms, etc.
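The design described in this reply can be sketched as a small wrapper module. NormalizingWrapper is a hypothetical name, not Cutie's actual class, and the mean/std values are assumed ImageNet statistics. Registering the statistics as buffers means they follow model.to(device) and dtype casts, so a custom script only has to feed raw [0, 1] tensors.

```python
from typing import Sequence

import torch
from torch import nn


class NormalizingWrapper(nn.Module):
    """Hypothetical wrapper: accepts raw [0, 1] images and normalizes them
    before calling the wrapped network, so callers never handle mean/std."""

    def __init__(self, net: nn.Module, mean: Sequence[float], std: Sequence[float]):
        super().__init__()
        self.net = net
        # Buffers move with .to(device) / .half(); persistent=False (a choice
        # made for this sketch) keeps them out of state_dict.
        self.register_buffer("pixel_mean", torch.tensor(mean).view(-1, 1, 1), persistent=False)
        self.register_buffer("pixel_std", torch.tensor(std).view(-1, 1, 1), persistent=False)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        image = (image - self.pixel_mean) / self.pixel_std
        return self.net(image)


MEAN = [0.485, 0.456, 0.406]  # assumed ImageNet statistics
STD = [0.229, 0.224, 0.225]

model = NormalizingWrapper(nn.Identity(), MEAN, STD)
raw = torch.full((2, 3, 4, 4), 0.5)  # a batch of raw [0, 1] images
out = model(raw)
print(out.shape)  # torch.Size([2, 3, 4, 4])
print(len(model.state_dict()))  # 0: the statistics are not saved
```

Making the buffers non-persistent keeps checkpoints limited to learned weights; persistent buffers would work just as well if you prefer the statistics to travel with the checkpoint.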

Got it. Still, I think it would help to add a note in vos_dataset.py reminding users that the framework applies Normalize inside the model, so that people whose work is based on the XMem framework don't forget it (like me hahaha 🙇).
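A complementary safeguard for anyone migrating an XMem-based pipeline: since a Cutie-style model normalizes internally, a dataset that still applies transforms.Normalize would hand the model inputs that are normalized twice. The hypothetical assert_unnormalized helper below is a cheap check that dataset outputs stay in the [0, 1] range produced by ToTensor; the mean/std values are again assumed ImageNet statistics. It is only a heuristic, since a normalized image can occasionally fall inside [0, 1].

```python
import torch

# Assumed ImageNet statistics, standing in for a leftover XMem-style Normalize.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def assert_unnormalized(image: torch.Tensor) -> None:
    """Hypothetical heuristic check for a migrated dataset: with normalization
    done inside the model, the dataset should emit ToTensor-style images in
    [0, 1]. Values outside that range usually mean Normalize was left in."""
    lo, hi = image.min().item(), image.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(
            f"image range [{lo:.3f}, {hi:.3f}] is outside [0, 1]; "
            "remove the dataset-side Normalize, the model normalizes internally"
        )


raw = torch.linspace(0.0, 0.9, 3 * 4 * 4).view(3, 4, 4)  # fake ToTensor output
assert_unnormalized(raw)  # passes: the dataset emits raw [0, 1] images

leftover = (raw - MEAN) / STD  # Normalize accidentally kept in the dataset
try:
    assert_unnormalized(leftover)
except ValueError as err:
    print("caught:", err)
```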

Thank you for the input. I have added a line about this in the datasets.