davyneven/SpatialEmbeddings

Regarding variable `xym`

Closed this issue · 1 comment

Hello,

Thank you for an excellent implementation and publication. I am learning quite a lot from looking at your code and how you packaged this project. 👍

One question I wanted to run by you concerns the line of code that creates the state dict entry `xym`. I suspect that the order of the tensors passed to `torch.cat` should be reversed, since this variable is later accessed as [channel, height, width]. My suggestion is that:

# coordinate map
xm = torch.linspace(0, 2, 2048).view(1, 1, -1).expand(1, 1024, 2048)
ym = torch.linspace(0, 1, 1024).view(1, -1, 1).expand(1, 1024, 2048)
xym = torch.cat((xm, ym), 0)

should become

# coordinate map
xm = torch.linspace(0, 2, 2048).view(1, 1, -1).expand(1, 1024, 2048)
ym = torch.linspace(0, 1, 1024).view(1, -1, 1).expand(1, 1024, 2048)
xym = torch.cat((ym, xm), 0)

and this would require an equivalent change in this line of code as well. I might be interpreting this completely wrongly, but I just wanted to check with you. Thank you for your time!

Hi,

@MLbyML I don't think there is an error in the author's implementation. He just handles images of different sizes and crops the coordinate map accordingly. So `xym_s = self.xym[:, 0:height, 0:width].contiguous()` is just regular slicing over the height and width dimensions, and it assumes the x-coordinates are in the first channel and the y-coordinates are in the second channel.
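
To make this easier to check, here is a minimal sketch that rebuilds the coordinate map from the snippet quoted above and applies the same slice outside of the class. The crop size (`height`, `width`) is a hypothetical value picked only for illustration. It is not taken from the repository.

```python
import torch

# Coordinate map exactly as in the snippet quoted in the question.
xm = torch.linspace(0, 2, 2048).view(1, 1, -1).expand(1, 1024, 2048)
ym = torch.linspace(0, 1, 1024).view(1, -1, 1).expand(1, 1024, 2048)
xym = torch.cat((xm, ym), 0)  # layout: [channel, height, width]

# Hypothetical image size, smaller than the full map (assumption for this demo).
height, width = 512, 768

# The slice only crops the two spatial axes; the channel axis is kept whole.
xym_s = xym[:, 0:height, 0:width].contiguous()

print(xym.shape)    # torch.Size([2, 1024, 2048])
print(xym_s.shape)  # torch.Size([2, 512, 768])

# Channel 0 holds x: it is identical across rows and changes along the width axis.
x_same_across_rows = torch.equal(xym_s[0, 0], xym_s[0, -1])
# Channel 1 holds y: it is identical across columns and changes along the height axis.
y_same_across_cols = torch.equal(xym_s[1, :, 0], xym_s[1, :, -1])
print(x_same_across_rows, y_same_across_cols)  # True True
```

So the order given to `torch.cat` only decides which channel holds x and which holds y. The dimension order stays [channel, height, width] either way, and swapping the two tensors would only matter if the code reading the channels were changed to match.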