wohlert/generative-query-network-pytorch

Increase dimension of viewpoint and representation

Closed this issue · 4 comments

Thanks for this implementation.
One question I have: when increasing the dimension of the viewpoint and representation, you use torch.Tensor.repeat. Is there any reason for this? Could one use interpolate instead?

In the original paper it says
"when concatenating viewpoint v to an image or feature map, its values are ‘broadcast’ in the spatial dimensions to obtain the correct size. "

The word 'broadcast' is not precisely defined, hence the question.

In order to concatenate along the channel dimension, the two tensors must agree in their spatial dimensions. To avoid losing information, we expand the viewpoint rather than reducing the image.

One way of doing this is just to repeat the viewpoint until it "becomes an image". If we use torch.nn.functional.interpolate we arrive at the same result.

import torch
from torch.nn.functional import interpolate

# Viewpoint as a (batch, channels, 1, 1) tensor
v = torch.arange(5).float()
v = v.view(-1, 5, 1, 1)

# Tile the 1x1 spatial dimensions up to 64x64
v_repeat = v.repeat(1, 1, 64, 64)
# Nearest-neighbour upsampling to the same size
v_interpolate = interpolate(v, (64, 64))

v_repeat.shape # => torch.Size([1, 5, 64, 64])
torch.allclose(v_repeat, v_interpolate) # => True

So in this case it does not matter which function we use.
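The equivalence only holds because the viewpoint has 1×1 spatial extent. As a small illustration (not from the repo), the two functions diverge as soon as the input already has spatial structure: repeat tiles the values, while interpolate with nearest-neighbour mode resamples them.

```python
import torch
from torch.nn.functional import interpolate

# A 2x2 spatial map instead of a 1x1 one
x = torch.arange(4).float().view(1, 1, 2, 2)

x_repeat = x.repeat(1, 1, 2, 2)         # tiles the 2x2 block -> (1, 1, 4, 4)
x_interpolate = interpolate(x, (4, 4))  # nearest upsampling  -> (1, 1, 4, 4)

x_repeat[0, 0]
# tensor([[0., 1., 0., 1.],
#         [2., 3., 2., 3.],
#         [0., 1., 0., 1.],
#         [2., 3., 2., 3.]])
x_interpolate[0, 0]
# tensor([[0., 0., 1., 1.],
#         [0., 0., 1., 1.],
#         [2., 2., 3., 3.],
#         [2., 2., 3., 3.]])
torch.equal(x_repeat, x_interpolate)  # => False
```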

After some thinking I realize that viewpoints are one-dimensional, so it makes sense to use repeat, but the representation could be two-dimensional, so repeat is actually wrong in that case?

No. If you look at the following line, we only repeat when the pool architecture has been used, and with pooling the representation is one-dimensional (1×1 spatially).

https://github.com/wohlert/generative-query-network-pytorch/blob/master/gqn/generator.py#L116
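For illustration, roughly what that branch amounts to (the sizes here are made up, not the repo's exact dimensions): a pooled representation has collapsed spatial dimensions and must be repeated, while a representation that kept its spatial extent is concatenated as-is.

```python
import torch

batch, r_dim, spatial = 2, 256, 16  # illustrative sizes, not the repo's values

# Pool architecture: representation averaged down to a 1x1 map,
# so it is repeated to the target spatial size before concatenation.
r_pool = torch.randn(batch, r_dim, 1, 1)
r_pool = r_pool.repeat(1, 1, spatial, spatial)

# Non-pooled architecture: representation already has spatial extent,
# so no repeat is needed.
r_spatial = torch.randn(batch, r_dim, spatial, spatial)

r_pool.shape == r_spatial.shape  # => True
```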

Ah thanks, that clears it up.