buptLinfy/ZSE-SBIR

Batch sizes and data pipeline


Hello,
it might be a silly question, but even after a while I could not figure out what I am misreading.

(QUESTION 1)
In `model/model.py`, a comment says the batch goes from (b, C, H, W) to (2b, C, H, W) after concatenating images and sketches:

[screenshot of the comment in `model/model.py`]
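For reference, this is how I picture that concatenation (a minimal sketch with made-up shapes, not the repo's actual code):

```python
import torch

b, C, H, W = 4, 3, 224, 224
sk = torch.randn(b, C, H, W)   # sketch batch
im = torch.randn(b, C, H, W)   # image batch

# Concatenating along the batch dimension: (b, C, H, W) -> (2b, C, H, W)
x = torch.cat([sk, im], dim=0)
print(x.shape)  # torch.Size([8, 3, 224, 224])
```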

Later on, another comment says the batch increases to 4b after self-attention (see the screenshot above).

However, a quick unit test shows that the self-attention module does not change the batch dimension:
[screenshot of the unit test]
Outputs:
torch.Size([3, 197, 768]) [196, ..., 196] [None, ..., None]
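For comparison, an equivalent check with a generic attention layer (a stand-in I wrote myself, not the repo's module) also leaves the batch dimension untouched:

```python
import torch
import torch.nn as nn

# Generic multi-head self-attention over (batch, tokens, dim) tensors,
# used here only as a stand-in for the repo's self-attention block.
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)

x = torch.randn(3, 197, 768)   # ViT-style: 196 patch tokens + [CLS]
out, _ = attn(x, x, x)
print(out.shape)               # torch.Size([3, 197, 768]) -- still batch 3
```

So I do not see where the 2b -> 4b growth is supposed to happen.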

I suspect I do not fully understand how the positive/negative pairs are passed to the model, and the sparse comments in the code can be a bit cryptic.

(QUESTION 2)
Therefore, my second question is: given a pair (sk, im), how are positives and negatives defined?

It is not entirely clear to me even after inspecting the triplet loss function:
[screenshot of the triplet loss function]
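My current mental model is the textbook in-batch scheme, where the paired photo is the positive and a photo from another category is the negative, something like this (my own sketch, not the repo's code):

```python
import torch
import torch.nn.functional as F

def triplet_loss(sk, im_pos, im_neg, margin=0.3):
    # sk, im_pos, im_neg: (b, d) embedding batches.
    # The paired photo is the positive; a photo from a different
    # category in the batch serves as the negative.
    d_pos = F.pairwise_distance(sk, im_pos)
    d_neg = F.pairwise_distance(sk, im_neg)
    return F.relu(d_pos - d_neg + margin).mean()
```

Is that what the code implements, or are negatives mined differently?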

(QUESTION 3)
I assume the following line is aggregating local information from adjacent tokens:
[screenshot of the token-aggregation line]
Is this discussed in the paper? I cannot find it in the Relational Network section, which only mentions the MLP-ReLU concatenation.
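My rough guess at what such a line does, purely speculative on my part and with assumed grid sizes:

```python
import torch
import torch.nn.functional as F

b, n, d = 2, 196, 768                  # 196 patch tokens = 14 x 14 grid
tokens = torch.randn(b, n, d)

# Reshape tokens back onto their 2D patch grid, then pool 2x2
# neighborhoods so each output token mixes adjacent patch tokens.
grid = tokens.transpose(1, 2).reshape(b, d, 14, 14)
pooled = F.avg_pool2d(grid, kernel_size=2, stride=2)   # (b, d, 7, 7)
out = pooled.flatten(2).transpose(1, 2)                # (b, 49, d)
print(out.shape)                                       # torch.Size([2, 49, 768])
```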

Thanks for your attention, and keep up the good work!