Batch sizes and data pipeline
Hello,
this might be a silly question, but after a while I could not figure out what is wrong with my reading of the code.
(QUESTION 1)
In `model/model.py`, a comment says the batch goes from (b, C, H, W) -> (2b, C, H, W) after concatenating images and sketches.
Later on, the batch increases to 4b after self-attention (see image).
However, a quick unit test reveals that the self-attention module does not modify the batch dimension:
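Roughly, the test I ran looks like this (a minimal sketch; `torch.nn.MultiheadAttention` stands in for the repo's self-attention block and may differ from it internally):

```python
import torch
import torch.nn as nn

# Minimal sketch of my test. nn.MultiheadAttention stands in for the
# repo's self-attention block, which may differ internally.
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)

x = torch.randn(3, 197, 768)  # (batch, tokens, dim): 196 patches + CLS, ViT-B/16 style
out, _ = attn(x, x, x)        # self-attention: query = key = value = x

print(out.shape)  # torch.Size([3, 197, 768]) -- batch dimension unchanged
```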
Output:

```
torch.Size([3, 197, 768]) [196, ..., 196] [None, ..., None]
```
I suspect I do not fully understand how the positive/negative pairs are passed to the model, and the sparse comments in the code can be a bit cryptic.
(QUESTION 2)
Therefore, my second question is: given a pair (sk, im),
how are positives and negatives defined?
It is not entirely clear to me after inspecting the triplet loss function:
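For context, this is the pairing convention I would have expected (a hedged sketch; the `triplet_loss` helper, the roll-based negative sampling, and the margin value are my assumptions, not the repo's code):

```python
import torch
import torch.nn.functional as F

# Hypothetical pairing convention (anchor = sketch, positive = its paired
# image, negative = some other image in the batch); the repo's actual
# triplet loss may be defined differently.
def triplet_loss(sk_fea, im_fea, margin=0.3):
    # sk_fea, im_fea: (b, d); row i of im_fea is the photo paired with sketch i
    negative = im_fea.roll(1, dims=0)  # shift rows so each sketch sees a non-matching photo
    return F.triplet_margin_loss(sk_fea, im_fea, negative, margin=margin)

print(triplet_loss(torch.randn(4, 768), torch.randn(4, 768)))
```

Is this roughly the convention used, or are negatives drawn some other way?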
(QUESTION 3)
I assume the following line is aggregating local information from adjacent tokens:
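To make my reading concrete, this is the kind of operation I have in mind (purely my own hypothetical illustration; the grid reshape and average pooling are mine, not the actual line in the repo):

```python
import torch
import torch.nn.functional as F

# My own illustration of "aggregating local information from adjacent
# tokens": average each patch token with its 3x3 spatial neighbours
# on the 14x14 ViT grid. Names and pooling choice are assumptions.
tokens = torch.randn(3, 196, 768)                      # patch tokens only, CLS removed
grid = tokens.transpose(1, 2).reshape(3, 768, 14, 14)  # restore the spatial layout
local = F.avg_pool2d(grid, kernel_size=3, stride=1, padding=1)
local = local.flatten(2).transpose(1, 2)               # back to (3, 196, 768)
print(local.shape)
```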
Is this discussed in the paper? I cannot find it in the Relational Network section, which only mentions the MLP-ReLU concatenation.
Thanks for your attention, and keep up the good work!