AlexHex7/Non-local_pytorch

Which dim to perform softmax

Closed this issue · 4 comments

Which dim to perform softmax

Let N denote the batch size and C, H, W the channels, height, and width of the input tensor to the non-local block, and consider the subsampled version.

In the following code:
f_div_C = F.softmax(f, dim=-1)

the size of f is (N, WH, WH/4).

My question is: why is softmax applied to the last dimension?

Why not apply it like this:
f_div_C = F.softmax(f.view(N, -1), dim=1)

@SaoYan You can find the formulation in Section 3.2 (Embedded Gaussian) of the paper.
The output at point i is a weighted sum over all points j. So in the weight matrix of shape (WH, WH/4), each row (the first dimension) corresponds to a point i and each column (the second dimension) to a point j. The weights for a given i must form a distribution over j, so softmax is applied along the last dimension, normalizing each row to sum to 1.
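A minimal sketch of this normalization (toy shapes and embedding dimension are assumptions, not taken from the repo): building the similarity matrix f from embedded queries and subsampled keys, then applying softmax over the last dimension so that each point i's weights over the points j sum to 1.

```python
import torch
import torch.nn.functional as F

# Toy sizes standing in for the issue's example (assumed values):
# N = batch, WH = number of query positions, WH4 = subsampled key positions.
N, WH, WH4, C_emb = 2, 16, 4, 8

theta = torch.randn(N, WH, C_emb)   # embedded queries theta(x_i)
phi = torch.randn(N, C_emb, WH4)    # embedded, subsampled keys phi(x_j)

f = torch.matmul(theta, phi)        # pairwise similarities, shape (N, WH, WH4)
f_div_C = F.softmax(f, dim=-1)      # normalize over j (last dim) for each point i

# Each row i is now a distribution over the WH/4 points j:
row_sums = f_div_C.sum(dim=-1)      # shape (N, WH), all entries equal to 1
```

Flattening with `f.view(N, -1)` before the softmax would instead normalize over all (i, j) pairs jointly, so an individual point's weights would no longer sum to 1, which contradicts the weighted-sum formulation in the paper.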

Thanks a lot for the reply!