Which dim to perform softmax
Closed this issue · 4 comments
SaoYan commented
Let N be the batch size and C, H, W the channels, height, and width of the input tensor to the non-local block, and consider the subsampled version.
In the following code:
f_div_C = F.softmax(f, dim=-1)
the size of f is (N, WH, WH/4).
My question is: why is softmax applied to the last dimension?
Why not apply it like this:
f_div_C = F.softmax(f.view(N,-1), dim=1)
AlexHex7 commented
@SaoYan You can find the formulation in Section 3.2 (Embedded Gaussian) of the paper.
For each point i, the output is a weighted sum over all points j. So in the weight matrix of size (WH, WH/4), each row along the first dimension corresponds to a point i, and each column along the second dimension corresponds to a point j. Softmax over the last dimension normalizes the weights for each point i so that they sum to 1, giving a proper attention distribution over the points j.
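A minimal sketch of the difference between the two normalizations, using a NumPy softmax (a stand-in for `F.softmax`) and hypothetical toy sizes (N=1, WH=4, WH/4=2):

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy weight tensor f of shape (N, WH, WH/4): here (1, 4, 2).
rng = np.random.default_rng(0)
f = rng.random((1, 4, 2))

# Softmax over the last dim: each row (one point i) sums to 1,
# so row i is an attention distribution over the points j.
f_div_C = softmax(f, axis=-1)
print(f_div_C.sum(axis=-1))  # every entry is 1.0

# Flattening first normalizes over all (i, j) pairs jointly,
# so the weights belonging to a single point i no longer sum to 1.
f_flat = softmax(f.reshape(1, -1), axis=1).reshape(f.shape)
print(f_flat.sum(axis=-1))   # rows generally do not sum to 1
```

With `dim=-1`, each query point i gets its own normalized set of weights over the (subsampled) points j, which is what the weighted-sum formulation requires; the flattened version would couple the weights of different points i.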
SaoYan commented
Thanks a lot for the reply!