NVlabs/ssn_superpixels

Some questions on source code

Bobholamovic opened this issue · 7 comments

Thanks for your code. Since I'm unfamiliar with Caffe and CUDA C/C++, I have two questions that perplexed me while reading the code:

  1. On page 8 of the paper, it is said that:

using row-normalized association matrix

Yet I haven't found any row-normalization operation where I'd expect it (I suppose it should be inside decode_features in create_net.py). Does this implementation make a simplification, or is the normalization done elsewhere?

  2. Again in decode_features in create_net.py, I notice that the neighboring superpixel features concat_spixel_feat are concatenated in a rather neat and tricky way, namely by passing through a group convolution. My doubt is that, in my understanding, the convolution kernels would have to be fixed to specific values to achieve this. If my guess is right, the kernels of each group should look like this:
channel 1: [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
channel 2: [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
...
channel 9: [[0, 0, 0], [0, 0, 0], [0, 0, 1]]

But nowhere in the repo could I find where the initial values of this convolution layer are set. Could you point me to it, or is my guess simply incorrect?

We normalize the association matrix using softmax here:

pixel_spixel_assoc = L.Softmax(pixel_spixel_neg_dist)
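
For illustration only (this is not code from the repo), the softmax amounts to something like the following sketch, assuming the association tensor has shape (N, 9, H, W) with one channel per surrounding candidate superpixel:

import torch
import torch.nn.functional as F

def pixel_to_superpixel_assoc(pixel_spixel_neg_dist):
    # pixel_spixel_neg_dist: (N, 9, H, W) negative pixel-to-superpixel distances,
    # one channel per surrounding candidate superpixel.
    # Softmax over the channel axis turns them, for every pixel, into a
    # distribution over its 9 candidate superpixels.
    return F.softmax(pixel_spixel_neg_dist, dim=1)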

We initialize the group convolution filter weights in this function:

def initialize_net_weight(net):
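
As a rough sketch of the idea (illustrative only; the exact shapes and channel ordering used in initialize_net_weight may differ), one-hot 3x3 kernels for such a group convolution can be built like this:

import torch

def make_neighbor_gather_weights(num_channels):
    # Weights for a 3x3 group convolution with groups=num_channels and
    # 9 output channels per group: shape (num_channels * 9, 1, 3, 3).
    # Each filter contains a single 1 at one of the nine window positions,
    # so output channel c * 9 + k simply copies neighbor k of input channel c.
    w = torch.zeros(num_channels * 9, 1, 3, 3)
    for c in range(num_channels):
        for k in range(9):
            w[c * 9 + k, 0, k // 3, k % 3] = 1.0
    return w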

That makes sense. Thanks for your reply!

Sorry for reopening this issue. I've recently been working on a PyTorch implementation of your paper, and the work is roughly done. However, I found a few details that I'm not so sure about. The most confusing one is the L.Softmax in create_net.py that we discussed before. Your earlier reply explained it as normalizing the association matrix, but after a read-through, I believe the normalize function on Line 17 and the CUDA implementation behind L.SpixelFeature2 also perform normalization. As far as I understand from the paper, we should first get the original Q without any normalization for further tasks (e.g., mapping between pixels and superpixels), and perhaps we need an exponential rather than a softmax.

Sorry for the lengthy description. My question is: why is a softmax used instead of an exponential transformation, i.e., why is an additional normalization done here? Is it due to numerical stability issues? Please correct me if I am wrong.

I am not sure I understood your question completely. The 'normalize' function is only used once at the end, whereas 'softmax' is used at each iteration. There are actually two normalizations in each iteration: one across superpixels (using Softmax) and the other across pixels (in SpixelFeature2). Sorry that this is not very clear from the paper, where we did not explicitly mention the normalization over superpixels in some places. We discuss the 'row-normalized' and 'column-normalized' Q in the sub-section "Mapping between pixel and superpixel representations".
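
To illustrate the two normalizations, here is a simplified dense-matrix sketch (not the repo's sparse 9-neighbor implementation; shapes and names are only for illustration):

import torch

def one_iteration_normalizations(neg_dist, pixel_feat):
    # neg_dist:   (P, S) association logits between P pixels and S superpixels
    # pixel_feat: (P, C) pixel features
    # 1) Normalization across superpixels: softmax per pixel (row-normalized Q).
    Q_row = torch.softmax(neg_dist, dim=1)
    # 2) Normalization across pixels (what SpixelFeature2 does internally), so
    #    each superpixel feature is a weighted average of the pixel features.
    Q_col = Q_row / Q_row.sum(dim=0, keepdim=True).clamp_min(1e-8)
    spixel_feat = Q_col.t() @ pixel_feat   # (S, C) superpixel features
    return Q_row, spixel_feat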

It would be great if you could share your PyTorch implementation with the community once it is ready. Thanks.

Many thanks. That is exactly what I was asking. Then I have one last question: why is there a normalization across superpixels prior to the one across pixels at every iteration? Without the former, I think the mathematical meaning of Q would probably be clearer (in that case Q would be 'absolute' rather than 'relative', which would make the subsequent normalization across pixels more convenient).

I've just noticed that @CYang0515 has also implemented this work in PyTorch; what a coincidence. Anyway, I'll make my implementation public as soon as I get permission from my supervisor. Thanks again for the patient answers and for your attention.

That makes sense. The network may also work without normalization across superpixels. Since there is an exponentiation involved, things might be more stable with the softmax (normalization). I haven't tried it without normalization; let me know if you happen to try that as well.
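
To sketch the stability point (illustrative only, not code from the repo): softmax implementations subtract the per-row maximum before exponentiating, so the largest exponent is exp(0) = 1, whereas a bare exponential of the same logits can overflow:

import torch

def raw_exp_assoc(neg_dist):
    # Direct exponential: overflows to inf when the logits get large.
    return torch.exp(neg_dist)

def stable_softmax_assoc(neg_dist):
    # Softmax with the usual max-subtraction trick: the exponentials are
    # bounded by 1, and the result is normalized over the superpixel axis.
    shifted = neg_dist - neg_dist.max(dim=1, keepdim=True).values
    e = torch.exp(shifted)
    return e / e.sum(dim=1, keepdim=True)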

You are right. I've tried training the network both with and without the softmax for a few epochs, and the one with softmax showed better stability when the input range varied, though I still need further experiments to benchmark the two properly. My question is settled. Thank you!