Channel Dimension

Question

In the paper, the orientation maps have dimensions |G| x C x H x W, but the maps from the code appear to have dimensions C x |G| x H x W. I just wanted to check that when the argmax, for example, is taken over dim=1, this is being done over the correct dimension. Also, the paper says that the number of channels C used was 2, but running the code seems to show that the orientation maps have C equal to 1. It's not particularly clear from the paper or the code what C is supposed to represent. Is it the colour channels of an image? I'm guessing the group order |G| corresponds to bin_size/B in the code? These things would be really helpful to understand. Many thanks.

Answer 1 · 2023-04-06T08:51:28.000Z

Why is argmax taken at dim=1? First, the shape of the orientation histogram map is not (|G| x C x H x W) but (|G| x H x W), with the channel dimension collapsed. The orientation 'value' map (H x W) is obtained from the orientation 'histogram' map (|G| x H x W) by taking the argmax over the |G| dimension. The source code uses dim=1 for the argmax because the batch dimension is at position 0, which puts the |G| dimension at position 1.
Why is the channel of the orientation map 1? Because it is the orientation value map, obtained by taking the argmax of the orientation histogram.
What does C mean in the paper and code? It is the number of channels a group action has.
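
The shape reasoning in the first two points can be sketched with a small example. This is only an illustration, not the repository's code: it uses NumPy (where `axis=1` plays the role of `dim=1` in PyTorch), and the sizes `B`, `G`, `H`, `W` and the names `hist`, `value`, `value_keep` are hypothetical. Keeping a size-1 axis after the argmax is an assumption about how a channel of 1 could show up in the printed shapes.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
B, G, H, W = 2, 8, 4, 5  # batch, group order |G|, height, width

rng = np.random.default_rng(0)
# Orientation histogram map with a leading batch dimension: (B, |G|, H, W).
hist = rng.random((B, G, H, W))

# Axis 0 is the batch, so the |G| bins sit on axis 1 (dim=1 in PyTorch).
value = hist.argmax(axis=1)            # (B, H, W), entries in [0, |G|)

# Assumption: the code keeps a size-1 axis where |G| used to be.
value_keep = np.expand_dims(value, 1)  # (B, 1, H, W)

print(hist.shape, value.shape, value_keep.shape)
# (2, 8, 4, 5) (2, 4, 5) (2, 1, 4, 5)
```

Without the batch dimension, the same argmax would be taken over axis 0 of a (|G|, H, W) array, which matches the description above.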