uzh-rpg/RVT

A question about frame construction

Closed this issue · 4 comments

Hi @magehrig
I recently noticed that the frame construction in your code works like this:
image
The channels are grouped first by polarity and then by time interval.

However, my impression is that the following layout is more common:
image
Here, in contrast, the channels are grouped first by time interval and then by polarity.

Therefore, I trained RVT-B on GEN1 with both framing methods, and I got the results below:

        if sort_mode == 'POL_SORT':
            indices = x.long() + \
                    wd * y.long() + \
                    ht * wd * t_idx.long() + \
                    bn * ht * wd * pol.long()
        elif sort_mode == 'TIME_SORT':
            indices = x.long() + \
                    wd * y.long() + \
                    ht * wd * pol.long() + \
                    ht * wd * ch * t_idx.long()
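
To make the two formulas above concrete, here is a minimal sketch in plain Python (no PyTorch needed) that decodes which (polarity, time bin) pair ends up in each output channel once the buffer is viewed as `(-1, ht, wd)`. The sizes and the helper names `flat_index` and `channel_order` are assumptions chosen for illustration, not values or functions from the RVT code.

```python
# Hypothetical small sizes for illustration; not values from any RVT config.
BN, CH, HT, WD = 3, 2, 4, 5  # time bins, polarities, height, width


def flat_index(x, y, t_idx, pol, sort_mode):
    """Flat index of a single event, mirroring the two formulas above."""
    if sort_mode == 'POL_SORT':
        return x + WD * y + HT * WD * t_idx + BN * HT * WD * pol
    if sort_mode == 'TIME_SORT':
        return x + WD * y + HT * WD * pol + HT * WD * CH * t_idx
    raise ValueError(f'unknown sort_mode: {sort_mode}')


def channel_order(sort_mode):
    """Return the (pol, t_idx) pair stored in each channel of the (-1, HT, WD) view."""
    slots = {}
    for pol in range(CH):
        for t_idx in range(BN):
            # Each channel holds HT * WD consecutive flat indices.
            channel = flat_index(0, 0, t_idx, pol, sort_mode) // (HT * WD)
            slots[channel] = (pol, t_idx)
    return [slots[c] for c in range(BN * CH)]


pol_sort_order = channel_order('POL_SORT')
time_sort_order = channel_order('TIME_SORT')
print(pol_sort_order)   # channels grouped by polarity, then time bin
print(time_sort_order)  # channels grouped by time bin, then polarity
```

Both modes fill the same set of channels and differ only in the order in which those channels appear.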

image

I initially expected the two methods to perform similarly. Surprisingly, your framing method performs better!
Have you tested both framing methods before, and did you choose this scheme because it gave better results? Or do you know why sorting by polarity first gives a better result?

Conceptually, it should not make a difference: the first layer is a 2D conv, so the order of the channels does not matter. However, if you made the changes mentioned above, you should probably also change the following line from

        representation = th.zeros((self.channels, self.bins, self.height, self.width),
                                  dtype=dtype, device=device, requires_grad=False)

to

        representation = th.zeros((self.bins, self.channels, self.height, self.width),
                                  dtype=dtype, device=device, requires_grad=False)
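
The channel-order argument can be illustrated with a small sketch. It uses a 1x1 convolution at a single pixel as a simplification (an assumption for brevity; a larger kernel also sums over all input channels, so the same reasoning applies). The counts, weights, and the helper `conv1x1` are hypothetical. Permuting the input channels and the matching weights together leaves the output unchanged, so a network trained from scratch can in principle learn equivalent weights for either ordering.

```python
def conv1x1(pixel, weights):
    """One output value of a 1x1 conv at one pixel: weighted sum over input channels."""
    return sum(w * v for w, v in zip(weights, pixel))


# Hypothetical per-channel event counts at one pixel, in POL_SORT channel order:
# (pol0,t0), (pol0,t1), (pol0,t2), (pol1,t0), (pol1,t1), (pol1,t2)
pixel_pol_sort = [1, 0, 2, 0, 3, 1]
weights_pol_sort = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75]

# Permutation from POL_SORT order to TIME_SORT order:
# (pol0,t0), (pol1,t0), (pol0,t1), (pol1,t1), (pol0,t2), (pol1,t2)
perm = [0, 3, 1, 4, 2, 5]
pixel_time_sort = [pixel_pol_sort[i] for i in perm]
weights_time_sort = [weights_pol_sort[i] for i in perm]

out_pol_sort = conv1x1(pixel_pol_sort, weights_pol_sort)
out_time_sort = conv1x1(pixel_time_sort, weights_time_sort)
print(out_pol_sort, out_time_sort)  # the two outputs match
```

This only shows that the two orderings are equally expressive. It says nothing about run-to-run training variance, which may still produce different final metrics.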

Hi @magehrig
Thank you for your reply.
The results above are strange. At first I did not change the line to the following:

        representation = th.zeros((self.bins, self.channels, self.height, self.width),
                                  dtype=dtype, device=device, requires_grad=False)

I assumed the result would be the same after the final reshape.
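
The reshape argument can be checked with plain index arithmetic. The sketch below assumes the events are written through flat indices into a contiguous row-major buffer (for example with `Tensor.put_`, which treats the tensor as 1-D) and that the buffer is then viewed as `(-1, ht, wd)`. The sizes and the helpers `unravel` and `channel_after_reshape` are hypothetical. Under those assumptions, a flat index lands in channel `idx // (ht * wd)` whichever of the two allocation shapes is used.

```python
def unravel(idx, shape):
    """Row-major (C-order) multi-index of a flat index in a contiguous buffer."""
    coords = []
    for size in reversed(shape):
        coords.append(idx % size)
        idx //= size
    return tuple(reversed(coords))


def channel_after_reshape(idx, shape):
    """Channel a flat index lands in once the 4D buffer is viewed as (-1, ht, wd)."""
    a, b, _, _ = unravel(idx, shape)
    # Merging the two leading dimensions maps (a, b) to a * shape[1] + b.
    return a * shape[1] + b


bn, ch, ht, wd = 3, 2, 4, 5   # hypothetical sizes
shape_a = (ch, bn, ht, wd)    # allocation used with the polarity-first layout
shape_b = (bn, ch, ht, wd)    # allocation suggested for the time-first layout

same_everywhere = all(
    channel_after_reshape(i, shape_a)
    == channel_after_reshape(i, shape_b)
    == i // (ht * wd)
    for i in range(bn * ch * ht * wd)
)
print(same_everywhere)
```

If the real code indexes the buffer by dimension instead of by flat index, this equivalence would no longer hold and the allocation shape would matter.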

But after I saw this result:
image
I changed the line as you suggested. However, I got similar results on a subset of GEN1.
image

It's so strange that I may need to repeat the experiments /(ㄒoㄒ)/~~.

Great, let me know if that fixes your issue.

OK, I will test it this time.