oravus/seqNet

discussion on sequential descriptor

snakehaihai opened this issue · 2 comments

Hi, may I ask what the sequential descriptor really encodes?

As far as I understand, an LSTM can be used to estimate both self-motion and motion-stereo depth.

Does the LSTM here only encode changes in locomotion, or does it also encode an overall 3D structure prior?

If it encodes either the motion or the map, could I use short-duration odometry as an additional input field, or something like a LiDAR-projected depth map, to speed up SeqNet and make it a multi-modal SeqNet?

I would expect both motion and visual information to be encoded. During training, we assume odometry to be available (in the form of GPS information in this work), which we use to subsample images at a fixed metric separation. However, during testing, even without such sampling (or any odometry), we observed only slight performance variations across all the methods (see Fig. 3b in the paper).
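For concreteness, here is a minimal sketch of that fixed-metric-separation subsampling; the function name, the `positions` array, and the 2 m separation are illustrative assumptions, not code from this repo:

```python
# A minimal sketch of fixed-metric-separation subsampling.
# Assumes `positions` is an (N, 2) array of per-frame GPS/odometry
# coordinates in metres; names and values here are illustrative.
import numpy as np

def subsample_by_distance(positions, sep_m=2.0):
    """Return frame indices spaced roughly `sep_m` metres apart."""
    keep = [0]
    last = positions[0]
    for i in range(1, len(positions)):
        if np.linalg.norm(positions[i] - last) >= sep_m:
            keep.append(i)
            last = positions[i]
    return keep

# Example: a straight trajectory at 0.5 m/frame, kept every ~2 m.
traj = np.stack([np.arange(100) * 0.5, np.zeros(100)], axis=1)
print(subsample_by_distance(traj, sep_m=2.0))  # -> [0, 4, 8, ...]
```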

Our analysis of what is being learnt from within the image sequence was partly limited by the direct use of descriptors instead of images.

We don't use an LSTM but a 1D convolution (along the temporal axis), as sketched below. This work is similar to ours and uses LSTMs.
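For reference, a minimal PyTorch sketch of a temporal 1D convolution over a sequence of pre-computed image descriptors; the dimensions, kernel size, and average pooling here are illustrative assumptions rather than the exact SeqNet configuration:

```python
# Temporal 1D convolution over a sequence of image descriptors.
# Input: (batch, seq_len, desc_dim); Conv1d expects channels first,
# so the descriptor dimension becomes the channel axis.
import torch
import torch.nn as nn

desc_dim, seq_len, batch = 4096, 10, 8
seq = torch.randn(batch, seq_len, desc_dim)      # sequence of descriptors

conv = nn.Conv1d(in_channels=desc_dim, out_channels=desc_dim, kernel_size=5)
out = conv(seq.transpose(1, 2))                  # (batch, desc_dim, seq_len - 4)
seq_desc = out.mean(dim=2)                       # pool over time -> (batch, desc_dim)
seq_desc = nn.functional.normalize(seq_desc, dim=1)  # L2-normalised sequential descriptor
print(seq_desc.shape)                            # torch.Size([8, 4096])
```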

Including odometry or depth maps explicitly alongside the image sequence to form a multi-modal SeqNet sounds great.
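As a purely hypothetical sketch of that idea (none of these names or dimensions come from SeqNet), one could concatenate per-frame odometry or depth features with the image descriptors before the temporal convolution:

```python
# Hypothetical multi-modal fusion: concatenate per-frame image descriptors
# with encoded odometry/depth features, then apply the temporal conv.
import torch
import torch.nn as nn

img_dim, odo_dim, seq_len, batch = 4096, 16, 10, 8
img_seq = torch.randn(batch, seq_len, img_dim)   # image descriptors
odo_seq = torch.randn(batch, seq_len, odo_dim)   # e.g. encoded odometry/depth

fused = torch.cat([img_seq, odo_seq], dim=2)     # (batch, seq_len, img_dim + odo_dim)
conv = nn.Conv1d(img_dim + odo_dim, img_dim, kernel_size=5)
seq_desc = conv(fused.transpose(1, 2)).mean(dim=2)  # multi-modal sequential descriptor
```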

OK, thanks. We will try some new methods and cite your work :)