Vegetebird/StridedTransformer-Pose3D

Evaluation on frames at the beginning or the end of a video

Albertchen98 opened this issue · 3 comments

How do you deal with the situation where the target frame is the first one of the video? In that case there are no preceding frames, so the target frame cannot be the 'center frame'. Or do you just ignore it and start the evaluation from the 13th frame when the input sequence length is 27?

Following VideoPose3D and ST-GCN, we handle this in the data preprocessing: if the target frame is the first one of the video, we pad the sequence with the edge values of the array. You can refer to:

self.batch_2d = np.pad(seq_2d[low_2d:high_2d], ((pad_left_2d, pad_right_2d), (0, 0), (0, 0)), 'edge')
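
For anyone reading along, here is a minimal sketch of how edge padding keeps the target frame centered. It is not the repository's actual data loader: the helper `centered_window`, the toy `seq_2d` array, and the way the pad widths are computed are assumptions made for illustration. Only the `np.pad(..., 'edge')` call mirrors the line quoted above.

```python
import numpy as np


def centered_window(seq_2d, target, receptive_field):
    """Hypothetical helper: return `receptive_field` frames centered on `target`.

    Frames that would fall outside the video are filled by repeating the
    first or last available frame (NumPy 'edge' padding).
    """
    pad = (receptive_field - 1) // 2
    start, end = target - pad, target + pad + 1

    # Clip the requested range to the frames that actually exist.
    low, high = max(start, 0), min(end, len(seq_2d))

    # Number of frames missing on each side of the clipped range.
    pad_left, pad_right = low - start, end - high

    return np.pad(seq_2d[low:high],
                  ((pad_left, pad_right), (0, 0), (0, 0)), 'edge')


# Toy sequence: 40 frames, 17 joints, 2 coordinates; frame t is filled with t.
seq_2d = (np.arange(40, dtype=np.float32)[:, None, None]
          * np.ones((1, 17, 2), dtype=np.float32))

first = centered_window(seq_2d, target=0, receptive_field=27)
last = centered_window(seq_2d, target=39, receptive_field=27)

print(first.shape)     # window length always equals the receptive field
print(first[:, 0, 0])  # frame 0 repeated on the left, then frames 1..13
print(last[:, 0, 0])   # frames 26..39, then frame 39 repeated on the right
```

With a receptive field of 27, the window for the first frame consists of 13 copies of frame 0, frame 0 itself at the center (index 13), and frames 1 to 13 after it, so no frames are skipped during evaluation.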

Thank you very much for the quick reply~

I found another interesting thing: because the kernel size is set to 1, the max-pooling layer only extracts one value every stride_num[i] steps, so no actual 'max-pooling' operation is performed. Have you ever tried a kernel size of 3, and did it make any difference in performance?

self.pooling = nn.MaxPool1d(1, stride_num[i])
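
To make the observation concrete, here is a small sketch comparing the two settings on a toy tensor. The input values, the stride of 3, and the `padding=1` used for the kernel-size-3 variant are assumptions chosen for illustration; the thread does not say how a larger kernel would be configured in the model.

```python
import torch
import torch.nn as nn

stride = 3

# Toy input with shape (batch, channels, frames).
x = torch.tensor([[[1., 9., 2., 3., 8., 4., 5., 7., 6.]]])

# Kernel size 1: every window holds a single frame, so the "max" is just
# that frame and the layer reduces to strided subsampling.
subsample = nn.MaxPool1d(1, stride)

# Kernel size 3 (assumed variant): each output is the max over a window of
# three frames. padding=1 keeps the windows centered on frames 0, 3, 6.
pooled = nn.MaxPool1d(3, stride, padding=1)

out_k1 = subsample(x)
out_k3 = pooled(x)

print(out_k1)                                # same values as x[..., ::stride]
print(torch.equal(out_k1, x[..., ::stride]))
print(out_k3)                                # window maxima instead
```

Both variants produce the same output length here, so swapping the kernel size would not change the downstream tensor shapes in this toy setup. Whether it changes accuracy is not something the thread answers.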

I don't remember whether I tried it. Maybe you can try it~