Evaluation on frames at the beginning or the end of a video

How did you deal with the case where the target frame is the first one of the video? There aren't any preceding frames, so the target frame cannot be the 'center frame'. Or do you just ignore those frames and start the evaluation from the 13th frame when the input sequence length is 27?

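Following VideoPose3D and ST-GCN, we handle this in the data preprocessing: if the target frame is the first one of the video, we pad the sequence with the edge values of the array, so the first frame is repeated to fill the missing preceding frames. You can refer to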

StridedTransformer-Pose3D/common/generator.py

Line 103 in 163f0cb

    
           self.batch_2d = np.pad(seq_2d[low_2d:high_2d], ((pad_left_2d, pad_right_2d), (0, 0), (0, 0)), 'edge')

.
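To illustrate the edge padding described above, here is a minimal sketch, not the repository's actual generator. The helper name `centered_window` and the toy `video` array are hypothetical; only the `np.pad(..., 'edge')` call mirrors the line quoted above. It assumes a receptive field of 27 frames, so 13 frames are needed on each side of the target.

```python
import numpy as np

def centered_window(seq_2d, target, receptive_field=27):
    """Hypothetical helper: return `receptive_field` frames centered on `target`,
    repeating the first/last frame when the window runs past the video."""
    pad = (receptive_field - 1) // 2            # 13 frames per side for 27
    start, end = target - pad, target + pad + 1
    low, high = max(start, 0), min(end, len(seq_2d))
    pad_left, pad_right = low - start, end - high
    # Same call shape as the quoted line: pad only the time axis, with edge values.
    return np.pad(seq_2d[low:high], ((pad_left, pad_right), (0, 0), (0, 0)), 'edge')

# Toy video (assumption): 50 frames, 17 joints, 2D coordinates.
# Every value in frame t equals t, so the padding is easy to inspect.
video = np.tile(np.arange(50, dtype=np.float32)[:, None, None], (1, 17, 2))

first = centered_window(video, target=0)    # 13 copies of frame 0, then frames 0..13
last = centered_window(video, target=49)    # frames 36..49, then 13 copies of frame 49
print(first.shape, first[:, 0, 0])
```

With this scheme every frame, including the first and the last, can be evaluated as a center frame, so no frames are skipped.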

Thank you very much for the quick reply~

I found another interesting thing: because the kernel size is set to 1, the max-pooling layer only picks one value every stride_num[i] steps, so no actual 'max-pooling' operation is performed. Have you ever tried a kernel size of 3, and did it make any difference in performance?

StridedTransformer-Pose3D/model/block/strided_transformer_encoder.py

Line 65 in 163f0cb

self.pooling = nn.MaxPool1d(1, stride_num[i])
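A small sketch of this observation, using a NumPy stand-in for unpadded 1D max pooling rather than `nn.MaxPool1d` itself. The function `max_pool1d` and the sample values are assumptions made for illustration; the output-length formula is the usual one for pooling without padding or dilation.

```python
import numpy as np

def max_pool1d(x, kernel_size, stride):
    """NumPy stand-in (assumption) for unpadded 1D max pooling over the last axis."""
    n_out = (x.shape[-1] - kernel_size) // stride + 1
    windows = [x[..., i * stride : i * stride + kernel_size] for i in range(n_out)]
    return np.stack([w.max(axis=-1) for w in windows], axis=-1)

x = np.array([[3., 9., 1., 4., 8., 2., 7., 5., 6.]])
stride = 3

k1 = max_pool1d(x, kernel_size=1, stride=stride)  # each window holds a single value
k3 = max_pool1d(x, kernel_size=3, stride=stride)  # each window holds three values

print(k1)              # identical to plain strided slicing
print(x[..., ::stride])
print(k3)              # true maximum over each window of three
```

With kernel size 1 the result is the same as `x[..., ::stride]`, so the layer acts as strided subsampling. With kernel size 3 each output is the maximum over three neighbouring steps; for other sequence lengths the output size may differ unless padding is adjusted.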

I don't remember whether I have tried it. Maybe you can try it~