ziniuwan/maed

Some questions about training

zhLawliet opened this issue · 2 comments

@ziniuwan
1: You mentioned "Use the last checkpoint of stage 1 to initialize the model and start training stage 2."
However, the code contains the comment "# We empirically choose not to load the pretrained decoder weights from stage1 as it yields better performance.", so the pretrained decoder weights from stage 1 are not loaded. Moreover, MODEL.ENCODER.BACKBONE is "cnn" in stage 1 but "ste" in stage 2, so the pretrained encoder weights from stage 1 cannot be reused either. If neither the encoder nor the decoder weights are used, why do we still need the stage 1 pre-training?
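For reference, below is the kind of selective checkpoint loading I would expect between the two stages (a minimal sketch under my own assumptions: the "decoder." key prefix and the helper name are hypothetical, not the repo's actual code):

```python
import torch

# Hypothetical sketch of initializing the stage-2 model from a stage-1
# checkpoint while skipping the decoder weights; key prefixes are assumptions.
def load_stage1_weights(model, ckpt_path):
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)

    model_state = model.state_dict()
    filtered = {
        k: v for k, v in state_dict.items()
        if not k.startswith("decoder.")      # decoder weights deliberately dropped
        and k in model_state                 # encoder changed cnn -> ste, so
        and v.shape == model_state[k].shape  # mismatched keys cannot load anyway
    }
    model.load_state_dict(filtered, strict=False)
    return sorted(filtered)  # which weights were actually reused
```

If the encoder keys also fail to match because the backbone changed, this would reuse almost nothing, which is exactly my confusion.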


2: What does this operation in vision_transformer.py mean? I think it is similar to x = x + self.pos_embed:
x = x.reshape(-1, seqlen, N, C) + self.temp_embed[:,:seqlen,:,:]
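To make my reading concrete, here is a minimal sketch (the tensor shapes are my assumptions) of how I understand the two additions: pos_embed indexes spatial token positions within a frame, while temp_embed indexes frame positions within a clip and broadcasts over the spatial tokens.

```python
import torch

B, seqlen, N, C = 2, 8, 197, 768      # batch, frames, tokens per frame, channels
x = torch.randn(B * seqlen, N, C)     # frames flattened into the batch dim
pos_embed = torch.randn(1, N, C)      # one embedding per spatial token
temp_embed = torch.randn(1, 16, 1, C) # one embedding per time step (max 16)

# Spatial: each token gets an embedding for its position inside a frame.
x = x + pos_embed

# Temporal: recover the time axis, then add an embedding indexed by frame;
# the size-1 token dim broadcasts across all N tokens of each frame.
x = x.reshape(-1, seqlen, N, C) + temp_embed[:, :seqlen, :, :]
```

Is that the right interpretation?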

@ziniuwan Where can we find the supplementary material of the paper?

@ziniuwan I see that ROT_JITTER is 0. Is this the config you actually trained with? Also, sample_freq is 8; why is it not 1?
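My current understanding of sample_freq (a sketch under my own assumptions; the helper is hypothetical and may not match the repo's dataloader): it is the stride between sampled frames, so 8 means every 8th frame is taken rather than consecutive ones.

```python
# Hypothetical illustration of frame sampling with a stride (sample_freq).
def sample_frame_indices(seqlen, sample_freq, start=0):
    return [start + i * sample_freq for i in range(seqlen)]

print(sample_frame_indices(16, 8))  # [0, 8, 16, ..., 120]: 16 frames spanning 128
print(sample_frame_indices(16, 1))  # [0, 1, 2, ..., 15]: 16 consecutive frames
```

Is that the intended meaning?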