jefflai108/Contrastive-Predictive-Coding-PyTorch

Feed entire input to encoder??

NeteeraAF opened this issue · 1 comment

I see in your implementation that you feed the entire signal into the encoder, while the paper notes that each timestep should be fed in separately.
When you feed the entire signal into the encoder, you get overlapping features from the conv kernels (except in the case where the stride equals the kernel size).
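To make the overlap point concrete, here is a minimal sketch (a single `Conv1d` stands in for the encoder; this is not the repo's actual code) showing that encoding the full signal matches per-chunk encoding only when the stride equals the kernel size:

```python
import torch
import torch.nn as nn

signal = torch.randn(1, 1, 160)                    # (batch, channel, time)

# stride == kernel_size: non-overlapping windows, so encoding the full signal
# gives exactly the same features as encoding each chunk separately.
enc = nn.Conv1d(1, 8, kernel_size=10, stride=10)
z_full = enc(signal)
z_chunks = torch.cat([enc(signal[:, :, i:i + 10]) for i in range(0, 160, 10)], dim=-1)
print(torch.allclose(z_full, z_chunks))            # True

# stride < kernel_size: adjacent output frames now share input samples
# (overlapping receptive fields), which per-chunk encoding would not produce.
enc_overlap = nn.Conv1d(1, 8, kernel_size=10, stride=5)
print(enc_overlap(signal).shape)                   # torch.Size([1, 8, 31])
```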

Why did you implement it like that? Do you think it does not matter?

Thanks!

I have this doubt as well. I notice that in the paper the training inputs are segmented into small chunks, with each chunk fed into the encoder to get the feature representation z_t, which is then fed into g_ar (GRU). The context c_t from g_ar is then used to predict the feature representation z_{t+k} at future time frames. I don't know whether my understanding of the paper is correct.
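For reference, here is a rough sketch of how I read the paper's pipeline; the module names, sizes, and the per-step linear predictors are my own stand-ins, not code from the paper or from this repo:

```python
import torch
import torch.nn as nn

# Hypothetical shapes/names to illustrate the per-chunk pipeline.
B, T, chunk_len, z_dim, c_dim, K = 4, 20, 160, 64, 128, 3

g_enc = nn.Sequential(nn.Linear(chunk_len, z_dim), nn.ReLU())      # stand-in encoder
g_ar = nn.GRU(z_dim, c_dim, batch_first=True)                      # autoregressive model
W = nn.ModuleList([nn.Linear(c_dim, z_dim) for _ in range(K)])     # one predictor per step k

chunks = torch.randn(B, T, chunk_len)       # signal pre-segmented into T chunks
z = g_enc(chunks)                           # z_t for every chunk: (B, T, z_dim)
c, _ = g_ar(z)                              # c_t summarizes z_1..z_t: (B, T, c_dim)

t = 10                                      # anchor time step
preds = [W[k](c[:, t]) for k in range(K)]           # predictions of z_{t+1} .. z_{t+K}
targets = [z[:, t + 1 + k] for k in range(K)]
# InfoNCE would then score each prediction against its target vs. negative samples.
```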

In this implementation, I think the entire signal is fed into the encoder, the produced feature representations are split into two parts, the first part is fed to g_ar (GRU), and g_ar then learns to predict the second part of the representations.
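And here is a corresponding sketch of how I read this implementation's variant (again with made-up names and sizes, not the repo's actual code): the whole signal is encoded first, and the resulting frames are split at some point t:

```python
import torch
import torch.nn as nn

B, L, z_dim, c_dim, K = 4, 20480, 64, 128, 12

encoder = nn.Conv1d(1, z_dim, kernel_size=160, stride=160)   # stand-in strided encoder
g_ar = nn.GRU(z_dim, c_dim, batch_first=True)
W = nn.ModuleList([nn.Linear(c_dim, z_dim) for _ in range(K)])

x = torch.randn(B, 1, L)                  # the entire signal at once
z = encoder(x).transpose(1, 2)            # (B, T, z_dim) with T = 20480 / 160 = 128 frames
t = 100                                   # split point
c, _ = g_ar(z[:, :t + 1])                 # context from the first part only
c_t = c[:, -1]                            # context at the split point
preds = [W[k](c_t) for k in range(K)]     # predict the second part z_{t+1} .. z_{t+K}
targets = [z[:, t + 1 + k] for k in range(K)]
```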

I believe these are two different models and concepts, which could produce different results. I really hope the author can elaborate on this point.

Thanks!