Does mini-batch order matter?
matthewkperez opened this issue · 4 comments
Hello Pytorch-kaldi team,
I am currently trying to study the effect of specialty networks (i.e. layers of the network trained only on specific data) on senone classification. To do this I want to implement a multitask network that has a shared upper layer (L1) which splits into L2 and L3 (where L2 and L3 are trained on different types of data). The outputs of L2 and L3 are then combined and fed to L4 for decoding.
I have written some code that acts as a hard switch on both inputs and labels, so that the data is routed to either L2 or L3 accordingly. However, I'm concerned about the concatenation of the branch outputs prior to L4: currently I am thinking of just appending the L2 output to the L3 output, but this would mean the original order of the mini-batch is not preserved. Is this an issue for decoding? If so, would you have any suggestions on how to preserve order through the specialty networks? Maybe zero-padding?
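For reference, this is a minimal sketch of one option I've been considering (toy example of my own, not pytorch-kaldi code, with made-up shapes): instead of concatenating the two branch outputs, scatter each branch's output back into a pre-allocated tensor using the same boolean mask that did the split, so the mini-batch order stays intact.

```python
import torch

# Hypothetical toy example: route frames to two branches with a boolean mask,
# then write each branch's output back to its original row positions.
x = torch.randn(8, 40)          # mini-batch of 8 frames, 40-dim features
mask = torch.tensor([True, False, True, True, False, False, True, False])

x_a = x[mask]                   # frames routed to L2
x_b = x[~mask]                  # frames routed to L3

out_a = x_a * 2.0               # stand-in for the L2 forward pass
out_b = x_b * 3.0               # stand-in for the L3 forward pass

# torch.cat([out_a, out_b]) would scramble the original frame order.
# Scattering back through the same mask preserves it exactly:
out = torch.empty(x.size(0), out_a.size(1))
out[mask] = out_a
out[~mask] = out_b
```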
Both training and testing follow this approach: L1 => (L2 || L3) => L4
Loss is computed at L4.
Also, the network is composed of fully connected layers (i.e. arch_seq_model=False).
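To make the topology concrete, here is a rough sketch of what I have in mind (my own hypothetical module; layer sizes and names are made up for illustration and are not pytorch-kaldi code):

```python
import torch
import torch.nn as nn

class SpecialtyNet(nn.Module):
    """Hypothetical sketch of the L1 => (L2 || L3) => L4 topology
    with fully connected layers and mask-based routing."""

    def __init__(self, feat_dim=40, hidden=512, n_senones=2000):
        super().__init__()
        self.l1 = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())  # shared layer
        self.l2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())    # specialty branch A
        self.l3 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())    # specialty branch B
        self.l4 = nn.Linear(hidden, n_senones)                           # senone output layer

    def forward(self, x, mask):
        # mask[i] is True if frame i should go through L2, False for L3.
        h = self.l1(x)
        merged = torch.empty_like(h)
        merged[mask] = self.l2(h[mask])    # write branch outputs back in place,
        merged[~mask] = self.l3(h[~mask])  # so the mini-batch order is unchanged
        return self.l4(merged)             # loss is computed on this output
```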
Got it. I guess my concern is that, since I am concatenating L2 and L3 by just appending one after the other, the frames within a given sentence will no longer be ordered the way they were at the start of the batch. So even if the correct posterior probabilities are computed per frame, the sequence of frames for a given sentence will not match what it would have been if I had preserved the batch order.
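In case it clarifies the concern, a small illustration (hypothetical toy code, not pytorch-kaldi): if the two branch outputs are simply appended, the permutation can still be undone afterwards by keeping the original row indices and sorting them back.

```python
import torch

# Hypothetical illustration: appending one branch's rows after the other
# permutes the frames of each sentence. Keeping the original row indices
# lets you undo that permutation before decoding.
idx = torch.arange(6)                          # original frame positions 0..5
mask = torch.tensor([True, False, True, False, True, False])

idx_cat = torch.cat([idx[mask], idx[~mask]])   # row order after naive concatenation
out_cat = torch.randn(6, 4)                    # stand-in for cat([L2_out, L3_out])

restore = torch.argsort(idx_cat)               # permutation that undoes the shuffle
out_restored = out_cat[restore]                # rows back in original frame order
```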