Does mini batch order matter?

Question

Does mini batch order matter?

matthewkperez opened this issue 5 years ago · 4 comments

Hello Pytorch-kaldi team,
I currently trying to study the effect of specialty networks (i.e. layers of the network only trained on specific data) on senone classification. To do this I want to implement a multitask network that has a shared upper layer (L1) that splits into L2 and L3 (where L2 and L3 are trained on different types of data). The output of both L2 and L3 is combined to L4 for decoding.

I have currently written some code that acts as a hard switch for both inputs and labels so that the data can be filtered to either L2 or L3 accordingly. However, I’m concerned about the concatenation of the inputs prior to L4, currently I am thinking of just append L2 output to L3 output. However, this would mean that the original order of the mini batch is not preserved. Is this an issue for decoding? If so, would you have any suggestions on how to preserve order through the specialty networks? Maybe zero-padding?

Answer 1 · 2020-03-25T18:34:36.000Z

Hi, as far as I understand during training you might have (L1=>L2 or L1=>L3), while during test (you have L1=>(L2|| L3)=>L4). Is that true?

…

On Wed, 25 Mar 2020 at 14:06, mkperez808 ***@***.***> wrote: Hello Pytorch-kaldi team, I currently trying to study the effect of specialty networks (i.e. layers of the network only trained on specific data) on senone classification. To do this I want to implement a multitask network that has a shared upper layer (L1) that splits into L2 and L3 (where L2 and L3 are trained on different types of data). The output of both L2 and L3 is combined to L4 for decoding. I have currently written some code that acts as a hard switch for both inputs and labels so that the data can be filtered to either L2 or L3 accordingly. However, I’m concerned about the concatenation of the inputs prior to L4, currently I am thinking of just append L2 output to L3 output. However, this would mean that the original order of the mini batch is not preserved. Is this an issue for decoding? If so, would you have any suggestions on how to preserve order through the specialty networks? Maybe zero-padding? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#219>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEA2ZVTP5Q6WP2OUII6QIB3RJJB4DANCNFSM4LTUXBXA> .

Answer 2 · 2020-03-25T18:58:19.000Z

Both training and testing follow this approach: L1=>(L2|| L3)=>L4
Loss is computed at L4.

Also, the network is composed of fully connected layers (i.e. arch_seq_model=False).

Answer 3 · 2020-03-25T19:06:00.000Z

Well, for decoding it shouldn't be an issue. We indeed create an ark file that contains sentences_id + posterior probabilities. As long as these two pieces of information are matching, you should be able to decode correctly.

…

On Wed, 25 Mar 2020 at 14:58, mkperez808 ***@***.***> wrote: Both training and testing follow this approach: L1=>(L2|| L3)=>L4 Loss is computed at L4. Also, the network is composed of fully connected layers (i.e. arch_seq_model=False). — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#219 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AEA2ZVWX7JFRFQVRZLJWACLRJJH5XANCNFSM4LTUXBXA> .

Answer 4 · 2020-03-25T19:14:31.000Z

Got it, I guess my concern is that since I am concatenating L2 and L3 by just appending one after the other, I'm worried that the frames within a given sentence will not be ordered (at least in the same way as it was at the beginning of the batch). So even if the correct posterior probabilities are computed per frame, the sequence of frames for a given sentence will not be the same as it was if I preserved the batch order