chrisdonahue/nesmdb

Loading batches

serkansulun opened this issue · 2 comments

Hi Chris. Congrats on the great work!

I noticed that when the length of a song is shorter than the target length, multiple songs are concatenated into a single sample. I'm aware that there are separator tokens, but don't you think this is somewhat problematic, since the attention module will process two very different songs in a single pass?

As a side question, is there a particular reason why you haven't used PyTorch's data loader?

Thanks in advance.

Hi Serkan. Are you referring to the data loader for LakhNES or something specific to NES-MDB (this repository)?

If the former, yeah this is definitely a bit of a curious decision. However, believe it or not, this is fairly standard practice for training large LMs, since you can get training signal out of an entire minibatch instead of using zero padding and throwing away some parallelism.
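To make the packing concrete, here's a rough sketch of what that looks like (illustrative only, not the actual LakhNES loader; the separator id and the tokenized-song format are made up):

```python
import torch

SEP_ID = 0     # assumed id for the token that separates songs
SEQ_LEN = 512  # target training sequence length

def pack_songs(songs, seq_len=SEQ_LEN, sep_id=SEP_ID):
    """Concatenate tokenized songs (lists of ints) into one long stream,
    inserting a separator token between songs, then chop the stream into
    fixed-length training sequences. No zero padding, so every position
    in every minibatch contributes training signal."""
    stream = []
    for song in songs:
        stream.extend(song)
        stream.append(sep_id)
    n_seqs = len(stream) // seq_len
    stream = stream[:n_seqs * seq_len]  # drop the ragged tail
    return torch.tensor(stream, dtype=torch.long).view(n_seqs, seq_len)

# toy usage: short "songs" end up sharing a sequence, separated by SEP_ID
print(pack_songs([[5, 6, 7], [8, 9], [10, 11, 12, 13, 14]], seq_len=8))
```

The point is just that short songs end up adjacent in the same sequence rather than each being padded out individually.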

In theory, the model can learn not to attend to information before the tokens which separate items, since that information should not help it predict anything it sees after the separator. Whether or not this happens in practice is anyone's guess, but regardless it doesn't seem to hurt the model's inference-time behavior.

Thanks for your answer. I have another question regarding the same topic. LMShuffledIterator puts different samples (songs) in different batches, so how does this take advantage of Transformer-XL's memory? In this situation, it looks like the memory and the input sequence are built from different songs; am I wrong?
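To illustrate what I mean, here is a simplified sketch of how I understand the stream filling in LMShuffledIterator (a paraphrase, not the actual Transformer-XL code; names and details are simplified):

```python
import numpy as np
import torch

def stream_batches(songs, bsz, bptt, sep_id=0, seed=0):
    """Simplified, LMShuffledIterator-style loader: songs are shuffled and
    each of the `bsz` batch columns consumes songs one after another,
    yielding (input, target) segments of length `bptt`. Transformer-XL
    memory is only contiguous *within* a column, and a new song can begin
    mid-column right after the previous one ends."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(songs))
    queues = [[] for _ in range(bsz)]
    next_song = 0
    while True:
        # top up every column so it holds at least bptt + 1 tokens
        for q in queues:
            while len(q) <= bptt and next_song < len(order):
                q.extend(songs[order[next_song]] + [sep_id])
                next_song += 1
        if any(len(q) <= bptt for q in queues):
            break  # not enough tokens left to fill a full batch
        data = torch.tensor([q[:bptt] for q in queues], dtype=torch.long).t()
        target = torch.tensor([q[1:bptt + 1] for q in queues], dtype=torch.long).t()
        for q in queues:
            del q[:bptt]
        yield data, target  # shapes: [bptt, bsz]
```

With a loader like this, the memory carried into a segment in a given column comes from whatever tokens preceded it in that column, which near a boundary belong to a different song than the one currently being predicted.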