Ghadjeres/DeepBach

Datasets in dataset_cache

Closed this issue · 3 comments

We noticed that only the dataset in `dataset_cache/tensor_datasets/` is required to train the model and generate new chorales. However, the dataset provided in `tensor_datasets/` in the zip file is named `ChoraleDataset([0],bach_chorales,['fermata', 'tick', 'key'],8,4)`; the `[0]` indicates that it contains only the soprano voice.

If this dataset is used for training, shouldn't it contain all four voices? If, on the other hand, it is used to fix the soprano part at generation time, our manual inspection of the generated chorales suggests that all notes are being sampled and the soprano parts are not real Lutheran melodies.

Also, what is the difference in purpose between the datasets in the `datasets/` and `tensor_datasets/` folders?

Thank you so much!

Hi,

Yes, you're right. It seems that I put the wrong `ChoraleDataset` in `tensor_datasets/`... sorry for that. You'll have to recreate it if you want to train a new model (this takes some time because of all the transpositions combined with music21's key analyzer).
The difference between `datasets/` and `tensor_datasets/` is that `datasets/` contains metadata about the dataset (size of the sequences, voices used, etc.), while `tensor_datasets/` contains only the tensor of size `(num_examples, num_voices, chorale_length)` plus the metadata tensor (containing the fermata indications). The idea is that at generation time you don't necessarily need to load the whole dataset (which takes time and space), so you can rely only on the information in `datasets/`.
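A minimal sketch of the split described above, using plain Python structures rather than DeepBach's actual classes (the dimension values and metadata keys here are illustrative placeholders, not the repository's real format):

```python
# Hypothetical sketch of the two on-disk artifacts.
# tensor_datasets/: the note tensor plus a metadata tensor, nothing else.
num_examples, num_voices, chorale_length = 3, 4, 32

# One pitch index per (example, voice, time step); 0 is a placeholder pitch index.
chorale_tensor = [
    [[0 for _ in range(chorale_length)] for _ in range(num_voices)]
    for _ in range(num_examples)
]

# datasets/: lightweight metadata only, enough to drive generation
# without loading the full tensors.
dataset_metadata = {
    "voice_ids": [0, 1, 2, 3],               # SATB voices
    "metadatas": ["fermata", "tick", "key"],  # per-timestep annotations
    "sequences_size": 8,
    "subdivision": 4,
}

# The tensor has shape (num_examples, num_voices, chorale_length).
assert len(chorale_tensor) == num_examples
assert len(chorale_tensor[0]) == num_voices
assert len(chorale_tensor[0][0]) == chorale_length
```

This illustrates why generation can get by with `datasets/` alone: the metadata dictionary is tiny compared to the full tensor.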

Maybe you can find the correct `ChoraleDataset` in the Docker image, but I'm not sure.

Best,

No worries! I recreated the dataset, which was very straightforward with the provided code. :) By the way, a bunch of `KeyError`s for chorale 309 were printed, but I assume this is some unimportant problem with the key analyzer, as a comment in your code suggests. Do you remember seeing something similar? Training a model on this dataset and then generating chorales still produced good results, so I assume the dataset was constructed correctly.

Oh, the reason for saving two different datasets in datasets/ and tensor_datasets/ makes complete sense. Thank you so much for explaining that!

Yes, I got exactly the same error with one of the chorales because of the key analyzer. That particular chorale is simply skipped and won't appear in the dataset. So no worries!
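The skip-on-failure behavior can be sketched as follows. This is a hypothetical stand-in, not the repository's actual code: `analyze_key` simulates music21's key analysis failing on one chorale (309 is used here only because that is the one reported above).

```python
def analyze_key(chorale_id):
    # Stand-in for music21's key analysis; chorale 309 is assumed to fail.
    if chorale_id == 309:
        raise KeyError("no key found")
    return "G major"

def build_dataset(chorale_ids):
    dataset = []
    for cid in chorale_ids:
        try:
            key = analyze_key(cid)
        except KeyError:
            # A chorale whose key analysis fails is simply skipped,
            # so it never appears in the resulting dataset.
            continue
        dataset.append((cid, key))
    return dataset

result = build_dataset([308, 309, 310])
# Chorale 309 is absent from the result.
```

The `KeyError` messages are therefore harmless: the offending chorale is dropped and the rest of the dataset is built normally.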
Best