Using a different batch size in training
asusdisciple opened this issue · 2 comments
I encountered a strange bug, or rather strange behaviour, that I can't really pinpoint to an exact cause.
I used the standard training as you described and it worked fine. However, when I changed the `batch_size` parameter to 12 in `config_v1_wavlm.json`, `train.py` only executed up to line 136, `for i, batch in pb:`. It's not a memory issue, as I still have more than 12 GB free on my GPU, but for some reason the script seems to skip the for loop when the batch size in the json file is changed.
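For reference, the usual way a `for` loop over a PyTorch `DataLoader` gets skipped silently (rather than hanging) is when the loader yields zero batches, e.g. `drop_last=True` combined with a batch size larger than the (possibly filtered) dataset. I'm not sure that's what happens here; below is just a minimal, self-contained sketch of that behaviour, with a purely illustrative dummy dataset rather than the one in `train.py`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Purely illustrative dataset: 10 dummy "utterances" of 80 frames each.
dataset = TensorDataset(torch.randn(10, 80))

# With drop_last=True, a batch_size larger than the dataset produces 0 batches,
# so the body of `for i, batch in loader:` is never entered -- no error, no hang.
loader = DataLoader(dataset, batch_size=12, shuffle=True, drop_last=True)

print("batches per epoch:", len(loader))  # prints 0
for i, batch in enumerate(loader):
    print("got batch", i)                 # never reached
```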
Hmm, I am not too sure why that might be the case. If you initiate a keyboard interrupt while it is frozen, what is the stacktrace for where it is getting stuck? I don't see anything that would cause such a strange freezing behaviour. Does it only happen with batch size of 12, or any batch size other than 16?
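If Ctrl-C doesn't give a useful trace (it can get swallowed by DataLoader worker processes), the standard-library `faulthandler` module can dump the stack of every thread on demand. A small sketch of how one might wire it into `train.py`, assuming a POSIX system (SIGUSR1 is not available on Windows); the signal and timeout are just example choices:

```python
import faulthandler
import signal

# Dump the stack of every Python thread when the process receives SIGUSR1,
# e.g. run `kill -USR1 <pid>` from another terminal while training looks frozen.
faulthandler.register(signal.SIGUSR1)

# Alternatively, dump tracebacks automatically if the process stalls for 5 minutes.
faulthandler.dump_traceback_later(timeout=300, repeat=True)
```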
Somehow it works now; it seemed to be something temporary with the repo.