Using a different batch size in training
asusdisciple opened this issue · 2 comments
I encountered a strange bug, or rather strange behaviour, that I can't really pinpoint to an exact cause.
I used the standard training as you described and it worked fine. However, when I changed the `batch_size` parameter to 12 in `config_v1_wavlm.json`, `train.py` only executed up to line 136, `for i, batch in pb:`. It's not a memory issue, as I still have more than 12 GB free on my GPU, but for some reason the script seems to skip the for loop when the batch size in the json file is changed.
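For reference, the usual way a `for` loop over a PyTorch `DataLoader` gets skipped silently (rather than hanging) is when the loader yields zero batches, e.g. `drop_last=True` combined with a batch size larger than the (possibly filtered) dataset. I'm not sure that's what happens here; below is just a minimal, self-contained sketch of that behaviour, with a purely illustrative dummy dataset rather than the one in `train.py`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Purely illustrative dataset: 10 dummy "utterances" of 80 frames each.
dataset = TensorDataset(torch.randn(10, 80))

# With drop_last=True, a batch_size larger than the dataset produces 0 batches,
# so the body of `for i, batch in loader:` is never entered -- no error, no hang.
loader = DataLoader(dataset, batch_size=12, shuffle=True, drop_last=True)

print("batches per epoch:", len(loader))  # prints 0
for i, batch in enumerate(loader):
    print("got batch", i)                 # never reached
```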
Hmm, I am not too sure why that might be the case. If you initiate a keyboard interrupt while it is frozen, what is the stacktrace for where it is getting stuck? I don't see anything that would cause such a strange freezing behaviour. Does it only happen with batch size of 12, or any batch size other than 16?
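If Ctrl-C doesn't give a useful trace (it can get swallowed by DataLoader worker processes), the standard-library `faulthandler` module can dump the stack of every thread on demand. A small sketch of how one might wire it into `train.py`, assuming a POSIX system (SIGUSR1 is not available on Windows); the signal and timeout are just example choices:

```python
import faulthandler
import signal

# Dump the stack of every Python thread when the process receives SIGUSR1,
# e.g. run `kill -USR1 <pid>` from another terminal while training looks frozen.
faulthandler.register(signal.SIGUSR1)

# Alternatively, dump tracebacks automatically if the process stalls for 5 minutes.
faulthandler.dump_traceback_later(timeout=300, repeat=True)
```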
Somehow it works now; it seemed to be something temporary with the repo.