batch sizes
dlnp2 opened this issue · 2 comments
Thank you very much for your great repo! This is really helpful.
I have a question: in your paper, specifically in Section "A.3 Training Details", you write
specific batch sizes for each model at each protein length are available in our repository
However, I could not find these. Could you tell me where to find this information in the repo?
Thank you.
The PyTorch version of the code removes this training detail. In TensorFlow, we used the bucket_by_sequence_length
function to select a batch size for each protein length and model, because 1) TensorFlow provides this function out of the box and 2) TensorFlow makes gradient accumulation more difficult to implement.
The PyTorch code instead implements gradient accumulation, which lets you control the effective batch size in a different way. This is simpler to implement in PyTorch and also gives a more easily reproducible setup. If you do want the batch sizes used by the TensorFlow code, you can look at each individual model, each of which has a get_optimal_batch_sizes() function.
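To illustrate the idea behind length-dependent batch sizes: longer sequences use more memory per example, so fewer of them fit in a batch. The actual get_optimal_batch_sizes() implementations in the TensorFlow models may use tuned per-model tables rather than a formula, so treat this as a sketch only; the token_budget parameter is a made-up stand-in for per-model memory limits.

```python
def batch_size_for_length(seq_len, token_budget=4096):
    """Sketch of length-based batch sizing: keep roughly a fixed
    number of tokens (seq_len * batch_size) per batch.

    token_budget is a hypothetical parameter, not from the repo.
    """
    return max(1, token_budget // seq_len)
```

For example, with a 4096-token budget, length-128 proteins get batches of 32 while length-1024 proteins get batches of 4, and very long sequences fall back to a batch of 1.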
function. If you're planning on using the pytorch code, simply pick a desired final batch size, distribute across as many GPUs that you plan to use, and then increase gradient accumulation steps until you're able to fit the batch in memory.