songlab-cal/tape

Hyper-parameter setting for fine-tuning on contact prediction task

wuzhen247 opened this issue · 2 comments

Great work! I encountered a "CUDA out of memory" error when fine-tuning on the contact prediction task. Could you share the settings you used for contact prediction in the paper, such as GPU memory, batch size, max sequence length, etc.?

Thanks.

rmrao commented

So this is a bit different, because TensorFlow supports certain conventions (in particular, dynamic batch sizes) that PyTorch does not easily support. For the paper, we computed dynamic batch sizes based on the total number of tokens the model was able to handle (where total tokens = batch size * sequence length).
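The token-budget idea above can be sketched roughly as follows. This is not the actual TAPE or TensorFlow code; the function name and budget value are illustrative. Sequences are sorted by length so that each padded batch stays under `max_tokens` (batch size × longest sequence in the batch):

```python
def token_budget_batches(lengths, max_tokens=8192):
    """Yield lists of sequence indices whose padded token count
    (batch size * longest sequence in the batch) stays under max_tokens."""
    # Sorting by length reduces padding waste within each batch.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batch, batch_max = [], 0
    for i in order:
        new_max = max(batch_max, lengths[i])
        # Would adding this sequence push the padded batch over budget?
        if batch and new_max * (len(batch) + 1) > max_tokens:
            yield batch
            batch, batch_max = [], 0
            new_max = lengths[i]
        batch.append(i)
        batch_max = new_max
    if batch:
        yield batch
```

A batch sampler built on this can be passed to a PyTorch `DataLoader` via `batch_sampler=` to approximate the dynamic batching used for the paper.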

Here we're not able to do that, and in fact I haven't extensively trained the contact prediction task in PyTorch. To save memory, I'd recommend setting a very small batch size (maybe even 1) and using gradient accumulation to simulate a larger batch size. You may also want to downproject before the pairwise function - I don't think I did so in the current TAPE implementation, but you can downproject to 64 or 128 dimensions before making the 1D features 2D, which should save a lot of memory.
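The downprojection suggestion can be illustrated with a minimal NumPy sketch (not the TAPE implementation; the function and argument names are made up here). The pairwise step turns per-residue features of shape (L, d) into an (L, L, 2d) tensor by concatenating features for every residue pair, so shrinking d first shrinks the dominant memory term quadratically in L:

```python
import numpy as np

def pairwise_features(h, proj=None):
    """Turn 1D per-residue features (L, d) into 2D pairwise features
    (L, L, 2*d') by concatenating h_i and h_j for every pair (i, j).
    Optionally downproject with a (d, d') matrix first to save memory."""
    if proj is not None:
        h = h @ proj                                  # (L, d) -> (L, d')
    L, d = h.shape
    hi = np.broadcast_to(h[:, None, :], (L, L, d))    # row features
    hj = np.broadcast_to(h[None, :, :], (L, L, d))    # column features
    return np.concatenate([hi, hj], axis=-1)          # (L, L, 2*d)
```

For a 500-residue protein with d = 768 in float32, the pairwise tensor is 500 × 500 × 1536 × 4 bytes ≈ 1.5 GB per example; downprojecting to 128 dimensions first cuts that to about 256 MB. In a model, `proj` would be a learned linear layer (e.g. `nn.Linear(768, 128)`) applied before this step.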

If you want to replicate the exact experiment we did in the paper, there's no easy way to do that in PyTorch, so you should use the original TensorFlow code.

Yes, I tried setting batch size = 1 but it still failed with 16 GB of GPU memory. Anyway, thanks for your response - that does help.