MzeroMiko/VMamba

The issue of cudnn affecting speed

lp-094 opened this issue · 11 comments

Hello, we tried turning off cudnn, but the speed did not improve. Does this only take effect in a specific version (such as V3)?

Do not worry; this only happens on some machines (I have not found a pattern for which machines are slow; it may be related to the driver or libraries).

Using torch.backends.cudnn.enabled=True in downstream tasks may be quite slow. If you find vmamba quite slow on your machine, disable it in vmamba.py; otherwise, ignore this.
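
To check whether a given machine is affected, one option is to time the same training step with cuDNN on and off. The following is a minimal sketch, not code from the VMamba repository: the small convolutional `model` is a hypothetical stand-in for the real network, and `time_step` is an illustrative helper. It uses the `torch.backends.cudnn.flags` context manager so the global setting is restored afterwards.

```python
import time

import torch
import torch.nn as nn


def time_step(model, x, enabled, iters=10):
    """Average seconds per forward+backward pass with cuDNN on or off."""
    with torch.backends.cudnn.flags(enabled=enabled):
        for _ in range(2):  # warm-up passes, excluded from the timing
            model(x).sum().backward()
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        start = time.perf_counter()
        for _ in range(iters):
            model(x).sum().backward()
        if x.is_cuda:
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / iters


# Hypothetical stand-in model and input; replace with the real model and batch.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
).to(device)
x = torch.randn(2, 3, 64, 64, device=device)

for enabled in (True, False):
    ms = time_step(model, x, enabled) * 1e3
    print(f"cudnn.enabled={enabled}: {ms:.2f} ms/step")
```

On a CPU-only machine the two numbers should be roughly equal, since cuDNN is not involved; the comparison is only meaningful on the GPU that shows the slowdown.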

I had a similar problem: even though I set it to False in vmamba.py, the training time was still unstable. I used pytorch==2.0.1, python=3.11, on a V100.

In fact, when I used 8xA100 with the batch size set to 512, one epoch took 2 hours.

That does not seem possible. What environment are you using?

Also, on 8xV100, an epoch takes about 10 minutes.

I reran the program: the first round took a long time, but subsequent training was normal. However, I had set torch.backends.cudnn.enabled = True. Could you let me know whether this has a big impact on the model's performance?

It is still somewhat odd, and I do not know why the first round would be abnormal.
In my experiments, all epochs take a similar amount of time, while the first iteration of each epoch is slow because the program needs to start loading the data from the very beginning.

Enabling or disabling cudnn may influence the performance, but I think the difference is tolerable.

Looking at the logs, the cause seems to be data loading: it took a long time because the data is stored on another server and made available to each training server through data sharing.
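
One way to confirm a data-loading bottleneck like this is to time the wait for each batch separately from the training step. The sketch below is illustrative only: `profile_loader` and the fake loader are hypothetical names, and any iterable (such as a PyTorch `DataLoader`) could be passed in. When the step runs on a GPU, `step_fn` should synchronize the device before returning, otherwise the step time will be under-reported.

```python
import time


def profile_loader(loader, step_fn, max_iters=50):
    """Return a list of (data_wait_seconds, step_seconds), one per iteration."""
    timings = []
    batches = iter(loader)
    for _ in range(max_iters):
        t0 = time.perf_counter()
        try:
            batch = next(batches)  # time spent here is waiting for data
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # forward/backward/optimizer step would go here
        t2 = time.perf_counter()
        timings.append((t1 - t0, t2 - t1))
    return timings


def fake_loader():
    """Hypothetical loader whose first batch is slow, like a cold remote read."""
    time.sleep(0.05)
    yield 0
    yield 1
    yield 2


for i, (wait, step) in enumerate(profile_loader(fake_loader(), lambda b: None)):
    print(f"iter {i}: data wait {wait * 1e3:.1f} ms, step {step * 1e3:.1f} ms")
```

If the data wait dominates only in the first iterations, the slowdown is a warm-up cost; if it dominates throughout, the storage or the number of loader workers is the more likely limit.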

@MzeroMiko
The configuration file I used is vmambav2_tiny_224.yaml. Comparing my current log with the author's log, I found that my EMA accuracy is much lower than expected: 0.29%, versus 6.08% in the author's log at the same epoch. My EMA accuracy improves very slowly. What causes this? I am not very familiar with how EMA works.

Oh, that is because the batch_size you use is much bigger than mine. With EMA, the averaged parameters are updated from the latest weights at every iteration. The smaller the batch_size, the more iterations there are per epoch, so the EMA parameters are updated more often and reach higher performance by a given epoch.

But I cannot predict what will happen in the last 50 epochs, as the training starts to converge. You may get higher performance with this batch size.
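
The effect of batch size on early EMA accuracy can be illustrated with a small calculation. This is a sketch under stated assumptions, not the repository's EMA code: it assumes the usual update `ema = decay * ema + (1 - decay) * value`, a hypothetical decay of 0.9999, an ImageNet-1k-sized training set, and two hypothetical batch sizes. It tracks how much of the untrained initial weights is still present in the EMA after a number of epochs.

```python
def ema_update(ema_value, new_value, decay):
    """One exponential-moving-average step."""
    return decay * ema_value + (1.0 - decay) * new_value


def initial_share_after(num_samples, batch_size, epochs, decay):
    """Return (steps, share of the initial weights left in the EMA)."""
    steps = (num_samples // batch_size) * epochs
    share = 1.0  # 1.0 marks the untrained initial weights
    for _ in range(steps):
        # 0.0 marks any later (trained) weights, so `share` decays by
        # a factor of `decay` per update.
        share = ema_update(share, 0.0, decay)
    return steps, share


# Assumed values for illustration only.
NUM_SAMPLES = 1_281_167  # ImageNet-1k training set size
DECAY = 0.9999           # hypothetical EMA decay

for batch_size in (1024, 4096):
    steps, share = initial_share_after(NUM_SAMPLES, batch_size, epochs=1, decay=DECAY)
    print(f"batch_size={batch_size}: {steps} updates, initial share {share:.3f}")
```

With fewer updates per epoch, a larger share of the random initialization remains in the averaged weights, which is consistent with the EMA accuracy lagging early in training and catching up later.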

@MzeroMiko Thanks for your reply, it is now back to normal.