What's the GPU device used during your training and finetuning?
As the title says, I'd like to know which GPU device you used to support batch_size=20.
I use an RTX 2080 Ti, which has 11GB of memory. When running train_crosspoint.py, I have to set batch_size=2 to avoid CUDA out-of-memory errors since, as you know, knn and torch.cat in models/dgcnn.py consume a large portion of memory.
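For context, the knn in DGCNN-style code usually follows the pattern sketched below (I'm assuming this repo's models/dgcnn.py does the same): it materializes a full (batch, N, N) pairwise-distance matrix, and get_graph_feature then torch.cats the center and neighbor features afterwards, which is why memory scales so steeply with batch size and point count.

```python
import torch

def knn(x, k):
    # x: (batch, dims, num_points). The pairwise-distance matrix below has
    # shape (batch, num_points, num_points), so memory grows quadratically
    # with the number of points and linearly with the batch size.
    inner = -2 * torch.matmul(x.transpose(2, 1), x)
    xx = torch.sum(x ** 2, dim=1, keepdim=True)
    pairwise_distance = -xx - inner - xx.transpose(2, 1)
    # Indices of the k nearest neighbors: (batch, num_points, k).
    idx = pairwise_distance.topk(k=k, dim=-1)[1]
    return idx
```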
However, the small batch_size makes training much slower, so it would take me roughly 4 or 5 days to get the final results.
By the way, I have multiple GPUs. Is it possible to incorporate DistributedDataParallel to accelerate training?
Anyway, I will try it out!
Hi, yes, it should be possible to incorporate DistributedDataParallel. However, we didn't have access to multiple GPUs while training, so we couldn't add that to the code.
I would love to have your contribution to the code base if you are using the DistributedDataParallel mechanism. Feel free to make a pull request once you have successfully incorporated it.
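In case it helps anyone landing here, a generic DDP skeleton looks roughly like the sketch below. This is not tied to train_crosspoint.py; the toy model, data, and hyperparameters are placeholders to keep it self-contained and runnable, and you would swap in the DGCNN model and the actual dataset.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> ddp_train.py
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model and data; replace with DGCNN and the real dataset.
    model = nn.Linear(3, 3).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    data = TensorDataset(torch.randn(256, 3), torch.randn(256, 3))
    sampler = DistributedSampler(data)  # shards the data across processes
    loader = DataLoader(data, batch_size=8, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # different shuffle each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()  # DDP all-reduces gradients across GPUs here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Note that DistributedSampler shards the dataset, so the effective global batch size is batch_size multiplied by the number of processes; the per-GPU batch size (and thus the memory pressure from knn) stays the same.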