haoyu94/Coarse-to-fine-correspondences

How to increase batch_size?

Ruye-aa opened this issue · 1 comment

Thank you for your outstanding work. I have some questions about running the code.

When I successfully ran the program, I found that an epoch takes a long time. I tried to increase the batch size, changing it from 1 to 2, but I ran into the following error:
File "/home/aiyang/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise raise self.exc_type(msg) AssertionError: Caught AssertionError in DataLoader worker process 0. Original Traceback (most recent call last): File "/home/aiyang/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop data = fetcher.fetch(index) File "/home/aiyang/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch return self.collate_fn(data) File "/home/aiyang/Coarse-to-fine-correspondences/model/KPConv/preprocessing.py", line 72, in collate_fn_descriptor assert len(list_data) == 1 AssertionError

I want to know if there is any way to solve this problem, or if you have any suggestions for increasing the training speed. Thanks.

Hi there,

Thanks for your interest in our work! Regarding the batch size, the original KPConv implementation supports batch_size > 1. However, since we apply attention modules at the bottleneck and the number of patches there differs between frame pairs, implementing this part with a batch size > 1 could be difficult. To reduce training time, I would suggest the following:

  1. Use the same number of patches for different frame pairs, so that you can use a batch_size > 1 (see the first sketch after this list);
  2. Use more GPUs, e.g., 4 GPUs, each with a batch size of 1 (see the DDP sketch below);
  3. The calculation of the ground-truth overlap ratio between patches can be further optimized (see the third sketch below);
  4. As the data processing in KPConv relies heavily on the CPU, this part can become a bottleneck if you train the model on a server where CPU resources are sliced. You can try https://github.com/qinzheng93/Easy-KPConv, which moves the CPU-based operations onto the GPU.
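For suggestion 1, one possible direction is to pad (or truncate) every frame pair to a fixed number of bottleneck patches and pass a validity mask to the attention module, so the patch features can be stacked along a batch dimension. A minimal sketch with made-up names, shapes, and a generic MultiheadAttention layer, not the repo's actual code:

```python
import torch

def pad_patches(feats: torch.Tensor, num_patches: int):
    """Pad (or truncate) patch features of shape (n, d) to (num_patches, d),
    returning the padded tensor and a boolean mask marking real patches."""
    n, d = feats.shape
    padded = feats.new_zeros(num_patches, d)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    k = min(n, num_patches)
    padded[:k] = feats[:k]      # truncate if a pair has too many patches
    mask[:k] = True             # True marks real (non-padded) patches
    return padded, mask

# Stack several frame pairs (with variable patch counts) into one batch.
pairs = [torch.rand(int(torch.randint(40, 80, (1,))), 256) for _ in range(4)]
padded, masks = zip(*(pad_patches(p, num_patches=96) for p in pairs))
batch_feats = torch.stack(padded)    # (B, 96, 256)
batch_mask = torch.stack(masks)      # (B, 96), True = real patch

# key_padding_mask expects True where positions should be ignored.
attn = torch.nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
out, _ = attn(batch_feats, batch_feats, batch_feats, key_padding_mask=~batch_mask)
```

The trade-off is some wasted computation on padded patches and one extra hyperparameter (the fixed patch count).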
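For suggestion 2, the standard way to use several GPUs with a batch size of 1 each is DistributedDataParallel, one process per GPU. A rough single-node sketch with a toy model and dataset standing in for the real network and pair dataset:

```python
# Launch with: torchrun --nproc_per_node=4 train_ddp.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()                  # single node: rank == local GPU id
    torch.cuda.set_device(rank)

    # Stand-in model and data; replace with the real network and pair dataset.
    model = DDP(torch.nn.Linear(3, 1).cuda(rank), device_ids=[rank])
    dataset = TensorDataset(torch.rand(64, 3), torch.rand(64, 1))
    sampler = DistributedSampler(dataset)   # each GPU sees a different shard
    loader = DataLoader(dataset, batch_size=1, sampler=sampler)  # 1 sample per GPU

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)            # reshuffle shards every epoch
        for x, y in loader:
            loss = torch.nn.functional.mse_loss(model(x.cuda(rank)), y.cuda(rank))
            optimizer.zero_grad()
            loss.backward()                 # gradients are averaged across GPUs
            optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```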
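For suggestion 3, a generic speed-up is to compute the point-level distance/adjacency matrix once and reuse it for every patch pair, instead of recomputing distances per pair. The sketch below assumes patches are index sets into two already-aligned point clouds and measures overlap by nearest-neighbor hits within a radius; the repo's actual ground-truth computation may differ:

```python
import torch

def patch_overlap_ratios(src_pts, tgt_pts, src_patches, tgt_patches, radius=0.05):
    """Fraction of points in each source patch that have a target-patch point
    within `radius`, for all patch pairs, using one shared distance matrix."""
    dist = torch.cdist(src_pts, tgt_pts)        # (Ns, Nt) pairwise distances
    close = dist < radius                       # boolean point-level adjacency

    ratios = torch.zeros(len(src_patches), len(tgt_patches))
    for i, sp in enumerate(src_patches):
        hits = close[sp]                        # (|sp|, Nt), computed once per source patch
        for j, tp in enumerate(tgt_patches):
            ratios[i, j] = hits[:, tp].any(dim=1).float().mean()
    return ratios

# Toy example: a roughly aligned copy of the source cloud, two patches per cloud.
src = torch.rand(200, 3)
tgt = src + 0.01 * torch.randn_like(src)
src_patches = [torch.arange(0, 100), torch.arange(100, 200)]
tgt_patches = [torch.arange(0, 100), torch.arange(100, 200)]
print(patch_overlap_ratios(src, tgt, src_patches, tgt_patches))
```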

Best,

Hao