Parameters
a14en9 opened this issue · 9 comments
Thank you for posting the source code. I have sorted out the data (medium density) as described in the repo and attempted to train the model. To better understand how you are processing your data, may I confirm a few hyperparameters with you?
- The number of input points is set to 4096.
- The `sample_rate` needs to be set to `0.01` according to the statement in the paper: "First, we partition the input data into columns with a base of 10m × 10m and infinite height." However, I got the following message when I set `sample_rate` to `0.01` in `trainDataLoader` and `testDataLoader`:

      log_string('eval mean loss: %f' % (loss_sum / float(num_batches)))
      ZeroDivisionError: float division by zero

- Shall I change the code below (contained in `S3DISDataLoader.py`)

      if split=='train':
          ALL_FILES=ALL_FILES[:11]

  to

      if split=='train':
          ALL_FILES=ALL_FILES[:10]

  for the cross-city validation?
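One plausible reading of the `ZeroDivisionError` is that the test loader produced no batches at all, so `num_batches` was zero when the mean loss was computed. The sketch below is only an illustration of that situation, not code from the repository: `count_batches` and `mean_loss` are hypothetical helpers, and the link between a small `sample_rate` and an empty dataset is an assumption.

```python
import math


def count_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a loader would yield for num_samples samples."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def mean_loss(loss_sum, num_batches):
    """Average loss per batch, with a clear error when there are no batches."""
    if num_batches == 0:
        raise ValueError(
            "no batches were produced; check sample_rate, batch_size and the data path"
        )
    return loss_sum / float(num_batches)


# An empty dataset, or one smaller than the batch size with drop_last=True,
# yields zero batches, which is what makes the division fail.
print(count_batches(0, 144))                    # 0
print(count_batches(100, 144, drop_last=True))  # 0
print(count_batches(16524, 144))                # 115
```

Checking the number of batches before the evaluation loop makes the cause visible immediately, instead of surfacing as a division error at the logging step.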
Thank you in advance and look forward to hearing from you.
Hello @cvhabitat
Allow me to answer your questions:
1- The number of input points is set to 4096 per block. So if you have multiple blocks per scene, then the total number of points will be 4096*num_blocks. To summarize, an input point cloud is split into multiple local regions and the model is applied to each region separately.
2- Since the `block_size` is set to 1m × 1m and our data is already normalized, one block is enough to contain all the input points, so there is no need to modify the `block_size` or the `sample_rate` unless you work on another dataset.
3- In my case, for the Swiss3DCities benchmark, each city contains 5 point cloud scans, so the line

    ALL_FILES=ALL_FILES[:11]

allows us to use the first 10 scans (from 2 cities) during training and the last 5 during testing. If you would like to use cross-validation, then you should select the files by index. For example:

    ALL_FILES=ALL_FILES[start_index:stop_index:step]
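The idea in point 1 (split the cloud into local regions, then feed a fixed number of points per region to the model) can be sketched roughly as below. This is a simplified grid-based illustration in plain Python, not the repository's data loader, which may choose its blocks differently; `split_into_blocks` and `sample_block` are hypothetical names, and the random cloud is made-up data.

```python
import random
from collections import defaultdict


def split_into_blocks(points, block_size):
    """Group (x, y, z) points by the xy grid cell (column) they fall into."""
    blocks = defaultdict(list)
    for x, y, z in points:
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append((x, y, z))
    return blocks


def sample_block(block_points, npoint, rng):
    """Draw exactly npoint points, sampling with replacement if the block is small."""
    if len(block_points) >= npoint:
        return rng.sample(block_points, npoint)
    return [rng.choice(block_points) for _ in range(npoint)]


rng = random.Random(0)
# Made-up cloud covering a 2 x 2 area in xy, so several 1 x 1 blocks are needed.
cloud = [(rng.uniform(0, 2), rng.uniform(0, 2), rng.uniform(0, 5)) for _ in range(20000)]

blocks = split_into_blocks(cloud, block_size=1.0)
model_inputs = [sample_block(pts, 4096, rng) for pts in blocks.values()]

# Total points seen by the model is 4096 * num_blocks.
total_points = sum(len(block) for block in model_inputs)
print(len(blocks), total_points)
```

If the data is normalized so that everything fits inside a single block, as described in point 2, the same code simply yields one block of 4096 points.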
Thank you for your prompt response!
If I understand correctly, I don't need to change any hyperparameters of the given scripts to replicate the reported results, except for the load path. That makes it very convenient for me to follow the paper.
Btw, I suppose that it will load 11 tiles if I use `ALL_FILES=ALL_FILES[:11]`.
Thank you!
You are right. Sorry for not mentioning this. I am loading 11 tiles since I added a rotated scan from Davos city in the training set to study its effect on rotation invariance. So you should adjust it to the number of scans that you have.
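To make the slicing concrete, here is a small sketch with hypothetical scan names (three cities, five scans each, stored city by city). The names and the `cross_city_split` helper are assumptions for illustration only; the actual file list and ordering depend on your copy of the dataset.

```python
# Hypothetical scan names: 3 cities with 5 scans each, ordered by city.
ALL_FILES = [f"{city}_{i}" for city in ("cityA", "cityB", "cityC") for i in range(1, 6)]

print(len(ALL_FILES[:11]))  # 11: the stop index is exclusive, so indices 0..10
print(len(ALL_FILES[:10]))  # 10: exactly the scans of the first two cities


def cross_city_split(files, scans_per_city, test_city_index):
    """Hold out all scans of one city for testing and train on the rest."""
    start = test_city_index * scans_per_city
    stop = start + scans_per_city
    test = files[start:stop]
    train = files[:start] + files[stop:]
    return train, test


train_files, test_files = cross_city_split(ALL_FILES, 5, test_city_index=2)
print(len(train_files), len(test_files))  # 10 5
```

Changing `test_city_index` rotates which city is held out, which is one way to run the cross-city validation by index.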
Thanks
Thanks for the information. Just one more question about the training and test time (no more than 6 hours, as reported in the paper). I tried to run the script on 3 NVIDIA 3090 GPUs in parallel, but every epoch took a fairly long time. The batch size I set is 144. Do you have any idea why it takes longer than on a single NVIDIA 2080Ti? I'm a bit worried that something in particular needs to be changed.
Did you track the GPU usage as well as the GPU usage percentage? First, you need to make sure to set the ideal batch size for your available hardware. If the GPU usage percentage is low, then you need to increase the batch size, which will allow faster computation. Also make sure to set the right `num_workers` in the dataloader, which denotes the number of processes that generate batches in parallel.
Yes, I monitored the GPU usage and found that it keeps fluctuating: sometimes each GPU exceeds 80% usage, but most of the time two of them are at 0% and the other is below 10%. When I tried to increase the batch size further, to 156, an overflow error occurred. The `num_workers` is set to 8 in my case.
I have not run into this problem when using multiple GPUs with similar models.
The printed messages are as follows:
```
Namespace(batch_size=144, data_path='/data/Swiss3DCities/sampling/', decay_rate=0.0001, epoch=200, learning_rate=0.01, log_dir='semseg', lr_decay=0.7, model='pointnet2_sem_seg_msg', no_cuda=False, npoint=4096, optimizer='Adam', seed=1, step_size=10, test_area=5)
/data/Swiss3DCities/sampling/
start loading training data ...
[1.22482551 3.56558644 1. 1.40946684 3.28217517]
Totally 35131 samples in train set.
start loading test data ...
[1.04042507 3.26419864 1.06538238 1. 3.52629829]
Totally 16524 samples in test set.
The number of training data is: 35131
The number of test data is: 16524
No existing model, starting training from scratch...
Let's use 3 GPUs!
```
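As a rough sanity check on the numbers in this log, the arithmetic below works out how many steps an epoch takes and how much data each GPU receives per step. It assumes the batch is split evenly across the GPUs along the batch dimension, which is how `nn.DataParallel` distributes a batch when the size divides evenly; the helper name `steps_per_epoch` is hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


batch_size = 144
num_gpus = 3
npoint = 4096

train_steps = steps_per_epoch(35131, batch_size)
test_steps = steps_per_epoch(16524, batch_size)

# Assumes an even split of each batch across the GPUs.
samples_per_gpu = batch_size // num_gpus
points_per_gpu = samples_per_gpu * npoint

print(train_steps, test_steps)          # 244 115
print(samples_per_gpu, points_per_gpu)  # 48 196608
```

With only a few hundred steps per epoch, long epochs combined with mostly idle GPUs would be consistent with time being spent outside the GPU computation, for example in preparing batches, although the log alone cannot confirm that.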
Did you try making all 3 GPUs visible devices (by index)? Also, did you wrap your model in DataParallel so that it runs on multiple GPUs?
Yes, I tried both and they performed quite similarly. The loaded model was first moved to CUDA and then wrapped in `nn.DataParallel`.