MehmetAygun/4D-PLS

Testing model and time

Trainingzy opened this issue · 20 comments

Hi @MehmetAygun , thank you for your code and contribution to the community. It is a really interesting work.

I have run your training and testing code. It seems that the validation process takes quite a long time, maybe a full day. Each frame takes more than one minute, while in training each iteration takes no more than 2 seconds. Why is that? Is it because there is no point downsampling for validation, or is something wrong with my setup?

Another question is about the trained model. It seems that you only share the pre-trained model, not the fully trained final model. Will you share it?

The testing time is longer than training, but the difference shouldn't be that large. During testing it doesn't use a multi-threaded data loader, since the predictions from previous frames are needed, and there is the extra burden of writing predictions to disk and reading them back. Moreover, as you said, there is no downsampling for validation. I would suggest checking whether writing predictions to disk is taking too much time. Normally it should take 5-6 hours for the whole validation set.
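If it helps, a rough way to check the disk I/O cost (sketch only; the array size, dtype, and file name below are placeholders, not the repository's actual output format):

```python
import time
import numpy as np

# Placeholder prediction array, roughly the size of one LiDAR frame's labels.
preds = np.random.randint(0, 20, size=(100_000,), dtype=np.uint32)

t0 = time.perf_counter()
preds.tofile('pred_000000.label')            # write one frame's predictions
print(f'write took {time.perf_counter() - t0:.3f}s')

t0 = time.perf_counter()
_ = np.fromfile('pred_000000.label', dtype=np.uint32)  # read them back
print(f'read took {time.perf_counter() - t0:.3f}s')
```

If these times are a significant fraction of the per-frame minute you are seeing, the bottleneck is the disk, not the model.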

The model that I shared is the final model; you should be able to get good performance with it, but it also depends on the GPU memory size, since the code adjusts the downsampling size based on memory capacity.

Thank you for your reply.

  1. I use one 1080Ti GPU for validation. The validation process has now run for more than one day, but only 2600 frames have been processed (out of 4700+, I think). What GPU do you use?
  2. About the final model, do you mean the one you provide in the readme here? I also ran the training script; the checkpoint seems to be from epoch 222, but you train the model for a total of 1000 epochs, right?
  3. Another question is about training memory cost and speed. I train the model on one 3090Ti, using around 20 GB. As for training speed, as I said before, each iteration takes about 1.5 s, so I would need more than one week to finish training. How did you train the model, and how long did it take?

Looking forward to your reply.

1 - I used an RTX 8000 for training and an RTX 5000 for testing.
2 - Yes, the file in the readme is the final model; it was probably a fine-tuned checkpoint, which is why it shows epoch 222. During pre-training (no instance loss) the important thing is getting good segmentation performance; after that, ~200 epochs with the full losses is enough for good performance.
3 - How I trained is explained in 2. About timing, if I recall correctly it was 2 days for pre-training and 1 day for the full training.

Thank you for your reply. It is really great work that I want to follow. I hope you can clarify this for me.

How do you train the model? Pre-training for 200 epochs and fine-tuning for 22 epochs? You mention pre-training for 200 epochs and fine-tuning for 800 epochs in #5. I wonder how many epochs you trained for and what the implementation details are.

In your code, it seems that we are meant to train the model starting from your shared model. So are the parameter settings for pre-training or for fine-tuning?

I would suggest this schedule:

  1. Set previous_training_path to empty, config.pre_train to True, and config.learning_rate = 1e-2, then train a model for about 200 epochs while checking the segmentation accuracy on validation (see the config sketch after this list).
  2. Then set previous_training_path to whatever was saved by the previous training, set config.learning_rate = 1e-3 and config.pre_train to False, and fine-tune for about 800 epochs, or run validation once in a while with the saved models to decide when to stop.
  3. After this is done, you can use the model for testing.
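For reference, a minimal sketch of the two stages; only previous_training_path, config.pre_train, and config.learning_rate come from the suggestion above, and the config object below is a stand-in, not the repository's actual Config class:

```python
from types import SimpleNamespace

# Stand-in for the training script's config; attribute names beyond
# pre_train and learning_rate are illustrative only.
config = SimpleNamespace()

# Stage 1: pre-training from scratch (no instance loss), ~200 epochs
previous_training_path = ''
config.pre_train = True
config.learning_rate = 1e-2
# ... run training, monitor validation segmentation accuracy ...

# Stage 2: fine-tuning with the full losses, ~800 epochs
previous_training_path = 'results/Log_XXXX'  # hypothetical folder saved by stage 1
config.pre_train = False
config.learning_rate = 1e-3
# ... resume training, validate periodically to decide when to stop ...
```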

The model that I shared is for testing, not for training. It is fully trained; you can take it and run validation/testing without any training.

Thank you for your reply. I will try it.

However, training for 1000 epochs in total takes quite a long time. I wonder whether your shared model was trained for 222 epochs or 1000 epochs.

1000 epochs

Got it! Thank you!

After running inference for two days, I evaluated the provided model on the SemanticKITTI validation set (sequence 08). The results are as follows:

| LSTQ | $S_{assoc}$ | $S_{cls}$ | IoU$^{st}$ | IoU$^{th}$ |
| --- | --- | --- | --- | --- |
| 57.84 | 62.15 | 53.84 | 59.50 | 52.78 |

The results are much lower than the performance reported in the paper.

Hey, what kind of GPU (specifically, how much memory) did you use? The memory size affects the sampling size, and heavier downsampling hurts the overall performance.

I use one 1080Ti GPU with 11GB memory.

It is interesting that the memory influences the performance; maybe I will try inference with one 3090 GPU (24 GB).

Yes, since the input point cloud is large (~100k points), it has to randomly downsample points to fit in memory. If the memory is larger, it doesn't need to drop as many points, and the precision, and with it the final performance, is higher.
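As an illustration only (the actual budget rule in the code differs; the points-per-GB factor below is made up), the idea is roughly:

```python
import numpy as np

def downsample_points(points, gpu_mem_gb, points_per_gb=8000):
    """Randomly keep at most a memory-dependent number of points."""
    budget = int(gpu_mem_gb * points_per_gb)       # hypothetical budget rule
    if points.shape[0] <= budget:
        return points                              # enough memory: keep everything
    keep = np.random.choice(points.shape[0], budget, replace=False)
    return points[keep]

cloud = np.random.rand(100_000, 3)                 # ~100k-point LiDAR scan
print(downsample_points(cloud, gpu_mem_gb=11).shape)  # 11 GB card: points get dropped
print(downsample_points(cloud, gpu_mem_gb=24).shape)  # 24 GB card: full cloud kept
```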

Hi @MehmetAygun, I am confused about how you can finish training in 3 days. We use two V100s and it will take us more than one week (maybe 2 weeks) to train 1000 epochs. How many GPUs did you use during training and testing?
Looking forward to your reply~

Hey, maybe I mixed up the training duration (I don't have the logs to check now), but it definitely shouldn't be 2 weeks.

I used a single RTX 8000 for training, but I don't know other hardware details since it was running on a cluster. You can debug where it is slow by looking at the training log; the training script prints times for each iteration (data loading, forward, backward, etc.).

Thanks. I will check the log.

Hi @MehmetAygun
I have another question about the annotation labels. For the SemanticKITTI dataset, you use `ins_labels = frame_labels >> 16` to get the instance labels here.
Are the instance labels consistent across all frames? In other words, for 4D panoptic segmentation, the instance label of a specific instance should be the same across all frames, right?
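For context, a small sketch of how a SemanticKITTI .label file splits into semantic and instance ids, each entry being a uint32 with the semantic class in the lower 16 bits and the instance id in the upper 16 bits (the file path is a placeholder):

```python
import numpy as np

# Placeholder path to one frame's label file from the SemanticKITTI layout.
frame_labels = np.fromfile('sequences/08/labels/000000.label', dtype=np.uint32)

sem_labels = frame_labels & 0xFFFF   # semantic class per point (lower 16 bits)
ins_labels = frame_labels >> 16      # instance id per point (upper 16 bits)
```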

Yes, within each sequence the instance ids are consistent through time.

Thank you for your patience and reply.

Both your work and LIDAR-MOS deal with sequences of frames. I am not familiar with 3D vision. For such frames, should the pose of each frame be converted to the first frame of the sequence?
I think both your work and LIDAR-MOS convert the pose, but with different code/equations. Are these two pieces of code equivalent?

Yes, it seems both are doing similar things. I would suggest looking at how rigid motion is modelled in computer vision if you want to understand it.
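For illustration, a minimal sketch of that rigid-motion composition, assuming 4x4 homogeneous poses that map each frame's coordinates into a common world frame (not the exact code from either repository, which may also fold in sensor calibration):

```python
import numpy as np

def to_first_frame(points_i, pose_0, pose_i):
    """Transform an (N, 3) point cloud from frame i into frame 0 coordinates.

    pose_0, pose_i: 4x4 matrices mapping each frame into a shared world frame.
    """
    homo = np.hstack([points_i, np.ones((points_i.shape[0], 1))])  # (N, 4) homogeneous points
    T = np.linalg.inv(pose_0) @ pose_i                             # frame i -> frame 0
    return (homo @ T.T)[:, :3]
```

Different codebases may write this composition differently (e.g. pre-multiplying inverses or chaining relative poses), but as long as the same transform is applied the results are equivalent.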

Thank you~~ I will look into it.