ika-rwth-aachen/Cam2BEV

Training failed

shangweichao opened this issue · 9 comments

Hello! When I train on your dataset, training fails at the last step every time. I reduced the batch size to 2, but it still fails at the last step. What was your training time? Can you help me out here? Thank you!

What exactly do you mean by "failed to the last step"? The progress bar shows completed batches per epoch. Once it has completed, validation will run (without an additional progress bar), which will also take a while. Then, the next epoch with a new progress bar will be started. Please post the exact error message!

Depending on the type of model, our training time was roughly 20-45 min per epoch on an RTX 2080 Ti.

You say you have attached screenshots, but I don't seem to see any attachments.

The number of batches (not images) is 6639 with batch size 5, since there are 33199 training images. If you reduce the batch size to 2 you obviously get 16599 batches.
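As a quick sanity check of these numbers, here is a minimal sketch. It assumes the incomplete final batch is simply dropped (floor division), which is consistent with the counts quoted above. The function name `batches_per_epoch` is hypothetical and is not taken from the repository.

```python
def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Return the number of full batches per epoch.

    Assumption: a trailing incomplete batch is dropped, so the result
    is the floor of num_samples / batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return num_samples // batch_size


if __name__ == "__main__":
    num_training_images = 33199  # figure quoted in this thread
    for batch_size in (5, 2):
        n = batches_per_epoch(num_training_images, batch_size)
        print(f"batch size {batch_size}: {n} batches per epoch")
```

With 33199 images, 5 * 6639 = 33195 and 2 * 16599 = 33198, so a few leftover images are not part of a full batch in either case.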

The fact that the progress bar only reaches 6638/6639 is Keras-related. Under the hood, the training phase has already finished and validation is running. As to why your system is running out of memory during validation, I cannot help you without further information. The RTX 2080 Ti we used has 11 GB of GPU memory.

I see no need to upload a trained model. All algorithms and data needed to reproduce our model are available here. I am now closing this issue. Thank you for your interest in the project!