huangjh-pub/synorim

Retraining killed

Opened this issue · 5 comments

Thank you for the code!!

The results for your example datasets look sooooo beautiful!!!

I tried to retrain your network to run more tests. However, every time I train the network, the first step gets killed. (The first time, it died at 69%.)

====================================== <<<<
Training: 87%|███████████████████████████████████████████████████████████████████████████████▊ | 386222/445500 [30:20:46<138:43:55, 8.43s/it, Loss = 0.16]Killed

I don't know why.

My system is Ubuntu 20.04.5 LTS; Processor: Intel® Xeon(R) Silver 4112 CPU @ 2.60GHz × 8 ; Graphics: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]

I have set up the code environment as described in the README.

Could you help me? Thanks!!

Thanks for your interest in our work! I suspect this is a CPU memory leak problem. How much RAM does your machine have?

@heiwang1997 Thank you for your reply!
Here's the PC details:

Screenshot from 2022-10-14 14-14-29

That should be enough to train our model.
Are you training using your own Dataloader? Can you please verify that the program gets killed when it eats up all system memory?
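One way to check this is to log memory usage periodically during training and see whether it grows steadily until the process dies. The snippet below is only a minimal sketch, assuming a Linux machine (it reads `/proc`); the helper names `read_kib_field`, `current_rss_mib`, `available_ram_mib`, and `memory_report` are made up for illustration and are not part of this repository. On Linux, a bare `Killed` message usually comes from the kernel's out-of-memory killer, in which case `dmesg` should normally contain a matching "Out of memory" entry.

```python
def read_kib_field(path: str, key: str) -> int:
    """Return the value (in KiB) of a 'Key:   12345 kB' line in a /proc file."""
    with open(path) as f:
        for line in f:
            if line.startswith(key + ":"):
                return int(line.split()[1])
    raise RuntimeError(f"{key} not found in {path}")


def current_rss_mib() -> float:
    """Resident memory of this process in MiB (Linux only)."""
    return read_kib_field("/proc/self/status", "VmRSS") / 1024.0


def available_ram_mib() -> float:
    """System-wide available RAM in MiB (Linux only).

    This also reflects memory held by DataLoader worker processes,
    which the per-process RSS above does not include.
    """
    return read_kib_field("/proc/meminfo", "MemAvailable") / 1024.0


def memory_report(step: int) -> str:
    """Build, print, and return a one-line memory summary for a training step."""
    msg = (
        f"step {step}: process RSS {current_rss_mib():.0f} MiB, "
        f"system available {available_ram_mib():.0f} MiB"
    )
    print(msg, flush=True)
    return msg


if __name__ == "__main__":
    # Hypothetical usage: call this every N iterations inside the training loop.
    for step in range(0, 3000, 1000):
        memory_report(step)
```

If the available memory keeps shrinking over thousands of iterations and is near zero right before the crash, that would point to a memory leak rather than a problem with the model itself.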

@heiwang1997 I tried training on your datasets, and all of them work fine. So I guess the problem may be that the data I generated cannot be learned. Could I ask a question? What are the requirements (e.g. overlap ratio) for the training data?

wmrenr commented

I am using my own point cloud data to reproduce the results. May I ask how to convert my own data into the format needed for training? Currently, my point clouds are in PLY format.