Problems encountered during training and evaluation
DonlynLee opened this issue · 3 comments
Sorry to bother you again. When evaluating on BEHAVE with the provided model, I found that the GIF with correction looks the same as the GIF without correction, and the evaluation metrics sometimes become NaN. I also tried training the interaction diffusion model on BEHAVE; after 6 or 7 epochs, training fails with "NaN or Inf found in input tensor". Lowering the learning rate helped, in that more epochs completed, but then the problem came back. Is this due to insufficient GPU performance? Or could it be caused by differences in the sampled point clouds used when generating the contact labels? Thanks very much.
Hi,
You're welcome. If any questions arise, please don't hesitate to reach out:)
I didn't find any NaN during my evaluation. I will quickly check the training procedure.
The correction is not executed for every interaction sequence. While the corrections may seem minor in short-term prediction, they become significant in long-term generation. I will make this part of the code available after the CVPR deadline.
Also, I will upload the point clouds from my side shortly. Previously, I held them back only because of the large file size.
Best
Thank you very much for your reply; I will try again. I hope everything goes well!
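For anyone hitting the same NaN symptoms, one way to narrow them down is to check which loss term or which evaluation sequence first becomes non-finite. The following is only a minimal sketch in plain Python, not code from this repository: the helper names `first_non_finite` and `finite_mean` and the example keys are hypothetical, and it assumes the per-term losses and per-sequence metrics are available as plain floats.

```python
import math
from typing import Iterable, Mapping, Optional, Tuple


def first_non_finite(
    values: Mapping[str, Iterable[float]],
) -> Optional[Tuple[str, int, float]]:
    """Return (name, index, value) of the first NaN/Inf entry, or None if all are finite."""
    for name, seq in values.items():
        for i, v in enumerate(seq):
            if not math.isfinite(v):
                return name, i, v
    return None


def finite_mean(per_sequence: Iterable[float]) -> Tuple[float, int]:
    """Average only the finite entries and report how many entries were skipped."""
    vals = list(per_sequence)
    finite = [v for v in vals if math.isfinite(v)]
    skipped = len(vals) - len(finite)
    # If nothing is finite, the mean is undefined, so NaN is returned on purpose.
    mean = sum(finite) / len(finite) if finite else float("nan")
    return mean, skipped


if __name__ == "__main__":
    # Hypothetical per-step loss terms; the key names are illustrative only.
    history = {"loss_a": [0.9, 0.7, 0.6], "loss_b": [0.2, float("inf"), 0.1]}
    print(first_non_finite(history))

    # Hypothetical per-sequence metric values containing one NaN.
    print(finite_mean([1.0, float("nan"), 3.0]))
```

Reporting the skipped count alongside the mean makes it visible when a few sequences are responsible for a NaN average, rather than silently hiding them.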