Training details
os1a opened this issue · 6 comments
Hi,
According to Section 4.1 of the paper (implementation details), you use a batch size of 128 and train for 36 epochs with a learning rate of 0.001, decayed to 0.0001 at epoch 32.
According to the provided code, the batch size is 32:
Line 50 in 7e9b51d
Does it give the same performance?
Also one more question about the loss function, can you give more insights for the classification loss? why do you need it, and have you tried training without it?
Thanks a lot for the great work.
-
For the batch size: since we use Horovod (distributed training) with 4 GPUs, the effective batch size is 32*4 = 128. I remember that training on a single GPU with batch size 32 lowered the performance slightly, but only by a very small margin.
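For concreteness, here is a minimal sketch of how the per-GPU batch size and the number of Horovod workers combine into the effective batch size. The variable names and `dataset` are illustrative placeholders, not the repository's actual code:

```python
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

per_gpu_batch_size = 32                                   # what each worker sees
effective_batch_size = per_gpu_batch_size * hvd.size()    # 32 * 4 = 128 with 4 GPUs

# each worker iterates over a disjoint shard of the data
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(
    dataset, batch_size=per_gpu_batch_size, sampler=sampler)
```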
-
The classification branch is used to rank the predicted trajectories, e.g., when calculating ADE1 we choose the trajectory with the highest score. Besides, we use a max-margin loss on those scores to encourage multi-modal prediction.
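For example, the single-mode metric can be computed from the highest-scored mode along these lines (a sketch with illustrative names and shapes, not the repository's code):

```python
import torch

def ade1(trajs, scores, gt):
    """trajs: (K, T, 2) predicted modes, scores: (K,) mode scores, gt: (T, 2) ground truth."""
    best = scores.argmax()                         # trajectory ranked highest by the classifier
    return (trajs[best] - gt).norm(dim=-1).mean()  # average point-wise displacement
```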
Thanks a lot for your answers.
Could you please elaborate on why the max-margin loss encourages multi-modal prediction? I could not find more details about that in the paper.
If we used a binary cross-entropy loss, a trajectory that is far from the ground truth would be treated as a negative and its likelihood would be suppressed toward zero. With the max-margin loss, we only require its score to be at least ϵ lower than that of the trajectory closest to the ground truth.
That said, we don't have an ablation of BCELoss vs. MaxMarginLoss; it's a design choice.
You could also refer to Section 3.3 of this motion planning paper for more info about the max-margin loss.
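As a rough illustration of that description, the ranking term can be written as a hinge over the mode scores. This is only a sketch: the function name, the use of final displacement to pick the positive mode, and the margin value ϵ = 0.2 are assumptions for illustration, not the repository's implementation:

```python
import torch

def max_margin_loss(scores, dist_to_gt, eps=0.2):
    """scores: (K,) predicted mode scores, dist_to_gt: (K,) displacement of each mode to the GT."""
    pos = dist_to_gt.argmin()                          # positive = mode closest to the ground truth
    # hinge: a negative mode is penalized only if its score comes within eps of the positive's
    hinge = torch.clamp(scores - scores[pos] + eps, min=0.0)
    mask = torch.ones_like(scores, dtype=torch.bool)
    mask[pos] = False                                  # the positive mode itself is not penalized
    return hinge[mask].sum()
```

Under this hinge, a far-away mode is free to keep a nonzero score as long as it stays at least ϵ below the positive mode's score, which is why the modes are not collapsed the way a BCE-style target of zero would encourage.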
Thanks a lot for your explanation.
So the regression loss you are using is the WTA (winner-takes-all) loss, which penalizes only the best hypothesis. I am wondering if you have tried training your approach with only the regression loss, since it already ensures diversity among the hypotheses?
Sorry, I missed your last questions. Hope it's not too late.
So the regression loss you are using is the WTA (winner-takes-all) loss, which penalizes only the best hypothesis
Right
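For readers following along, a winner-takes-all regression term can be sketched roughly as below. The matching criterion (final-point displacement) and the smooth-L1 criterion are assumptions for illustration, not necessarily what the repository uses:

```python
import torch
import torch.nn.functional as F

def wta_regression_loss(trajs, gt):
    """trajs: (K, T, 2) predicted modes, gt: (T, 2) ground-truth future trajectory."""
    # pick the hypothesis closest to the ground truth, here by final-point displacement
    best = (trajs[:, -1] - gt[-1]).norm(dim=-1).argmin()
    # only the winning hypothesis receives a regression gradient
    return F.smooth_l1_loss(trajs[best], gt)
```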
I am wondering if you have tried training your approach with only the regression loss, since it already ensures diversity among the hypotheses?
Sorry, we didn't try that.
I'll close it for now. Feel free to reopen it if you still have questions.