ZikangZhou/HiVT

Question About Decoder Modeling

SwagJ opened this issue · 2 comments

SwagJ commented

Hi, @ZikangZhou

Thank you for sharing your great work. I have a question regarding the decoder modeling. In your paper, you model the output as a Laplace mixture model rather than a Gaussian one. However, a GMM seems to be the more common choice. Is there any specific reason for choosing an LMM over a GMM? Is it for training stability?
I am looking forward to your reply. Thank you in advance.

Best,

ZikangZhou commented

Hi @SwagJ,

My very early experiments showed that L1 loss works better than L2 loss, and I suspect this is because trajectory data are pretty noisy (especially in Argoverse 1) and L1 loss is more robust to outliers. You may also notice that people in this area usually use L1/smooth L1 loss rather than L2 loss.
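To make the outlier point concrete, here is a toy comparison (illustrative only, not from the repo) showing how a single outlier dominates L2 loss while only shifting L1 loss linearly:

```python
import torch
import torch.nn.functional as F

# Hypothetical 1-D regression targets with one outlier at the end.
pred = torch.zeros(5)
target = torch.tensor([0.1, -0.2, 0.15, -0.1, 10.0])

# L1 grows linearly with the outlier; L2 grows quadratically.
print(F.l1_loss(pred, target).item())   # ~2.11  (outlier contributes 10 / 5)
print(F.mse_loss(pred, target).item())  # ~20.02 (outlier contributes 10**2 / 5)
```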

Note that L1 loss implicitly assumes the target follows a Laplace distribution with fixed scale, while L2 loss assumes a Gaussian distribution with fixed variance. In other words, LaplaceNLLLoss and GaussianNLLLoss can be viewed as generalizations of L1 loss and L2 loss, respectively, where the scale/variance is predicted rather than fixed. Based on my observation that L1 loss outperforms L2 loss, I chose an LMM over a GMM. That said, I haven't actually tried a GMM, so I cannot draw a firm conclusion about which is better. Moreover, the answer may change under a different training recipe (e.g., learning rate, number of epochs, lr scheduler). All I can say is that the LMM works well under mine.
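For reference, here is a minimal sketch of the Laplace NLL idea in PyTorch (illustrative only, not necessarily the repo's LaplaceNLLLoss): the network predicts both a location and a scale, and fixing the scale at 1 recovers L1 loss up to a constant, just as fixing the variance in GaussianNLLLoss recovers L2 loss.

```python
import torch

def laplace_nll_loss(loc: torch.Tensor,
                     scale: torch.Tensor,
                     target: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    # Laplace density: p(x) = exp(-|x - loc| / scale) / (2 * scale),
    # so the negative log-likelihood per element is
    #   log(2 * scale) + |x - loc| / scale.
    scale = scale.clamp(min=eps)  # keep the scale strictly positive
    return (torch.log(2 * scale) + torch.abs(target - loc) / scale).mean()

# With scale fixed at 1, the loss reduces to |target - loc| plus a
# constant, i.e. plain L1 loss.
```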

Best,
Zikang

SwagJ commented

Hi @ZikangZhou,

Thank you very much for your insight. That makes total sense. I will try a GMM to see how it works. Thank you again.

Best,