Question regarding training
Mirorrn opened this issue · 4 comments
Mirorrn commented
Hello,
first of all, thank you for making the code publicly available! I have two questions about the training:
- As far as I understand, you determine the positive and negative samples based on the ground truth at time t, with a horizon of e.g. 5 time steps. What about possible collisions before and after that one point?
- What serves as ground truth in reinforcement learning? Do you use a linear model here?
Best regards
YuejiangLIU commented
Hello,
Thanks for your interest and questions.
- The samples are defined as spatio-temporal events. This allows us to take future events into account not only at a particular time step but at multiple ones simultaneously, e.g., from the 2nd to the 5th. The length of the horizon is a hyper-parameter. Empirically, we found that the horizon does not need to be very long to fulfill the goal of reducing collisions and boosting rewards (see the illustrative sketch after this list).
- In RL, we use the robot's experience from successful trials as ground truth and skip the failure cases for contrastive learning. We will release the RL code very soon. Please stay tuned.
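For illustration, here is a minimal sketch of the spatio-temporal sampling described above. All names (`sample_events`, `ego_future`, `neighbor_futures`, `min_sep`) are hypothetical placeholders, not the actual repository API, and in the RL setting one would run this only on episodes that ended successfully:

```python
import numpy as np

def sample_events(ego_future, neighbor_futures, horizon=(2, 5), min_sep=0.2):
    """Sketch: draw positive/negative events over a time horizon.

    ego_future:       (T, 2) array of ground-truth ego positions
    neighbor_futures: (N, T, 2) array of neighbor positions
    Returns lists of (delta_t, position) pairs.
    """
    positives, negatives = [], []
    for dt in range(horizon[0], horizon[1] + 1):
        # Positive event: the ground-truth ego location at t + dt.
        positives.append((dt, ego_future[dt]))
        # Negative events: points near each neighbor at t + dt,
        # i.e., locations that would likely lead to a collision.
        angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
        ring = min_sep * np.stack([np.cos(angles), np.sin(angles)], axis=-1)
        for nb in neighbor_futures[:, dt]:
            negatives.extend((dt, nb + offset) for offset in ring)
    return positives, negatives
```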
Mirorrn commented
Hello,
many thanks for the quick reply. Regarding question 1: wouldn't the logits in Eq. 4 then have to be summed over \delta t?
YuejiangLIU commented
Yes, that's a good point. :) In the current arXiv version, we discussed the choice of δt in the paragraph before Eq. 4 and in Table 4, but did not incorporate it into Eq. 4 itself. We will make this clearer in the updated version. Thanks!
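For concreteness, the summed variant could look like the following. This is a sketch only, written in generic InfoNCE notation with query embedding q, positive key k⁺ and negative keys k⁻ per time offset δt, and temperature τ, rather than the exact symbols of Eq. 4:

```latex
\mathcal{L} \;=\; - \sum_{\delta t}
  \log \frac{\exp\!\big(\mathrm{sim}(q,\, k^{+}_{\delta t}) / \tau\big)}
            {\exp\!\big(\mathrm{sim}(q,\, k^{+}_{\delta t}) / \tau\big)
             \;+\; \sum_{i} \exp\!\big(\mathrm{sim}(q,\, k^{-}_{i,\delta t}) / \tau\big)}
```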
YuejiangLIU commented
I'll close this for now, but please feel free to re-open it if you have additional questions.