Question regarding training
Mirorrn opened this issue · 4 comments
Mirorrn commented
Hello,
first of all, thank you for making the code publicly available! I have two questions about the training:
- As far as I understand, you determine the positive and negative samples based on the ground truth at time t, with a horizon of e.g. 5 time steps. What about possible collisions before and after that one point?
- What serves as ground truth in reinforcement learning? Do you use a linear model here?
Best regards
YuejiangLIU commented
Hello,
Thanks for your interest and questions.
- The samples are defined as spatio-temporal events. This allows us to take future events into account not only at a particular time step but at multiple ones simultaneously, e.g., from the 2nd to the 5th. The length of the horizon is a hyper-parameter. Empirically, we found that the horizon does not need to be very long to fulfill the goal of reducing collisions and boosting rewards (see the illustrative sketch after this list).
- In RL, we use the robot's experience from successful trials as ground truth and skip the failure cases for contrastive learning. We will release the RL code very soon. Please stay tuned.
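For illustration, here is a minimal sketch of the spatio-temporal sampling described above. All names (`sample_events`, `ego_future`, `neighbor_futures`, `min_sep`) are hypothetical placeholders, not the actual repository API, and in the RL setting one would run this only on episodes that ended successfully:

```python
import numpy as np

def sample_events(ego_future, neighbor_futures, horizon=(2, 5), min_sep=0.2):
    """Sketch: draw positive/negative events over a time horizon.

    ego_future:       (T, 2) array of ground-truth ego positions
    neighbor_futures: (N, T, 2) array of neighbor positions
    Returns lists of (delta_t, position) pairs.
    """
    positives, negatives = [], []
    for dt in range(horizon[0], horizon[1] + 1):
        # Positive event: the ground-truth ego location at t + dt.
        positives.append((dt, ego_future[dt]))
        # Negative events: points near each neighbor at t + dt,
        # i.e., locations that would likely lead to a collision.
        angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
        ring = min_sep * np.stack([np.cos(angles), np.sin(angles)], axis=-1)
        for nb in neighbor_futures[:, dt]:
            negatives.extend((dt, nb + offset) for offset in ring)
    return positives, negatives
```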
Mirorrn commented
Hello,
many thanks for the quick reply. Regarding question 1: wouldn't the logits in Eq. 4 then have to be summed over \delta t?
YuejiangLIU commented
Yes, that's a good point. :) In the current arXiv version, we discussed the choice of δt in the paragraph before Eq. 4 and in Table 4, but did not incorporate it into Eq. 4 itself. We will make this clearer in the updated version. Thanks!
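For concreteness, the summed variant could look like the following. This is a sketch only, written in generic InfoNCE notation with query embedding q, positive key k⁺ and negative keys k⁻ per time offset δt, and temperature τ, rather than the exact symbols of Eq. 4:

```latex
\mathcal{L} \;=\; - \sum_{\delta t}
  \log \frac{\exp\!\big(\mathrm{sim}(q,\, k^{+}_{\delta t}) / \tau\big)}
            {\exp\!\big(\mathrm{sim}(q,\, k^{+}_{\delta t}) / \tau\big)
             \;+\; \sum_{i} \exp\!\big(\mathrm{sim}(q,\, k^{-}_{i,\delta t}) / \tau\big)}
```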
YuejiangLIU commented
I'll close this for now, but please feel free to re-open it if you have additional questions.