nico-bohlinger/one_policy_to_run_them_all

Question about the command

Congratulations on this excellent work!

Are the commands "c_x", "c_y" and "c_yaw" randomly sampled in each episode? Do they change in each episode? And is there a maximum truncation timestep (e.g. 1000) in your settings?

Also, is there a typo in Table 2 (T13) in the Appendix? There seems to be an extra negative sign.

Thank you very much.

Hey!

Yes, new command velocities are sampled with probability 0.002 at every step, i.e. on average every 500 steps or 10 seconds.
An episode lasts 20 seconds, which at a 50 Hz action frequency is 1000 environment steps.

You are correct, the minus in front of the sum is wrong. Thank you for noticing it!

Thank you very much! If I understand correctly, the command velocities can change during an episode, but at most once? And does the command counter (the 10 seconds you mentioned) reset when an episode is terminated or truncated?

There is no counter for the command velocity change. At every step, a boolean that decides whether to change the command velocity is sampled, like this (pseudocode):
should_change_command = random.random() < 0.002  # true with probability 0.002
if should_change_command:
    command_velocity = sample_command_velocity()  # draw a new command

This means the command velocity could in principle change 1000 times per episode, i.e. at every step. With a probability of 0.002, however, a change happens on average every 500 steps, so on average every 10 seconds. Any individual interval can still be shorter or longer than that, and since there is no counter, the sampling is not affected by episode resets.
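
To make the statistics concrete, here is a minimal simulation sketch of the per-step sampling described above. It is not the repository's code: `sample_command_velocity`, its value ranges, and the choice to draw an initial command at the start of each episode are assumptions made only for illustration. The probability 0.002, the 1000 steps per episode and the 50 Hz frequency are taken from this thread.

```python
import random

CHANGE_PROB = 0.002        # per-step resampling probability (from the thread)
STEPS_PER_EPISODE = 1000   # 20 s at 50 Hz (from the thread)
CONTROL_HZ = 50            # action frequency (from the thread)


def sample_command_velocity(rng):
    """Hypothetical sampler: the real command ranges are not given in the thread."""
    c_x = rng.uniform(-1.0, 1.0)
    c_y = rng.uniform(-1.0, 1.0)
    c_yaw = rng.uniform(-1.0, 1.0)
    return c_x, c_y, c_yaw


def count_changes_in_episode(rng):
    """Run one episode and count how often the command is resampled."""
    # Assumption: an initial command is drawn when the episode starts.
    command_velocity = sample_command_velocity(rng)
    changes = 0
    for _ in range(STEPS_PER_EPISODE):
        # Independent coin flip at every step; no counter is kept.
        should_change_command = rng.random() < CHANGE_PROB
        if should_change_command:
            command_velocity = sample_command_velocity(rng)
            changes += 1
    return changes


def simulate(num_episodes=2000, seed=0):
    """Return (mean changes per episode, fraction of episodes with no change, max changes)."""
    rng = random.Random(seed)
    counts = [count_changes_in_episode(rng) for _ in range(num_episodes)]
    mean_changes = sum(counts) / len(counts)
    frac_no_change = counts.count(0) / len(counts)
    return mean_changes, frac_no_change, max(counts)


# Closed-form values for comparison with the simulation.
expected_changes = CHANGE_PROB * STEPS_PER_EPISODE            # 2 changes per episode
expected_interval_s = (1 / CHANGE_PROB) / CONTROL_HZ          # 500 steps = 10 s
prob_no_change = (1 - CHANGE_PROB) ** STEPS_PER_EPISODE       # about 0.135

if __name__ == "__main__":
    print(simulate())
    print(expected_changes, expected_interval_s, prob_no_change)
```

Under these assumptions the number of changes per episode is binomially distributed with a mean of 2, so "at most once per episode" does not hold: some episodes see no change at all (roughly 13.5% of them), while others see three or more.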