Get_Up Sample Code Questions
It's a bug. The '6' is a hardcoded length that worked for the Get Up example because each get up behavior has 6 time slots, but it makes no sense to hardcode that information. For context, the observations follow a one-hot encoding scheme. Here are the expected observations for each time step, considering a behavior with 6 time slots:
First, the Reset function is called:
- t=0, obs=[1,0,0,0,0,0]
Then, the Step function is called successively:
- t=1, action 1 is stored, obs=[0,1,0,0,0,0]
- t=2, action 2 is stored, obs=[0,0,1,0,0,0]
- t=3, action 3 is stored, obs=[0,0,0,1,0,0]
- t=4, action 4 is stored, obs=[0,0,0,0,1,0]
- t=5, action 5 is stored, obs=[0,0,0,0,0,1]
- t=6, action 6 is stored, obs=(the observation at this step is irrelevant)
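The one-hot schedule listed above can be reproduced for any number of time slots with a few lines of NumPy. This is only an illustrative sketch: the helper name expected_observations is hypothetical and is not part of the repository.

```python
import numpy as np


def expected_observations(n_slots):
    """Return the one-hot observations for t = 0 .. n_slots - 1, one per row."""
    # Row t of the identity matrix is the one-hot vector with a 1 at index t.
    return np.eye(n_slots, dtype=np.float32)


# Print the schedule for a behavior with 6 time slots.
for t, obs in enumerate(expected_observations(6)):
    print(f"t={t}, obs={obs.astype(int).tolist()}")
```

Because the length comes from n_slots, the same helper covers behaviors with any number of slots.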
For the terminal step (t=6), the returned observation is not used by the learning algorithm. In the current implementation, I returned [0,0,0,0,0,0], i.e. obs = np.zeros(6), as a dummy value. However, to allow behaviors with an arbitrary number of slots, it could be replaced by obs = self.obs[0], which always has the right length. I will fix this soon.
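To make the proposed change concrete, here is a minimal sketch of an environment whose terminal observation does not depend on a hardcoded 6. It is not the repository's code: the class name SlotEnv, the n_slots parameter, the placeholder reward, and the Gym-style (obs, reward, terminal, info) return value are all assumptions, and self.obs is assumed to hold one one-hot row per time slot.

```python
import numpy as np


class SlotEnv:
    """Toy stand-in for the behavior environment (hypothetical class)."""

    def __init__(self, n_slots):
        self.n_slots = n_slots
        # Assumption: self.obs stores one one-hot observation per time slot.
        self.obs = np.eye(n_slots, dtype=np.float32)
        self.actions = []
        self.t = 0

    def reset(self):
        # t=0: return the first one-hot observation.
        self.t = 0
        self.actions = []
        return self.obs[0]

    def step(self, action):
        # Store the action for the current slot, then advance time.
        self.actions.append(action)
        self.t += 1
        terminal = self.t >= self.n_slots
        # The terminal observation is ignored by the learner, so any value
        # with the right shape works; self.obs[0] avoids hardcoding the length.
        obs = self.obs[0] if terminal else self.obs[self.t]
        reward = 0.0  # placeholder: the real reward is outside this sketch
        return obs, reward, terminal, {}
```

With this layout, SlotEnv(6) reproduces the sequence listed above, and SlotEnv(4) or any other slot count works without further changes.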