Get_Up Sample Code Questions

Question

chance20210722 commented 9 months ago

In Get_Up.py, line 198 reads "obs = np.zeros(6)". Why is the argument set to 6?

Answer 1 · 2024-04-28T12:20:35.000Z

It's a bug. The '6' is a hardcoded length that happens to work for the Get Up example because each get-up behavior has 6 time slots, but that information should not be hardcoded. For context, the observations follow a one-hot encoding scheme: the observation at time step t has a 1 at index t and 0 everywhere else. Here are the expected observations for each time step, considering a behavior with 6 time slots:

First, the Reset function is called:

t=0, obs=[1,0,0,0,0,0]

Then, the Step function is called successively:

t=1, action 1 is stored, obs=[0,1,0,0,0,0]
t=2, action 2 is stored, obs=[0,0,1,0,0,0]
t=3, action 3 is stored, obs=[0,0,0,1,0,0]
t=4, action 4 is stored, obs=[0,0,0,0,1,0]
t=5, action 5 is stored, obs=[0,0,0,0,0,1]
t=6, action 6 is stored, obs=(the observation at this step is irrelevant)
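
The schedule above can be reproduced for any number of slots. The following is a minimal sketch, not code from Get_Up.py: the helper name `one_hot_observations` and the variable `n_slots` are hypothetical, and it only assumes that the observation for slot t is row t of an identity matrix.

```python
import numpy as np


def one_hot_observations(n_slots):
    """Return an (n_slots, n_slots) array; row t is the observation for time step t.

    Hypothetical helper for illustration, not part of Get_Up.py.
    """
    return np.eye(n_slots)


# With 6 slots this prints the non-terminal observations for t=0..5.
obs_table = one_hot_observations(6)
for t, obs in enumerate(obs_table):
    print(f"t={t}, obs={obs.astype(int).tolist()}")
```

Changing the argument from 6 to another slot count changes the length of every observation accordingly, which is the property the hardcoded np.zeros(6) lacks.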

For the terminal step (t=6), the returned observation is not used by the learning algorithm. In the current implementation, I return [0,0,0,0,0,0], i.e. obs = np.zeros(6), as a dummy value. However, to allow behaviors with an arbitrary number of slots, it could be replaced by obs = self.obs[0], whose length always matches the number of slots. I will fix this soon.
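
To illustrate the proposed fix, here is a rough sketch of an environment with an arbitrary number of slots. It is not the actual Get_Up.py class: `SlotEnv`, `n_slots`, and the return signature of `step` are hypothetical, and it assumes `self.obs` is a list in which `self.obs[t]` is the one-hot vector for slot t.

```python
import numpy as np


class SlotEnv:
    """Hypothetical stand-in for the Get_Up environment (illustration only)."""

    def __init__(self, n_slots):
        self.n_slots = n_slots
        # Assumption: self.obs[t] is the one-hot observation for slot t.
        self.obs = list(np.eye(n_slots))
        self.actions = []
        self.t = 0

    def reset(self):
        self.t = 0
        self.actions = []
        return self.obs[0]

    def step(self, action):
        self.actions.append(action)  # store the action chosen for the current slot
        self.t += 1
        done = self.t >= self.n_slots
        # The terminal observation is a dummy value. Reusing self.obs[0]
        # keeps its length tied to n_slots instead of a hardcoded 6.
        obs = self.obs[0] if done else self.obs[self.t]
        return obs, done


env = SlotEnv(n_slots=4)
first = env.reset()
trace = []
done = False
while not done:
    obs, done = env.step(action=0.0)
    trace.append(obs.astype(int).tolist())
```

With 4 slots, the loop stores 4 actions and the dummy terminal observation has length 4, without any change to the step logic.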