m-abr/FCPCodebase

Get_Up Sample Code Questions

Closed · 1 comment

Get_Up.py contains the following code:
[Screenshot 2024-04-27 17-51-07]
On line 198, "obs = np.zeros(6)": why is the argument set to 6?

It's a bug. The '6' is a hardcoded length that worked for the Get Up example because each get up behavior has 6 time slots, but it makes no sense to hardcode that information. For context, the observations follow a one-hot encoding scheme. Here are the expected observations for each time step, considering a behavior with 6 time slots:

First, the Reset function is called:

  • t=0, obs=[1,0,0,0,0,0]

Then, the Step function is called successively:

  • t=1, action 1 is stored, obs=[0,1,0,0,0,0]
  • t=2, action 2 is stored, obs=[0,0,1,0,0,0]
  • t=3, action 3 is stored, obs=[0,0,0,1,0,0]
  • t=4, action 4 is stored, obs=[0,0,0,0,1,0]
  • t=5, action 5 is stored, obs=[0,0,0,0,0,1]
  • t=6, action 6 is stored, obs=(the observation at this step is irrelevant)
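The schedule above can be sketched as a small lookup table. This is only an illustration of the one-hot scheme, not the code from Get_Up.py; the helper name one_hot_table is hypothetical, and the same table could also be built with np.identity(n_slots).

```python
def one_hot_table(n_slots):
    """Row t is the observation for time step t: 1 at index t, 0 elsewhere."""
    return [[1 if i == t else 0 for i in range(n_slots)] for t in range(n_slots)]


# A behavior with 6 time slots, as in the Get Up example.
table = one_hot_table(6)
for t, obs in enumerate(table):
    print(f"t={t}, obs={obs}")
```

The table has exactly one row per time slot, so the terminal step (t=6 here) has no row of its own, which is why a dummy observation is returned there.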

For the terminal step (t=6), the returned observation is not used by the learning algorithm. In the current implementation, I return [0,0,0,0,0,0], i.e. obs = np.zeros(6), as a dummy value. However, to allow behaviors with an arbitrary number of slots, it could be replaced by obs = self.obs[0], which has the correct length no matter how many slots the behavior has. I will fix this soon.
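As a rough sketch of the proposed fix, here is a toy environment that returns self.obs[0] at the terminal step. Everything here is an assumption made for illustration: the class ToySlotEnv, its attributes, the placeholder reward of 0.0, and the idea that self.obs is a table with one one-hot row per slot. It is not the actual Get_Up.py environment.

```python
import numpy as np


class ToySlotEnv:
    """Hypothetical stand-in for the Get Up environment (not the FCPCodebase class)."""

    def __init__(self, n_slots):
        self.n_slots = n_slots
        # Assumption: row t of self.obs is the one-hot observation for step t.
        self.obs = np.identity(n_slots)
        self.actions = []
        self.t = 0

    def reset(self):
        self.t = 0
        self.actions = []
        return self.obs[0]

    def step(self, action):
        self.actions.append(action)  # store the action for the current slot
        self.t += 1
        terminal = self.t >= self.n_slots
        # The terminal observation is a dummy value. Using self.obs[0] instead
        # of np.zeros(6) keeps its length equal to n_slots for any behavior.
        obs = self.obs[0] if terminal else self.obs[self.t]
        reward = 0.0  # placeholder; the real reward is outside this sketch
        return obs, reward, terminal, {}


# Run one episode for a behavior with 9 slots instead of 6.
env = ToySlotEnv(9)
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(0.0)
print(obs.shape)
```

With the hardcoded np.zeros(6), the last observation in this 9-slot episode would have length 6 while all earlier ones have length 9; with self.obs[0] every observation has the same shape.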