Question about step settings

Question


Hello, I have a question about the settings in dp3.yaml:
horizon: 4
n_obs_steps: 2
n_action_steps: 4
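Each sample that is actually retrieved consists of four consecutive frames.
How does this achieve 2 observation steps + 4 inference (action) steps? (Normally, shouldn't we take 6 consecutive frames?)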
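A minimal sketch of why 4 frames can be enough, assuming DP3 follows the Diffusion Policy codebase convention: a window of `horizon` consecutive frames is sampled, the observation condition is the first `n_obs_steps` frames of that window, and the action target spans all `horizon` frames, so the observation and action windows overlap rather than being laid end to end. The helper `split_window` and the toy episode are hypothetical, for illustration only.

```python
import numpy as np

# Values quoted from dp3.yaml
horizon = 4
n_obs_steps = 2


def split_window(obs_seq, action_seq, n_obs_steps):
    """Split one sampled window of `horizon` consecutive frames (hypothetical helper).

    Assumption: observations are the first n_obs_steps frames of the window,
    while the action target covers every frame of the window.
    """
    obs = obs_seq[:n_obs_steps]
    actions = action_seq  # all `horizon` steps, overlapping the obs frames
    return obs, actions


# Toy episode where frame t simply stores the value t
episode_obs = np.arange(10)
episode_actions = np.arange(10)

start = 3  # hypothetical window start index
window_obs = episode_obs[start:start + horizon]
window_act = episode_actions[start:start + horizon]

obs, actions = split_window(window_obs, window_act, n_obs_steps)
print(obs.tolist())      # [3, 4]
print(actions.tolist())  # [3, 4, 5, 6]
```

Under this assumption the window needs `horizon = 4` frames, not `n_obs_steps + n_action_steps = 6`, because the first actions share time steps with the observations.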

Answer 1 · 2024-11-09T19:33:37.000Z

It works like this:
| o | o |
| a | a | a | a |
We actually use only the last 3 actions.

(Maybe I should update the config, haha; it confuses people.)
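A minimal sketch of how "the last 3 actions" can fall out of these numbers, assuming DP3 uses the same action-execution slice as the Diffusion Policy codebase: execution starts at the step aligned with the latest observation, index `n_obs_steps - 1`, and runs for `n_action_steps`. With `horizon = 4` the end index (5) exceeds the predicted length, so the slice silently clamps to 3 steps. The function name `executable_actions` and the toy `action_dim` are hypothetical.

```python
import numpy as np

# Values quoted from dp3.yaml
horizon = 4
n_obs_steps = 2
n_action_steps = 4


def executable_actions(action_pred, n_obs_steps, n_action_steps):
    """Select the actions to execute from a (batch, horizon, action_dim) prediction.

    Assumed convention: index n_obs_steps - 1 is the step aligned with the
    most recent observation, and execution starts there.
    """
    start = n_obs_steps - 1           # 1
    end = start + n_action_steps      # 5, past the end of a length-4 horizon
    return action_pred[:, start:end]  # NumPy clamps the slice to the array length


batch, action_dim = 1, 2  # hypothetical sizes
action_pred = np.arange(batch * horizon * action_dim).reshape(batch, horizon, action_dim)

executed = executable_actions(action_pred, n_obs_steps, n_action_steps)
print(executed.shape)  # (1, 3, 2)
```

Under this convention, actually executing all 4 actions would need `horizon >= n_obs_steps - 1 + n_action_steps`, i.e. 5.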

Answer 2 · 2024-11-11T04:12:43.000Z

Then why is the action predicted by the network still of shape (batch, 4, action_dim)? And as far as I can see, the loss mask doesn't mask out the first action either.