zhaohengyin/EfficientImitate

Question on processing the expert data


Hi, I have a question on the code here:

game_history.action_history = [traj['act'][0]] + traj['act'] # we need a padding at the front.

Why do we need this padding? Wouldn't it mess up the order when we select the subset from the expert data?

def sample_n_expert_games(self, n_games):
By the way, I am running cheetah_state.sh.

Could you explain it? Thanks a lot.

We followed a convention used in other open-source MuZero repositories, which pad the action history at the beginning. This makes the loss computation in the trainer slightly cleaner. We apologize for any confusion. Thanks!
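
To illustrate the ordering concern, here is a minimal sketch of what the front padding does to indices. It is not the repository's code: the helper `pad_action_history` and the toy trajectory are hypothetical, and the layout (T + 1 observations for T actions) is an assumption made only for the example. The real actions keep their relative order; each one moves one position later, so that under this assumed layout `action_history[t]` is the action that leads into `obs[t]`.

```python
def pad_action_history(actions):
    """Prepend a copy of the first action, as in the line quoted in the question."""
    actions = list(actions)
    return [actions[0]] + actions


# Hypothetical toy trajectory (assumption: T + 1 observations, T actions).
traj = {"obs": ["o0", "o1", "o2", "o3"], "act": ["a0", "a1", "a2"]}

action_history = pad_action_history(traj["act"])

# The real actions keep their relative order; every index just shifts by one.
assert action_history[1:] == traj["act"]

# Under the assumed layout, the padded list is as long as the observation list,
# and for t >= 1, action_history[t] is the action taken between obs[t-1] and obs[t].
transitions = [
    (traj["obs"][t - 1], action_history[t], traj["obs"][t])
    for t in range(1, len(action_history))
]
print(transitions)
```

The padded entry at index 0 is only a placeholder; in this sketch it is never read as a real transition, which is why the loop starts at `t = 1`.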