Question on processing the expert data

Hi, I have a question about the code here:

EfficientImitate/algo/state_ei/replay_buffer.py

Line 69 in a782465

    
           game_history.action_history = [traj['act'][0]] + traj['act']  # we need a padding at the front.

Why do we need this padding? Wouldn't it mess up the order when we select a subset of the expert data?

EfficientImitate/algo/state_ei/replay_buffer.py

Line 243 in a782465

def sample_n_expert_games(self, n_games):

By the way, I am running cheetah_state.sh.

Could you explain it? Thanks a lot.

We followed a convention used in other open-source MuZero repositories, which pad the action history at the beginning. This makes the loss computation in the trainer slightly cleaner. We apologize for any confusion. Thanks!
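
For anyone else reading this, here is a minimal sketch of how a front-padded action history is typically indexed. It is not taken from this repository: `pad_actions`, `unroll_window`, and the toy action labels are hypothetical names for illustration, and it assumes the trainer reads K+1 action slots per sampled position and skips the first one when unrolling the dynamics. Under that assumption the padding only shifts every index by one and leaves the relative order of the expert actions unchanged.

```python
def pad_actions(actions):
    """Prepend a copy of the first action, mirroring the padded line in the question."""
    return [actions[0]] + list(actions)


def unroll_window(padded_actions, start, num_unroll_steps):
    """Return K+1 action slots for a sample at position `start`.

    Assumption: slot 0 is the action that led to observation `start`
    (or the padding when start == 0) and is not fed to the dynamics model.
    """
    return padded_actions[start:start + num_unroll_steps + 1]


# Toy expert trajectory: a_t is the action taken at observation t.
expert_actions = ["a0", "a1", "a2", "a3"]
padded = pad_actions(expert_actions)

# Sample position 1 and unroll K = 2 steps.
window = unroll_window(padded, start=1, num_unroll_steps=2)
actions_to_unroll = window[1:]  # actions taken at observations 1 and 2

print(padded)             # ['a0', 'a0', 'a1', 'a2', 'a3']
print(window)             # ['a0', 'a1', 'a2']
print(actions_to_unroll)  # ['a1', 'a2']
```

With this layout the action window has the same length (K+1) as the value, reward, and policy targets for positions `start` through `start + K`, which is presumably what keeps the loss code simpler.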