kzl/decision-transformer

Questions about dataset preprocessing


Hi,
I have a question about the preprocessing of the medium-replay datasets. In the provided implementation,
https://github.com/kzl/decision-transformer/blob/e2d82e68f330c00f763507b3b01d774740bee53f/gym/data/download_d4rl_datasets.py#L35...L40

whenever final_timestep or done_bool is true, the data collected so far is stored as one trajectory. However, D4RL's docs say:

Timeouts in this (medium-replay) dataset are not always marked when the agent reaches the max trajectory length, but rather when 1000 timesteps have been sampled for a particular training iteration.

Thus, some trajectories end neither in a terminal state nor at the maximum trajectory length; they are cut off because the sampling budget for that training iteration ran out. Such trajectories are typically short, and if we compute returns on them, the return-to-go will deviate from its true value, since no value estimate is added for what would have followed the last timestep. Will this be an issue for DT?
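To make the concern concrete, here is a minimal sketch on toy data. It is not the repository's code: `split_trajectories` and `returns_to_go` are hypothetical helpers written for illustration, the rewards are made up, and I assume an undiscounted return-to-go. An episode that would have collected 5 reward steps is cut after 2 steps by a timeout flag, so its return-to-go at the first step is 2.0 instead of 5.0.

```python
def split_trajectories(rewards, terminals, timeouts):
    """Cut a flat reward stream wherever a terminal or timeout flag is set.

    Simplified stand-in for the splitting rule described above
    (hypothetical helper, not the repository's function).
    """
    paths, current = [], []
    for reward, done, timeout in zip(rewards, terminals, timeouts):
        current.append(reward)
        if done or timeout:
            paths.append(current)
            current = []
    return paths


def returns_to_go(rewards):
    """Undiscounted return-to-go: sum of rewards from each step to the end."""
    rtg, running = [], 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    return rtg[::-1]


# Toy stream: every step gives reward 1.0 and nothing ever terminates.
rewards = [1.0] * 7
terminals = [False] * 7
# The first timeout is set after only 2 steps (sampling budget exhausted),
# the second after a complete 5-step episode.
timeouts = [False, True, False, False, False, False, True]

paths = split_trajectories(rewards, terminals, timeouts)
lengths = [len(p) for p in paths]

truncated_rtg = returns_to_go(paths[0])  # episode cut short by the budget
full_rtg = returns_to_go(paths[1])       # episode that ran its full length

print(lengths)           # [2, 5]
print(truncated_rtg[0])  # 2.0
print(full_rtg[0])       # 5.0
```

In this toy case the same initial state would be labeled with a return-to-go of 2.0 in the truncated trajectory and 5.0 in the complete one, which is the mismatch I am asking about.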

Please correct me if I have misunderstood anything =)