Curt-Park/rainbow-is-all-you-need

Not handling time limits

carlos-UPC-AI opened this issue · 0 comments

The step method of DQNAgent does not appear to distinguish between termination and truncation, which the Gymnasium documentation recommends handling separately: https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/.

The line done = terminated or truncated collapses both signals into a single flag, so termination and truncation are treated identically.

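Furthermore, in the _compute_dqn_loss method, the lines

    mask = 1 - done
    target = (reward + self.gamma * next_q_value * mask).to(self.device)

do not seem to account specifically for truncation. Because done is also set when an episode is truncated, mask becomes 0 and the bootstrap term self.gamma * next_q_value is dropped, even though a truncated state is not a terminal one.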
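
A minimal sketch of one possible fix, written in plain Python with scalar values rather than the repository's tensor code. The function names td_target and episode_flags are hypothetical and do not exist in the repository. The idea is to mask the bootstrap term with terminated only, while still using terminated or truncated to decide when to reset the environment.

```python
def td_target(reward, next_q_value, terminated, gamma=0.99):
    """One-step TD target that stops bootstrapping only on true termination.

    Hypothetical helper: the repository computes this on tensors inside
    _compute_dqn_loss; scalars are used here to keep the sketch self-contained.
    """
    mask = 0.0 if terminated else 1.0
    return reward + gamma * next_q_value * mask


def episode_flags(terminated, truncated):
    """Split the two Gymnasium signals into their two separate uses.

    Returns (store_done, reset_env):
      store_done: flag to keep with the transition and use in the loss mask
      reset_env:  flag telling the training loop to call env.reset()
    """
    store_done = terminated
    reset_env = terminated or truncated
    return store_done, reset_env


# Truncated by a time limit: the episode ends, but the target still bootstraps.
store_done, reset_env = episode_flags(terminated=False, truncated=True)
truncated_target = td_target(1.0, 10.0, store_done, gamma=0.5)

# Genuinely terminal state: no bootstrapping from the next state.
store_done_t, reset_env_t = episode_flags(terminated=True, truncated=False)
terminal_target = td_target(1.0, 10.0, store_done_t, gamma=0.5)
```

With this split, the replay buffer would store only the terminated flag, so mask = 1 - done in the loss would zero the bootstrap term for real terminal states only.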