Not handling time limits
carlos-UPC-AI opened this issue
In `DQNAgent`, particularly in the `step` method, termination and truncation do not appear to be distinguished. The Gymnasium documentation recommends handling them separately: https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/.
The line `done = terminated or truncated` treats termination and truncation identically, so a transition cut off by a time limit appears to be handled as if the episode had reached a terminal state.
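Here is a minimal sketch of how `step` could keep the two flags apart. It assumes transitions are appended to a plain Python list. `Transition` and `handle_step` are hypothetical names, not the repository's code:

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    obs: List[float]
    action: int
    reward: float
    next_obs: List[float]
    terminated: bool  # only true terminal states; used later for the TD target


def handle_step(obs, action, reward, next_obs, terminated, truncated, buffer):
    """Hypothetical helper: store one transition, return whether to reset."""
    # Store `terminated`, not `terminated or truncated`: after a time-limit
    # truncation, next_obs is still a valid non-terminal state whose value
    # should be bootstrapped.
    buffer.append(Transition(obs, action, reward, next_obs, terminated))
    # Either flag ends the current episode, so the combined flag is still
    # the right signal for resetting the environment.
    return terminated or truncated
```

The combined `done` flag remains useful for deciding when to call `env.reset()`. It just should not be what the replay buffer stores for the target.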
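Furthermore, in the `_compute_dqn_loss` method, the lines

```python
mask = 1 - done
target = (reward + self.gamma * next_q_value * mask).to(self.device)
```

zero out the bootstrapped term for truncated transitions as well as terminated ones. A truncated episode has not reached a terminal state, so its next state still has value. The mask should be built from `terminated` only.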
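Here is a sketch of the corresponding target computation, written with plain Python lists instead of the repository's tensors. `compute_td_targets` is a hypothetical helper, and `next_q_values` is assumed to already hold the target network's max over next actions. With tensors, the change would amount to building `mask` from a `terminated` batch instead of `done`:

```python
from typing import List


def compute_td_targets(
    rewards: List[float],
    next_q_values: List[float],
    terminated: List[bool],
    gamma: float,
) -> List[float]:
    """One-step targets r + gamma * next_q * (1 - terminated)."""
    targets = []
    for reward, next_q, term in zip(rewards, next_q_values, terminated):
        # Mask out the bootstrap term only for true terminal states;
        # truncated (time-limit) transitions keep it.
        mask = 1.0 - float(term)
        targets.append(reward + gamma * next_q * mask)
    return targets


targets = compute_td_targets([1.0, 1.0], [10.0, 10.0], [False, True], 0.9)
```

If the first transition had been truncated, a mask built from `done = terminated or truncated` would have dropped its bootstrap term and produced a target of 1.0 instead of 10.0.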