Denys88/rl_games

frames comparison within max epoch if

ankurhanda opened this issue · 9 comments

The current a2c_common.py has self.frame >= self.max_frames in the same if with self.epochs >= self.max_epochs

if epoch_num >= self.max_epochs or self.frame >= self.max_frames:
if self.game_rewards.current_size == 0:
print('WARNING: Max epochs reached before any env terminated at least once')
mean_rewards = -np.inf
self.save(os.path.join(self.nn_dir, 'last_' + self.config['name'] + 'ep' + str(epoch_num) + 'rew' + str(mean_rewards)))
print('MAX EPOCHS NUM!')

It used to be like this https://github.com/ArthurAllshire/rl_games/blob/master/rl_games/common/a2c_common.py#L1243-L1248

The other issue is that when the if condition returns True it always prints MAX EPOCHS NUM! which is not an accurate description because I was running my code and it would print this even when epochs hadn't reached their max. Need to have a separate if condition saying MAX FRAMES NUM!. Though I think what we had earlier is enough and we don't need to add another check on frames.

Fixed in #213

@ankurhanda can you confirm the issue is fixed for you?

This is still a problem because max_frames are set in default in the code so if epochs have not reached max_epochs it will still enter the 2nd if condition and end the training.

https://github.com/Denys88/rl_games/blob/master/rl_games/common/a2c_common.py#L1014-L1032

@ankurhanda can we just set it to something very large by default, like 10 trillion?
It is useful to have a stop condition by num frames (i.e. for paper plots where it is customary to report training to 100M frames or smth like that). But it should not interfere with experiments where it's not needed.

@ankurhanda in your example, it won't enter the 2nd condition as the default max_frames value is -1.

@alex-petrenko there were many issues with max_frames code. Linear learning rate wasn't updated properly when max_frames, but not max_epochs were set. Misleading message when max_frame was reached, it notified instead that max_epochs were reached and a few cosmetic ones.

Thank you for fixing these. I was not aware an extra stopping condition can affect learning rate.

Extra stopping condition is not affecting. But the linear scheduler itself instead of current_epoch and max_epoch should be using current_frame and max_frames to calculate the current learning rate if we have max_frames as a stopping condition.