One easy (maybe?) tweak to the novelty reward function

Question

apple-ihack-geek opened this issue a year ago · 3 comments

What if, instead of or in addition to the new-screen novelty/exploration reward, you used the AI's coordinates in the game? Since you already went through all the work of stitching the map into a global coordinate system for the video, I think it would make sense to use that as part of the training and reward data too. That would easily fix the current problem where the AI gets stuck in a long, uniform tunnel or path, since every step there lands on a new coordinate even when the screen barely changes.
I would dig into it myself, but I'm about to go off grid for the weekend.
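
To make the idea concrete, here is a minimal sketch of a coordinate-based novelty reward. It is not the project's implementation: the class name, the (map_id, x, y) layout, and the reward value are all assumptions made for illustration, and reading the coordinates from the emulator is left out.

```python
from typing import Set, Tuple

# Hypothetical coordinate layout: (map_id, x, y). How these values are
# read from the game is outside this sketch.
Coord = Tuple[int, int, int]


class CoordNoveltyReward:
    """Gives a reward the first time each (map_id, x, y) tile is visited."""

    def __init__(self, reward_per_tile: float = 1.0) -> None:
        self.reward_per_tile = reward_per_tile  # assumed value, tune as needed
        self.seen: Set[Coord] = set()

    def step(self, map_id: int, x: int, y: int) -> float:
        """Return the exploration reward for the agent's current position."""
        coord = (map_id, x, y)
        if coord in self.seen:
            return 0.0  # already visited: no novelty reward
        self.seen.add(coord)
        return self.reward_per_tile

    def reset(self) -> None:
        """Forget visited tiles, e.g. at the start of a new episode."""
        self.seen.clear()
```

In a uniform tunnel, consecutive screens look nearly identical, so a screen-based novelty check can stop paying out. With the sketch above, each new tile still returns a reward because the coordinate itself has not been seen before.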

Answer 1 · 2023-10-27T20:33:15.000Z

This has been implemented, and it can be used to get past Mt. Moon! You can enable it by setting use_screen_explore: False in the config.
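
As a rough illustration of that setting, the sketch below shows a config dict with the flag turned off and a helper that picks the exploration mode from it. Only the use_screen_explore key comes from this thread; the dict layout, the helper name, the mode labels, and the default of True when the key is missing are assumptions.

```python
def exploration_mode(config: dict) -> str:
    """Pick the exploration reward mode from a config dict.

    Hypothetical helper: assumes screen-based exploration is used
    when the key is absent.
    """
    if config.get("use_screen_explore", True):
        return "screen"
    return "coords"


# Only the key name is taken from the thread; the rest of the real
# config is not shown here.
env_config = {"use_screen_explore": False}

print(exploration_mode(env_config))
```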

Answer 2 · 2023-10-30T03:19:58.000Z

Based on the timing, I'm guessing you had already been working on this or had already completed it. Either way, glad to know I was thinking along the same lines. Maybe you don't want to spoil it and would rather keep it hidden for a follow-up video, but how far does it get? What does it hang on now?

Answer 3 · 2023-10-30T11:42:11.000Z

> This has been implemented, and can be used to get past mt moon! you can enable it by setting use_screen_explore: False in the config.

Do I need to stop my current session and start a new one to enable this?