Opened this issue 5 years ago · 2 comments
What about the REINFORCE algorithm?
I'll work on it after Prioritized Experience Replay! It will probably be a couple of weeks, since I'm taking my time re-reading the PER paper and figuring out the most flexible implementation.
Thank you!