sentenai/reinforce

Add some policy gradient methods to the algorithms

stites opened this issue · 0 comments

  • Actor-critic methods
  • REINFORCE
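
For reference, a minimal sketch of the REINFORCE update on a toy multi-armed bandit. It is written in plain Python for illustration only and is not this library's API. The names `softmax`, `reinforce_bandit`, `arm_means`, `lr`, and the Gaussian reward model are all assumptions made for the example. The running-average baseline is a common variance-reduction variant, not part of the bare algorithm.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on a Gaussian bandit.

    arm_means is a hypothetical environment: arm k pays a reward drawn
    from a normal distribution with mean arm_means[k] and std 1.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # policy parameters, one per arm
    baseline = 0.0                  # mean of the rewards seen so far

    for t in range(1, episodes + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)

        # For a softmax policy, d/d prefs[k] of log pi(action)
        # equals 1[k == action] - probs[k].
        advantage = reward - baseline
        for k in range(len(prefs)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * advantage * grad_log_pi

        # Update the baseline after the policy step (incremental mean).
        baseline += (reward - baseline) / t

    return softmax(prefs)


if __name__ == "__main__":
    print(reinforce_bandit([0.0, 2.0]))
```

An actor-critic method would keep the same policy update but replace the running-average baseline with a learned value estimate (the critic), so the two items above could plausibly share most of their code.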