A finite-difference-ish approach to policy gradients. It's like PGET, but exploring in parameter space instead of action space.
Because: why search action space and then perform gradient descent -- which requires an expensive gradient tape/graph -- when you can just search in parameter space instead?
(because action space is usually far lower-dimensional than parameter space, which makes it much easier to search -- but it's a method worth exploring regardless)
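
A minimal sketch of the core loop, assuming a hand-rolled linear policy and a toy point-mass task (both are stand-ins for illustration, not this repo's actual code): perturb the parameters along a random direction, run two rollouts, and take a central finite difference -- no autodiff tape or graph anywhere.

```python
# Sketch: finite-difference policy search in parameter space.
# The environment and linear policy below are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=50):
    """Run one episode with a linear policy a = theta @ s; return total reward."""
    pos, vel = 1.0, 0.0                    # toy point-mass: drive pos toward 0
    total = 0.0
    for _ in range(horizon):
        s = np.array([pos, vel])
        a = float(np.clip(theta @ s, -1.0, 1.0))
        vel += 0.1 * a
        pos += 0.1 * vel
        total -= pos**2 + 0.01 * a**2      # quadratic cost as negative reward
    return total

theta = np.zeros(2)
eps, lr = 0.05, 0.01

for step in range(300):
    # Central finite difference along a random direction (ES/SPSA-style):
    # two extra rollouts per update, zero backprop.
    d = rng.standard_normal(theta.shape)
    r_plus = rollout(theta + eps * d)
    r_minus = rollout(theta - eps * d)
    grad_est = (r_plus - r_minus) / (2 * eps) * d  # directional gradient estimate
    theta += lr * grad_est                         # ascend the estimated gradient

print("final return:", rollout(theta), "theta:", theta)
```

The `(r_plus - r_minus) / (2 * eps)` term estimates the directional derivative of the return along `d`; multiplying by `d` (with `d` drawn from a standard Gaussian, so `E[d dᵀ] = I`) turns it into an unbiased estimate of the full gradient. The catch, per the parenthetical above: each estimate covers only one direction, so the variance grows with the number of parameters.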