A finite-difference-ish approach to policy gradients. It's like PGET, but exploring in parameter space instead of action space.
Because: why search action space and then perform gradient descent -- which requires an expensive gradient tape/graph -- when you can just search in parameter space instead?
(because action space is usually far lower-dimensional than parameter space, which makes it much easier to search -- but it's a method worth exploring regardless)
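
A minimal sketch of the core loop, assuming a hand-rolled linear policy and a toy point-mass task (both are stand-ins for illustration, not this repo's actual code): perturb the parameters along a random direction, run two rollouts, and take a central finite difference -- no autodiff tape or graph anywhere.

```python
# Sketch: finite-difference policy search in parameter space.
# The environment and linear policy below are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=50):
    """Run one episode with a linear policy a = theta @ s; return total reward."""
    pos, vel = 1.0, 0.0                    # toy point-mass: drive pos toward 0
    total = 0.0
    for _ in range(horizon):
        s = np.array([pos, vel])
        a = float(np.clip(theta @ s, -1.0, 1.0))
        vel += 0.1 * a
        pos += 0.1 * vel
        total -= pos**2 + 0.01 * a**2      # quadratic cost as negative reward
    return total

theta = np.zeros(2)
eps, lr = 0.05, 0.01

for step in range(300):
    # Central finite difference along a random direction (ES/SPSA-style):
    # two extra rollouts per update, zero backprop.
    d = rng.standard_normal(theta.shape)
    r_plus = rollout(theta + eps * d)
    r_minus = rollout(theta - eps * d)
    grad_est = (r_plus - r_minus) / (2 * eps) * d  # directional gradient estimate
    theta += lr * grad_est                         # ascend the estimated gradient

print("final return:", rollout(theta), "theta:", theta)
```

The `(r_plus - r_minus) / (2 * eps)` term estimates the directional derivative of the return along `d`; multiplying by `d` (with `d` drawn from a standard Gaussian, so `E[d dᵀ] = I`) turns it into an unbiased estimate of the full gradient. The catch, per the parenthetical above: each estimate covers only one direction, so the variance grows with the number of parameters.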